[Remote] Senior AI Software Engineer, Agent Systems
Note: This is a remote job open to candidates in the USA. Scale Army Careers is seeking a Senior AI Software Engineer specializing in Agent Systems. The role involves designing self-running agent loops and multi-agent swarms, ensuring systems operate autonomously while maintaining verification and oversight. The engineer will be responsible for building and shipping robust agent platforms that perform real work in production environments.

Responsibilities

Design self-running loops. Define the trigger, scope, action, budget, stop condition, and reporting so an agent runs unattended, stays inside cost and iteration limits, and knows when it is done versus when to escalate.

Build multi-agent swarms. Orchestrator plus specialized agents with clear file and task ownership, shared state or a shared mailbox, quality gates between stages, and handoffs that do not step on each other.

Make verification first-class. Build the part of the system that can say no: the checks, evals, and reviewer agents that catch confident mistakes before they merge. A loop is only as trustworthy as its ability to check its own work.

Own agent state and memory. Persistent on-disk state and per-turn context assembly so long-running tasks survive restarts and the system does not forget what the repo already knows.

Ship the platform around the agents. APIs, services, queues, and integrations in TypeScript and Node, deployed to AWS, with real tests, tracing, and observability for long multi-iteration runs.

Keep humans in the loop where it counts. Plan approval and pull request review, and active management of comprehension debt so the team understands what the swarm ships, not just that it shipped.

Skills

Strong engineering fundamentals. 5+ years writing production software that other engineers depend on. (Adjustable; we care more about what you have shipped than the number.)

Hands-on loop engineering. You have designed agent loops with explicit stop conditions, budgets, retries, and self-verification. You can explain the difference between a task on repeat and a real loop, and you know why the verifier matters as much as the maker.

Multi-agent or swarm experience. You have built or operated systems where multiple agents coordinate: orchestration, handoffs, shared state, ownership or locking, and quality gates.

Fluency with modern agent tooling. Claude Code or Codex style agents, sub-agents, persistent memory and skills files, tool and function calling, MCP, and reason-act-observe loop patterns.

Solid TypeScript and Node. Comfort with a service framework (NestJS or similar) and a typed data layer (Prisma or similar).

Cloud and delivery. AWS (ECS or Fargate or similar), Docker, and CI/CD. You can take something from repo to production yourself.

A verification mindset. You treat 'done' as a claim to be proven, and you build the checks that prove it.

Running 10+ parallel agents and managing token and cost budgets at scale.

Distributed systems, queues, and event-driven design.

React for agent-facing interfaces.

Prior work on developer tooling, orchestration frameworks, or internal agent platforms.

Familiarity with where loop engineering is heading next, including continual learning systems.

SOC 2 or ISO 27001 awareness for handling client data.

Benefits

REMOTE

This role is open to candidates based in LATAM, Africa, and Eastern Europe. Please note that as this role supports U.S.-based clients, candidates must be available to work during U.S. business hours aligned with the client's time zone.

Company Overview

Most job boards make candidates do all the work: decoding vague posts, chasing unclear compensation, and applying into the void. The company was founded in 2021 and is headquartered in New York, NY, US, with a workforce of 11-50 employees. Its website is https://careers.scalearmy.com.