[Remote] AI Engineer
Note: This is a remote position open to candidates in the USA.

In Tandem is a company focused on building technology to help families manage their routines and navigate transitions. They are seeking an AI Engineer to maintain and optimize their AI infrastructure, run self-hosted inference stacks, and develop user-facing features that assist families in coordinating their daily activities.

Responsibilities
- Run and optimize our self-hosted inference stack
- Run the inference serving layer on our own GPU hardware: choose and tune the serving stack (vLLM, SGLang, TensorRT-LLM) for high throughput and low latency
- Optimize aggressively: tensor parallelism, quantization (FP8, AWQ, GPTQ), KV-cache and prefix caching, continuous batching, speculative decoding, concurrency tuning
- Serve multiple models and features off shared hardware: multi-LoRA, routing, and request scheduling that balances internal workloads against latency-sensitive product traffic
- Make our AI workloads efficient: improve latency, throughput, and GPU utilization so we get the most out of what we run
- Build the visibility: instrument performance and usage across our AI surfaces so there's clear data on how everything is running
- Surface the technical tradeoffs (performance, latency, efficiency) so the people making the calls have what they need to make them
- Ship the in-app agent layer that helps families coordinate: proactive nudges, smart suggestions, agents that summarize, draft, schedule, and act for busy parents
- Build the substrate underneath: tools, memory, orchestration, guardrails, and evaluation harnesses, integrated cleanly with production APIs alongside our architecture team
- Work in nimble pairs with feature owners, standing up whatever's needed to test an idea, including a vibe-coded UI when that's the fastest path to a real customer.
  Ship rough, learn fast, harden what works.

Skills
- 5+ years shipping production software, including meaningful applied AI or ML work
- Demonstrated experience running and optimizing self-hosted LLMs on dedicated multi-GPU hardware: a serving stack (vLLM, SGLang, or TensorRT-LLM) and the optimization that comes with it (tensor parallelism, quantization, batching, KV cache)
- A track record of optimizing inference performance and efficiency (latency, throughput, GPU utilization)
- Strong Python and engineering fundamentals, with the full-stack range to stand up a quick UI, and the genuine desire to work on app-layer features and not only infra
- Hands-on with agent frameworks (Claude Agent SDK, LangGraph, or similar), LLM APIs, embeddings, and RAG
- Comfortable with AWS and the devops this role owns: Docker, CI/CD, monitoring, and observability
- Experience building internal tooling or platforms others depend on. Bonus for Slack apps, MCP, or agent orchestration at team scale

Benefits
- Medical: In Tandem pays 100% of the premium for employees AND 99% for all additional family members
- 401k: Up to a 4% match with immediate vesting
- Paid leave for all new parents
- Learning & Development stipend for employees
- Paid Time Off: 11 Holidays + Winter Break (3 Days) + Volunteer Time Off (1 Day) + Floating Holiday (1 Day)
- Personal Time Off: 15 days for 0-1 years of employment, 20 days for 1-3 years of employment
- Supportive and flexible working environment – work from anywhere!

Company Overview
In Tandem provides software tools for family organization, communication, and custody management through a technology platform. It is headquartered in Minneapolis, Minnesota, USA, with a workforce of 51-200 employees. Its website is https://www.intandemfamilies.com.