[Remote] Staff AI Engineer | US | Remote
Note: This is a remote job open to candidates in the USA.

Grafana Labs, the company behind the open observability cloud, is looking for a Staff AI Engineer to own the AI agent infrastructure and automation platform that powers its GTM teams. The role involves building multi-agent architectures, LLM integrations, and backend services, while also defining the technical direction for automation and operational efficiency.

Responsibilities
- Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation
- Build modular, composable agentic systems using orchestration frameworks (LangChain, CrewAI, Anthropic MCP, or similar) that operate 24/7 across teams
- Develop reusable agentic skills that agents invoke across interfaces (Slack, dashboards, internal apps, CLIs)
- Implement observability and feedback loops including logging, performance metrics, prompt iteration, model evaluation, and cost management
- Establish governance and compliance standards for AI workflows including access controls, audit trails, PII handling, and human-in-the-loop escalation paths
- Build MCP servers, APIs, CLIs, and microservices connecting AI models to business systems (BigQuery, Slack, CRMs, email, calendars, analytics tools)
- Architect data flows for retrieval-augmented generation (RAG), connecting LLMs to internal knowledge bases, customer data, and real-time business context
- Build serverless or containerized services (Google Cloud Platform Cloud Functions, Cloud Run) that scale with usage and integrate with Grafana's cloud infrastructure
- Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to scope high-impact automation problems, identify bottlenecks, and build solutions with measurable business outcomes
- Design and deploy workflows using orchestration tools (n8n, Workato, or custom platforms) with CI/CD, testing, and production reliability standards
- Build systems designed for self-service, with documentation, playbooks, and enablement materials that let partner teams operate independently

Skills
- 8+ years of software engineering experience with depth in backend development, systems integration, or data/analytics engineering
- 2+ years of hands-on experience applying LLMs/AI to production workflows, not just prototypes
- Strong proficiency in Python and JavaScript/Node.js with Git-based workflows, code review practices, and testing discipline
- Hands-on experience with LLM frameworks and patterns including prompt engineering, RAG, function calling/tool use, structured output parsing, and evaluation
- Experience building and operating multi-agent systems at scale including agent decomposition, orchestration patterns (sequential chains, router/dispatcher, parallel fan-out), state management, and production monitoring
- You diagnose business problems before writing code. You think in workflows and outcomes, not just functions
- Deep familiarity with Google Cloud Platform, BigQuery, and serverless/containerized services (Cloud Functions, Cloud Run)
- Understanding of LLM failure modes and production mitigations including confidence thresholds, fallback logic, human escalation, and cost/latency management
- Proven ability to identify high-leverage problems, push back on low-impact requests, and deliver end-to-end with minimal direction
- Fluent with AI-assisted development tools (GitHub Copilot, Cursor, Claude Code). You use AI to build AI systems
- Clear technical communicator: you can explain complex systems in simple terms to both engineers and business stakeholders
- Experience with frontend frameworks and tooling (React, Slack Block Kit, dashboard components) to build user-facing interfaces for AI tools
- Familiarity with GTM platforms like Salesforce, HubSpot, Outreach, Gainsight, or similar CRM/sales engagement tools
- Experience with vector databases or retrieval pipelines (Pinecone, Weaviate, ChromaDB, pgvector, or similar)
- Prior work automating sales, customer success, or marketing workflows in a B2B SaaS environment
- Experience with workflow automation platforms like n8n, Prefect, Clay, PhantomBuster, Apify, Dust, or similar tools
- Familiarity with Model Context Protocol (MCP) or similar standards for connecting AI systems to data sources and tools
- Exposure to observability tools for AI systems (LangSmith, Weights & Biases, custom logging/evaluation frameworks)
- Experience working in Revenue Operations, GTM Analytics, or Sales Operations environments
- Previous experience in open source or developer-focused SaaS companies. Grafana is built on OSS and we value engineers who share that DNA

Benefits
- Benefits include equity, bonus (if applicable), and other benefits.
- All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success.
- 100% Remote, Global Culture: As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
- Career Growth Pathways: Defined opportunities to grow and develop your career.
- In-Person Onboarding: We want you to thrive from day 1 with your fellow new 'Grafanistas' to learn all about what we do and how we do it.
- We operate a global annual leave policy of 30 days per annum. 3 days of your annual leave entitlement are reserved for Grafana Shutdown Days to allow the team to really disconnect.
*We will comply with local legislation where applicable.

Company Overview
Dice is a job-searching platform for technology professionals and a sub-organization of DHI Group. It was founded in 1990 and is headquartered in Santa Clara, California, USA, with a workforce of 201-500 employees. Its website is http://www.dice.com.