Senior Engineering Manager - Metrics Platform
We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them.

It takes a certain kind of person to join this team. Those who ask hard questions, give honest feedback, and show up for each other. No egos, no solo acts. Just smart, curious humans pushing toward something bigger, together.

One Confluent. One Team. One Data Streaming Platform.

About the Metrics Platform Team

The Confluent Metrics Platform team's mission is to provide a best-in-class observability foundation that enables customers to monitor, analyze, and optimize their real-time data streaming infrastructure at cloud scale. Our charter is to deliver Realtime Metrics and Insights through the Confluent Cloud Metrics API, empowering businesses to make data-driven decisions about their streaming workloads.

We are a critical component of Confluent's observability systems, serving as the primary interface through which customers understand the health, performance, and behavior of their Kafka clusters, connectors, ksqlDB applications, and Schema Registry deployments. Our technology powers monitoring dashboards, alerting systems, and capacity planning tools for thousands of customers running mission-critical streaming applications.

About the Role

As a Senior Manager, Engineering for the Metrics Platform team, you will build, lead, and grow a high-performing engineering organization responsible for one of Confluent's most critical services.
This role demands a unique blend of deep technical expertise and strong leadership: you must drive the strategic vision for a large-scale, real-time analytics platform and execute flawlessly on operational excellence.

Your immediate focus will be on:

- Scaling for Growth: Leading the technical strategy to scale our metrics infrastructure to handle 10x data volume over the next 2 years
- API Evolution: Driving the roadmap for new metrics datasets, query capabilities, and integration patterns
- Operational Excellence: Ensuring 99.99%+ availability, sub-second query performance, and seamless incident response
- Cross-Team Collaboration: Partnering with multiple teams across Telemetry, Cloud Infrastructure, and Product to deliver end-to-end observability solutions

What You Will Do

Technical Strategy & Architecture

- Define and execute the multi-year technical roadmap for the Metrics Platform, including data infrastructure cluster evolution, data retention strategies, and query optimization
- Build, mentor, and grow a world-class engineering team
- Partner with Product Management to define and prioritize the Metrics API roadmap based on customer needs and business impact
- Align with Confluent's broader observability strategy across Cloud and Platform offerings
- Establish metrics and KPIs to measure system performance, system reliability, and customer satisfaction

What We're Looking For

- 14+ years of overall experience in software development and engineering
- 4+ years of engineering management experience, leading productive, high-performing teams
- Experience operating large-scale distributed systems in production environments (preferably cloud-native)

Leadership & Management Skills

- Demonstrated ability to hire and retain top engineering talent, provide impactful coaching, and drive high-performance results.
- Proven track record of shipping features consistently and meeting aggressive deadlines with a high degree of urgency.
- Exceptional prioritization skills with the ability to balance short-term execution with a long-term strategic vision for technical evolution.
- Exceptional communication and collaboration skills, with a focus on building a positive, inclusive team culture aligned with organizational goals.

Technical Expertise

- Solid fundamentals in distributed systems design, replication protocols, and high-availability production operations.
- Deep familiarity with Kafka or similar high-scale event streaming platforms (Pulsar, Flink, etc.) in cloud environments.
- Experience operating complex architectures across large public clouds (AWS, GCP, Azure) or private cloud-native infrastructures.
- Strong engineering background with a hands-on approach to technology and a passion for architectural deep-dives.

Nice to Have

- Direct experience with Apache Druid in production at scale
- Familiarity with Prometheus, OpenMetrics, or OpenTelemetry ecosystems
- Experience in SaaS or platform engineering organizations

Ready to build what's next? Let’s get in motion.

Come As You Are

Belonging isn’t a perk here. It’s the baseline. We work across time zones and backgrounds, knowing the best ideas come from different perspectives. And we make space for everyone to lead, grow, and challenge what’s possible.

We’re proud to be an equal opportunity workplace. Employment decisions are based on job-related criteria, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by law.

Privacy Statement

Confluent is an IBM subsidiary which has been acquired by IBM and will be integrated into the IBM organization. By proceeding with this application, you understand that Confluent will share your personal information with other IBM affiliates involved in your recruitment process, wherever these are located. More information on how IBM protects your personal information, including the safeguards in case of cross-border data transfer, is available here.