Staff Production Operations Engineer

Remote Full-time
About the Role:
We're looking for a Staff Production Operations Engineer to drive Redpanda's reliability operations program. This role combines hands-on site reliability engineering with planning and coordination skills to ensure a world-class operations practice across a globally distributed engineering team.
In this role, you'll work with the broader Engineering team, Engineering leadership, Product, and Customer Success to drive operational excellence. You'll coordinate our on-call and incident lead rotations, drive blameless post-incident reviews, and own the processes that help us respond faster, learn more from outages, and systematically improve reliability. We're looking for someone who can use AI agents to automate the operational toil that slows teams down, building on Redpanda's own ADP platform to do it.
You Will:
Drive process improvements across the incident lifecycle: severity models, triage enforcement, alert noise reduction, and follow-up completion rates

Coordinate the on-call program across multiple geographies: manage schedules and shadow rotations, onboard new engineers, and ensure consistent coverage

Select incidents for blameless post-incident review, facilitate the reviews, document findings, and track follow-up completion. Contribute to addressing incident follow-ups where possible, either by fixing issues directly or by prototyping solutions

Build AI agents to automate operational toil, including on-call automation, incident summarization, post-incident review prep, follow-up tracking, and on-call analytics

Maintain runbooks, playbooks, and incident process documentation, and keep them current as processes evolve

You Have:
5+ years of experience in site reliability engineering, DevOps, or production operations in large-scale, highly reliable environments

A track record of leading initiatives end-to-end, from design and planning through execution and production operation

Hands-on experience with incident management tooling (incident.io, PagerDuty, or similar) and observability stacks (Datadog, Grafana, Sentry, CloudWatch, or equivalent)

Strong fluency with reliability concepts: MTTD, MTTR, MTTA, error budgets, SLOs

Experience building automation and tooling to reduce operational toil

Proficiency in Go (or comparable systems language with willingness to ramp)

Experience with AI-assisted software development workflows including tools like Claude Code

Working knowledge of at least one of AWS / Azure / GCP, including infrastructure as code for system and network infrastructure

Strong written communication; ability to drive alignment across engineering teams without direct authority

Nice to Have:
Hands-on experience building agents or automations using LLMs

Familiarity with Redpanda, Apache Kafka, or other streaming infrastructure

Prior experience in a fast-growing B2B infrastructure or developer tools company


The U.S. base salary range for this role is $220,000 - $256,000 (CA, NY, WA) and $211,000 - $250,000 (other US locations). Our salary ranges are determined by role, level, and location. To determine individual base salary, we consider each candidate's job-related skills, location, experience, and relevant education or training. Your talent partner will share more about the specific salary range for your preferred location during the hiring process.
Please note that Redpanda uses artificial intelligence (AI) technology to assist in the screening and assessment of applications for this position. However, all final hiring decisions are made by our human hiring team.
Vacancy Status: This job posting is for an existing vacancy.