Staff Site Reliability Engineer

Remote Full-time
Who we are

At Domino, we build software that helps the largest, AI-driven organizations build and operate advanced data science and AI solutions at scale. Our platform integrates a streamlined model development environment, MLOps capabilities, and novel features for collaboration, reuse, and reproducibility, all of which make data science teams more productive, reduce time to value, and ensure compliance. Our customers, like Johnson & Johnson, GSK, Bristol Myers, UBS, FINRA and the US Navy, are using our software to solve some of the most important challenges in the world, such as developing new medicines, securing our financial markets, or protecting our country. Backed by Sequoia Capital, Coatue Management, NVIDIA, Snowflake, NetApp and other leading investors, we have been in business for a decade but are still a small team operating with the spirit of a startup. Especially in the world of AI today, we believe that the future is still being invented, and we want to be the ones building it. For more information, visit www.domino.ai

What we are building

As our infrastructure and customer footprint grow, we're investing in a new kind of SRE practice where the people who respond to incidents also build the systems that make future incidents shorter, rarer, and less painful. We're developing AI-assisted tooling that helps our support and engineering teams diagnose problems faster, learn from outages more deeply, and automate away the toil that slows everyone down. This role sits at the center of that: equal parts hands-on operator, software engineer, and technical leader. If you believe that operational experience and engineering craft make each other stronger, you'll feel right at home here.

What your impact will be

Lead the development of Domino's internal AI-assisted reliability tooling, including systems that analyze tickets, logs, traces, and documentation to help teams resolve outages faster with less recurring toil

Improve the observability coverage and signal quality for our most critical customer-facing systems, so engineers have more to work with throughout the development and support lifecycle

Own incident response end-to-end, from detection to remediation, and leave each problem space better documented, better understood, and less likely to recur

Guide the development of customer and user-facing observability tools within our products

Define and mature SLO/SLI frameworks for priority services, turning abstract reliability goals into measurable, actionable standards

Scale cloud operations practices for Domino’s single-tenant SaaS offering, and work with engineering teams to improve the reliability and repeatability of customer deployments and upgrades

Mentor other engineers and shape how SRE is practiced at Domino, including incident response workflows, operational readiness expectations, and post-incident learning culture

What we look for in this role

Deep experience in Site Reliability Engineering, platform engineering, or a software engineering role with genuine, hands-on operational ownership

Fluency with Kubernetes, Linux, cloud platforms, and observability tooling, and the ability to use them to investigate complex, real-world production problems

A strong ability to perceive and close reliability gaps in technical products, tools and processes

Strong software engineering skills in Python or Go, with a track record of building internal tools or services that people actually rely on

Comfort leading technically ambiguous work and influencing direction across teams without needing direct authority to get things done

A history of improving reliability through engineering and automation, not just putting out fires manually

Strong communication skills and real experience mentoring engineers or shaping technical decision-making on your team

Sound judgment about AI/LLM tooling: you know where it genuinely helps in operational workflows and where it adds noise instead of signal

Bonus: Experience with LLM-based systems, retrieval workflows, SaaS platform operations, or building tooling for support or developer teams

What we value

We strongly believe in the value of growing a diverse team and encourage people of all backgrounds, genders, ethnicities, abilities, and sexual orientations to apply

We value a growth mindset: high-performing, creative individuals who dig into problems and see the opportunities for success.

We believe in individuals who seek truth and speak the truth and can be their whole selves at work.

We value everyone who believes that improvement is always possible. At Domino, everything is a work in progress – we can do better at everything.

We emphasize an environment of teaching and learning to equip employees with the tools needed to be successful in their function and the company.

The annual US base salary range for this role is listed below. For sales roles, the range provided is the role's On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. This salary range will be narrowed during the interview process based on a number of factors, including the candidate's experience, qualifications, and location. Additional benefits for this role may include: equity, company bonus or sales commissions/bonuses; 401(k) plan; medical, dental, and vision benefits; and wellness stipends.

Compensation Range
$200,000–$230,000 USD