[Remote] Sr Site Reliability Engineer

Remote Full-time
Note: This is a remote position open to candidates in the USA. Commence is a company focused on data-centric transformation in healthcare, aiming to improve health outcomes through efficient processes. It is seeking a Senior Site Reliability Engineer to ensure the reliability and operational health of its healthcare data platform, collaborating with engineering teams and managing incident response.

Responsibilities

- Design, implement, and own observability infrastructure, including metrics, logging, tracing, and alerting, across distributed systems
- Define and enforce SLOs, SLIs, and error budgets in partnership with product and engineering teams
- Lead incident response: triage, coordinate remediation, conduct blameless post-mortems, and drive systemic fixes
- Build and maintain CI/CD pipelines that support rapid, safe delivery of changes to production
- Collaborate with engineering teams on infrastructure changes; read, modify, and contribute to existing infrastructure-as-code (Terraform or CloudFormation)
- Design and operate highly available, fault-tolerant systems, including auto-scaling, failover, and disaster recovery strategies
- Reduce operational toil through automation; eliminate manual processes before they become habits
- Collaborate with software engineers to establish reliability-first design patterns and review architectures for operational risk
- Manage Kubernetes or container orchestration environments at scale
- Ensure systems meet compliance and security requirements, particularly those applicable to healthcare data (HIPAA, SOC 2)
- Provide technical mentorship and guidance to engineers across the organization on reliability practices
- Participate in on-call rotation with a commitment to continuously reducing the need for it

Skills

- 7+ years of experience in SRE, platform engineering, or DevOps roles
- Exceptional problem-solving under pressure: a demonstrated track record of diagnosing complex, high-stakes system failures and building durable solutions
- Deep hands-on experience with AWS services including EC2, EKS/ECS, Lambda, RDS, S3, CloudWatch, and related tooling
- Familiarity with infrastructure-as-code (Terraform or CloudFormation), with the ability to contribute to existing configurations
- Experience designing and operating distributed systems with strict availability and latency requirements
- Proficiency in at least one scripting or systems language (Python, Go, Bash, or similar) for automation and tooling
- Experience with container orchestration (Kubernetes, ECS) in production environments
- Expertise in observability tooling (OpenSearch, Prometheus/Grafana, or equivalent)
- Hands-on experience with CI/CD platforms (GitHub Actions, Jenkins, CircleCI, or similar)
- Proven ability to define and operationalize SLOs and error budgets
- Experience with relational and NoSQL databases: performance tuning, replication, and backup strategies
- Strong working knowledge of networking fundamentals: DNS, load balancing, VPCs, TLS
- Excellent communication skills, with the ability to translate technical risk into business impact for non-engineering stakeholders
- AWS Certifications (Solutions Architect, DevOps Engineer, or SysOps Administrator)
- Experience in healthcare technology or other regulated industries (HIPAA, SOC 2, FedRAMP)
- Familiarity with chaos engineering practices and tooling
- Experience with data pipeline reliability (ETL/ELT workflows, streaming systems)
- Exposure to AI/ML infrastructure and the reliability challenges unique to model serving
- Familiarity with additional cloud platforms (Azure, Google Cloud)
- Contributions to open-source reliability or infrastructure tooling

Company Overview

Commence delivers an AI-driven healthcare data platform and clinical expertise that support analytics, decisions, and workflow improvement. It is headquartered in Virginia Beach, Virginia, USA, with a workforce of 501-1000 employees. Its website is https://commence.ai.

