Site Reliability Engineer, K8s

Remote Full-time
WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.
Position Overview
Our BI team runs a set of GCP-based APIs and data services that a lot of internal products depend on. As we've grown, keeping things running has increasingly been a side responsibility for engineers who are primarily building features — and that's not sustainable. We're looking for an SRE to own that space: service health, incident response, infrastructure monitoring, and making sure we're not blindly burning cloud budget.
The Site Reliability Engineer will ensure the availability, performance, and security of the Business Intelligence team's GCP-hosted APIs and data infrastructure. This role is responsible for proactive monitoring, incident response, and continuous improvement of platform reliability across a cloud-native stack. The engineer will work closely with backend and data engineers to maintain service health and drive operational excellence. This position also carries responsibility for GCP cost visibility, helping the team track and optimize cloud spend through structured monitoring and alerting.
Responsibilities
Monitor and maintain uptime of GCP-hosted APIs and services, keeping performance within agreed targets

Lead incident response for BI platform services — triage, resolve, and follow up with post-mortems that actually prevent recurrence

Build and manage observability infrastructure: dashboards, alerts, and logging across GCP services

Track GCP cloud spend and set up cost alerting to flag anomalies before they become problems

Review and fix security gaps — IAP configs, service account permissions, API access controls

Work with data and backend engineers to shore up reliability of data pipelines and BigQuery workflows

Contribute to infrastructure-as-code and help keep deployments documented and reproducible

Qualifications
2+ years in a Site Reliability, DevOps, or Cloud Infrastructure role in a production environment

Bachelor's degree in Computer Science, Engineering, or related field, or equivalent hands-on experience

Practical experience with GCP — Cloud Run, API Gateway, and BigQuery in particular

Experience with monitoring and observability tooling (Cloud Monitoring, Datadog, or similar)

Solid grasp of cloud security fundamentals — IAM, network controls, access management

Proficiency with Git and version control in a team setting

Preferred Skills
CI/CD pipelines and deployment automation (GitHub Actions, Cloud Build, or similar)

Terraform or other infrastructure-as-code tools

Python for scripting or automation

MySQL, Spanner, or BigQuery at any meaningful depth

GCP cost management and spend optimization

Experience with dbt or Looker

Comfortable working across CET/EST hours in a distributed team

