[Remote] Lead Site Reliability Engineer
Note: This is a remote position open to candidates in the USA.

BillingPlatform is an industry-leading, fast-growing SaaS company that offers a cloud-based revenue lifecycle management platform. It is seeking a Lead Site Reliability Engineer to own and improve on-call processes, manage SLOs, and enhance system reliability through engineering practices such as observability, infrastructure as code, and chaos engineering.

Responsibilities
- Own and improve on-call processes, incident response playbooks, and post-mortem culture
- Define, track, and manage SLOs, SLIs, and error budgets for critical services
- Lead blameless post-mortems and drive systematic reliability improvements
- Respond to production incidents and coordinate cross-functional resolution
- Design, build, and maintain scalable AWS infrastructure using IaC (Terraform, Pulumi)
- Manage Kubernetes clusters and containerized workloads in production
- Build and maintain CI/CD pipelines to improve deployment speed and reliability
- Evaluate and implement tooling to enhance developer productivity and system stability
- Implement monitoring, alerting, and distributed tracing (Prometheus, Grafana, Datadog, Jaeger)
- Identify and resolve performance bottlenecks across services, networks, and databases
- Build dashboards and runbooks for self-service operational insights
- Partner with engineering teams to embed reliability practices (load testing, capacity planning, chaos engineering)
- Conduct architecture reviews with a focus on reliability and operability

Skills
- 5+ years of experience in SRE, DevOps, or infrastructure engineering
- Deep expertise with AWS and cloud-native architectures
- Strong experience with Kubernetes and container orchestration at scale
- Hands-on experience with infrastructure-as-code tools (Terraform or Pulumi)
- Proficiency in Python, Go, or Bash
- Experience with observability tools (Prometheus, Grafana, Datadog, or similar)
- Strong understanding of SLOs, SLIs, and error budgets
- Experience with service mesh technologies (Istio, Linkerd)
- Familiarity with chaos engineering tools (Chaos Monkey, Gremlin, LitmusChaos)
- Background in Oracle database reliability and administration
- Contributions to open-source infrastructure projects
- Experience in a high-growth SaaS or product-led environment
- Excellent English communication skills (written and spoken)

Benefits
- Competitive compensation with a robust benefits package, including medical, dental, vision, LTD, HSA, FSA, free virtual mental health counseling, and health and wellness perks
- Medical insurance coverage effective on the first day of employment
- 401(k) match that is 100% immediately vested
- Discretionary and charitable time off program
- Home office setup allowance for fully remote employees

Company Overview
BillingPlatform provides a cloud-based platform that helps businesses manage and automate their billing processes. It was founded in 2012 and is headquartered in Centennial, Colorado, USA, with a workforce of 201-500 employees. Its website is http://www.billingplatform.com.

Company H1B Sponsorship
BillingPlatform has a track record of offering H1B sponsorships, with 3 in 2025, 1 in 2024, and 3 in 2022. Please note that this does not guarantee sponsorship for this specific role.