[Remote] Senior Site Reliability Engineer — Government & Sovereign Cloud
Note: This is a remote position open to candidates in the USA.

Veeam Software is the Data and AI Trust Company, specializing in data resilience and security. The role involves building a global Site Reliability Engineering function for the Veeam Data Cloud (VDC), focusing on government and sovereign cloud environments, while ensuring high availability and fault tolerance.

Responsibilities
- Get up to speed on the full platform: all VDC workloads, dependencies, and risk areas. Much of this will happen through code, docs, and conversations rather than direct environment access
- Work with SMEs across the org to fill knowledge gaps and build onboarding material for the team
- Write and maintain runbooks, architecture docs, and operational guides
- Design infrastructure for high availability and fault tolerance on Azure (including Azure Government)
- Define SLIs, SLOs, and error budgets where none exist today
- Run incident response and blameless postmortems, and turn incidents into improvements
- Identify reliability risks across modern and legacy workloads and build practical remediation plans that work within compliance constraints
- Close observability gaps: define instrumentation requirements and drive implementation
- Set alerting, telemetry, and monitoring standards with partner teams
- Build automation to reduce toil and support fleet management
- Participate in on-call rotations
- Work with IaC, CI/CD, deployment automation, and config management, including in air-gapped or compliance-restricted environments
- Build and maintain testing, canary deployment, and release validation pipelines
- Integrate chaos engineering and monitoring tools, adapting choices to meet regulatory requirements
- Work across product, platform, security, legal, compliance, and operations teams
- Own problems end-to-end: identify gaps, drive solutions, don't wait for direction
- Mentor other engineers and help spread SRE practices across the org

Skills
- 7+ years in Software Engineering, with 3+ years in SRE, Platform Engineering, or similar, across multi-service platforms, not just single-service environments
- Experience with Government or Sovereign Cloud (e.g., Azure Government, AWS GovCloud)
- Experience in regulated compliance environments: government (FedRAMP, CMMC, IL2/IL4/IL5), financial (PCI-DSS, SOX), or healthcare (HIPAA, HITRUST). You understand how compliance shapes architecture and operations
- Strong experience building and running production services on cloud infrastructure (Azure preferred, including Azure Government)
- Able to learn large, complex platforms quickly with limited guidance; comfortable building understanding from code, docs, and architecture artifacts when direct environment access is restricted
- Can investigate systems independently and produce clear docs, risk assessments, and improvement plans
- Comfortable working across teams: engineering, product, security, compliance, operations
- Programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack)
- Experience with IaC (Terraform, Terragrunt, Pulumi) and container orchestration (Kubernetes)
- Experience with CI/CD and GitOps tooling: GitHub Actions, Azure DevOps, GitLab CI, ArgoCD, FluxCD, or Dagger
- Solid grasp of distributed systems, networking, and cloud-native architecture
- Clear written and verbal communication skills
- Experience on B2B SaaS platforms in regulated or government markets
- Background in chaos engineering, resilience testing, or performance/load testing
- Have built an SRE or reliability function from scratch before
- Experience across mixed environments, both modern cloud-native and older legacy systems
- Familiar with AI-first development workflows: using LLM-powered tools for infrastructure automation, code generation, and documentation

Benefits
- Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
- Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
- Medical, dental, and vision coverage starting on your first day
- Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
- 401(k) retirement plan with company matching contributions
- Fertility, adoption, and surrogacy support through Maven, plus paid volunteer time
- AirVet: 24/7 virtual veterinary care at no cost
- Legal services, identity protection, and supplemental health insurance options
- Tax-advantaged spending accounts for healthcare, dependent care, and commuting
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning
- Professional development resources including mentorship, training, and volunteer days
- Competitive compensation and benefits

Company Overview
Veeam provides data resilience and data management solutions for cloud, virtual, and physical environments. It was founded in 2006 and is headquartered in Columbus, Ohio, USA, with a workforce of 5,001-10,000 employees. Its website is http://www.veeam.com.

Company H1B Sponsorship
Veeam Software has a track record of offering H1B sponsorships: 2 in 2025, 1 in 2024, and 2 in 2022. Please note that this does not guarantee sponsorship for this specific role.