[Remote] Site Reliability Engineering Tech Lead
Note: This is a remote position open to candidates in the USA.

DataHub is an AI & Data Context Platform adopted by over 3,000 enterprises, including major companies like Apple and Netflix. They are seeking an experienced Site Reliability Engineering (SRE) Tech Lead to drive the reliability, scalability, and operational excellence of their platform offerings, focusing on technical leadership and architecture, enterprise platform development, and platform reliability operations.

Responsibilities
- Design and implement robust, scalable infrastructure solutions for DataHub Cloud and enterprise deployments
- Lead the technical vision for multi-cloud deployment strategies and distributed system integrations
- Architect monitoring, observability, and alerting systems across diverse environments
- Drive best practices for infrastructure as code, configuration management, and deployment automation
- Partner with product and engineering teams to influence the development of advanced deployment capabilities
- Collaborate with cross-functional teams to help build systems for seamless installation, upgrade, and rollback processes across various environments
- Influence the design and help implement comprehensive monitoring and health check systems for distributed deployments
- Partner with engineering teams to help develop self-healing and automated remediation capabilities
- Establish and maintain SLAs/SLOs for both cloud and enterprise offerings
- Lead incident response and post-mortem processes to drive continuous improvement
- Implement chaos engineering practices to proactively identify system weaknesses
- Optimize system performance, capacity planning, and cost efficiency
- Mentor and guide a team of SRE engineers and collaborate with platform engineering teams
- Work closely with product, engineering, and customer success teams to ensure reliable product delivery
- Improve on-call practices, runbooks, and knowledge sharing processes
- Drive cross-functional initiatives to improve overall system reliability

Skills
- 8+ years of experience in Site Reliability Engineering, Platform Engineering, or DevOps roles
- 3+ years of technical leadership experience managing engineering teams
- Strong expertise with cloud platforms (AWS, GCP, Azure) and infrastructure automation tools
- Proficiency in containerization technologies (Docker, Kubernetes) and orchestration
- Experience with infrastructure as code tools (Terraform, CloudFormation, Pulumi)
- Strong programming skills in Python, Java, or similar languages
- Deep understanding of monitoring and observability tools (Prometheus, Grafana, Datadog, etc.)
- Experience with CI/CD pipelines and deployment automation
- Strong knowledge of networking, security, and database operations in cloud environments
- Experience building and operating multi-tenant SaaS platforms
- Background in developing customer-facing deployment and management tools
- Knowledge of data infrastructure and metadata management systems
- Experience with service mesh technologies and microservices architectures
- Previous experience in a customer-facing technical role or working with enterprise clients
- Experience with data governance or data catalog platforms

Benefits
- Competitive compensation
- Equity for everyone
- Remote work
- Location flexibility: You'll receive a monthly coworking stipend to use whenever you need a change of pace or in-person collaboration time.
- Comprehensive health coverage: We cover 99% of medical, dental, and vision premiums for employees, and 65% for dependents.
- Flexible savings accounts: We offer FSAs to help cover planned or unexpected healthcare costs. You can also opt into a Dependent Care FSA to support family needs.
- Support for every path to parenthood: Through Carrot Fertility, we provide inclusive fertility benefits and family-forming support. All U.S. employees have access, regardless of age, gender identity, or family structure.
- Time off that works for you: Our unlimited PTO and sick leave policy is designed for flexibility, rest, and real life.

Company Overview
DataHub is an open-source metadata platform that unifies data discovery, observability, and governance for AI and data ecosystems. It was founded in 2021 and is headquartered in Palo Alto, California, USA, with a workforce of 51-200 employees. Its website is https://datahub.com.

Company H1B Sponsorship
DataHub has a track record of offering H1B sponsorships, with 3 in 2025, 1 in 2024, and 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.