[Remote] Site Reliability Engineering Manager
Note: This is a remote position open to candidates in the USA.

Dice is seeking a Senior Manager of Site Reliability Engineering (SRE) to enhance SRE practices within the Financial Services & Innovation (FS&I) organization. This role involves establishing operational discipline, driving SRE standards, and ensuring alignment across teams to improve reliability and performance.

Responsibilities
- Drive adoption of the SRE operating model across application teams
- Establish clarity in roles between:
    - SRE
    - Production Support Engineering (PSE)
    - Application teams
- Ensure SRE practices are embedded into the development lifecycle, not treated as post-production activities
- Define and enforce:
    - SLIs, SLOs, and Error Budgets
    - Production readiness criteria
    - Reliability best practices
- Lead SLO adoption and compliance reviews across the organization
- Establish governance frameworks to ensure consistent application of standards
- Partner with:
    - Application product teams
    - Production Support Engineering (MG team)
    - Platform / Infrastructure / Observability teams
- Drive alignment and reduce friction between engineering and operations
- Ensure clear handoffs, escalation models, and operational ownership
- Lead adoption of centralized observability standards across:
    - Metrics
    - Logging
    - Tracing
- Align tooling (AppDynamics, Splunk, Prometheus, etc.)
- Ensure monitoring and alerting are SLO-driven and actionable, not noise-based
- Partner with PSE to strengthen:
    - Incident management processes
    - RCA (Root Cause Analysis) standards
- Drive identification of patterns and systemic issues
- Ensure learnings translate into engineering improvements and automation
- Identify opportunities to:
    - Reduce manual operational work
    - Improve system resilience
    - Enable self-healing capabilities
- Promote a culture of engineering over reaction
- Define and track reliability metrics across FS&I
- Build reporting that provides visibility into:
    - System health
    - Incident trends
    - SLO performance
- Translate technical data into actionable business insights

Skills
- 10+ years in engineering, operations, or SRE roles
- 5+ years leading SRE, platform, or reliability-focused teams
- Proven experience implementing SRE practices at scale (SLIs, SLOs, error budgets)
- Strong background in cloud environments (AWS, Azure, Google Cloud Platform)
- Hands-on experience with observability tools (Splunk, AppDynamics, Prometheus, etc.)
- Experience in incident management and production operations at scale
- Ability to operate effectively in high-pressure and complex enterprise environments
- Experience driving organizational transformation (not just technical implementation)
- Strong understanding of CI/CD, DevOps, and automation practices
- Experience working in regulated or large enterprise environments
- Familiarity with AIOps or advanced automation strategies

Company Overview
Dice is the go-to career marketplace for tech professionals. It was founded in 2010, and is headquartered in Drachten, Friesland, NLD, with a workforce of 201-500 employees. Its website is https://www.or-quest.nl/.

Company H1B Sponsorship
Dice has a track record of offering H1B sponsorships, with 2 in 2022, 4 in 2021, and 5 in 2020. Please note that this does not guarantee sponsorship for this specific role.