[Remote] Staff Site Reliability Engineer, Production Engineering
Note: This is a remote job, open to candidates in the USA.

Dropbox is seeking a Staff Site Reliability Engineer to focus on company-wide reliability strategy. The role involves defining reliability strategies, leading initiatives to reduce risks, and collaborating with various teams to enhance operational excellence.

Responsibilities
- Define and evolve Dropbox’s company-wide technical reliability strategy to support the changing engineering environment created by AI-assisted and agentic software development
- Set multi-year reliability goals, standards, and roadmaps across observability, debugging, incident management, service health, and operational readiness
- Lead cross-team initiatives that reduce reliability risk as software delivery velocity, pull request volume, service complexity, and incident volume increase
- Partner with engineering leaders and platform teams to improve monitoring, alerting, debugging, SLOs, SLAs, and incident response systems at company scale
- Identify emerging reliability risks introduced by AI-enabled development workflows and design scalable systems, processes, and guardrails to mitigate them
- Provide technical leadership and mentorship to engineers across teams, raising engineering quality, reliability judgment, and operational excellence
- Drive clear communication and alignment with senior stakeholders on reliability priorities, tradeoffs, risks, and execution progress

Skills
- BS degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent technical experience
- 12+ years of experience in software engineering, site reliability engineering, infrastructure engineering, or related technical roles
- Proven ability to define and deliver multi-year, multi-team reliability, infrastructure, or platform strategies with measurable business and customer impact
- Deep experience with distributed systems, production operations, observability, incident response, SLOs/SLAs, debugging, and reliability risk management
- Demonstrated ability to diagnose complex technical problems, debug production systems, automate operational workflows, and design resilient software components
- Experience influencing engineering roadmaps across multiple teams and making technical decisions that optimize for the broader engineering organization
- Strong communication and collaboration skills, with the ability to align cross-functional stakeholders through ambiguity and drive execution across teams
- Experience adapting reliability strategies, developer tooling, or operational processes for AI-assisted software development workflows
- Experience building or scaling observability, debugging, incident management, or developer productivity platforms for large engineering organizations
- Experience leading reliability improvements in environments with high deployment velocity, complex service dependencies, and large-scale production systems
- Track record of mentoring senior engineers, setting technical standards, and spreading reliability best practices through documentation, reviews, talks, or architecture guidance
- Familiarity with AI-enabled tooling, agentic development workflows, or operational risks introduced by rapid automation in the software development lifecycle

Company Overview
Dropbox is a smart workspace company that provides secure file sharing, collaboration, and storage solutions. It was founded in 2007 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.dropbox.com.

Company H1B Sponsorship
Dropbox has a track record of offering H1B sponsorships, with 13 in 2026, 121 in 2025, 105 in 2024, 103 in 2023, 166 in 2022, 197 in 2021, and 157 in 2020. Please note that this does not guarantee sponsorship for this specific role.