[Remote] Order Management System (OMS) Staff Engineer
Note: This is a remote position open to candidates in the USA. Levi Strauss & Co. is a company that values individuality and making a positive impact. They are seeking a Staff Engineer for their Order Management System (OMS) team, responsible for leading the design and architecture of complex systems, ensuring engineering excellence, and promoting operational reliability.

Responsibilities
Lead the design and domain modeling of complex distributed systems within the OMS ecosystem, producing clear, well-reasoned service boundaries, data contracts, and event-driven interaction patterns that stand up to scrutiny and scale.
Champion domain-driven design (DDD) principles, working with product and engineering peers to identify bounded contexts, eliminate implicit coupling, and surface a shared language across teams.
Guide the decomposition of monolithic or tightly coupled components into well-defined, independently deployable services, reducing blast radius, improving team autonomy, and enabling faster iteration.
Author architecture decision records (ADRs) and technical design documents that communicate the "why" alongside the "what," helping teams make informed decisions over time.
Write, review, and guide production-quality code with an emphasis on clarity, testability, and long-term maintainability, setting the bar for engineering craft on the team.
Apply modern software engineering practices: CI/CD pipelines, automated testing strategies, feature flagging, progressive delivery, and trunk-based development.
Identify and eliminate technical debt systematically, balancing short-term velocity with long-term system health through well-argued, incremental improvement plans.
Establish and promote coding standards, patterns, and best practices across the OMS team that are practical, enforceable, and grounded in production experience.
Operate with full production ownership: design with failure in mind, participate in on-call rotations, and take accountability for the health and reliability of the systems you ship.
Embed reliability engineering into the development lifecycle, defining SLOs, error budgets, and reliability targets up front rather than as an afterthought.
Treat runbooks, strategies, and operational documentation as first-class engineering artifacts, keeping them accurate, applicable, and tightly coupled to the systems they describe.
Design and implement comprehensive observability strategies (structured logging, distributed tracing, and metrics) so that any failure mode in production can be localized.
Develop dashboards that give engineers, on-call responders, and partners genuine operational insight into system health: not just uptime pings, but meaningful golden signals and business-relevant goals.
Define and tune alerting strategies that are signal-rich and noise-poor, ensuring on-call engineers are woken for relevant events, not for symptoms of unrelated upstream noise.
Champion observability as a design constraint, ensuring that new services are instrumented and that telemetry quality is part of every code review and launch checklist.
Design systems that can sustain peak commercial volumes (seasonal traffic spikes, flash sales, and global expansion) without degraded experience or unplanned downtime.
Apply scalability patterns such as asynchronous messaging, event sourcing, CQRS, caching strategies, database sharding, and graceful degradation, selecting the right tool for each problem.
Conduct and lead capacity planning exercises, load testing, and performance profiling, translating production data into informed infrastructure and architectural decisions.
Be the senior technical resource during complex production incidents: methodically narrowing hypotheses, leading war rooms, and restoring service while preserving forensic evidence for root cause analysis.
Facilitate blameless post-incident reviews (PIRs) that produce durable improvements: not just immediate fixes, but systemic changes that reduce the likelihood or impact of recurrence.
Develop institutional troubleshooting knowledge: document failure modes, known issues, and diagnostic techniques so the entire team grows more capable with each incident.
Partner with product managers, architects, and other engineers to translate requirements into clear, achievable technical roadmaps, bridging the gap between strategy and implementation.
Mentor and level up mid-level engineers through hands-on code review, design feedback, pairing sessions, and direct coaching, building engineering depth across the OMS team.
Stay current with industry trends in distributed systems, event-driven architecture, and operational tooling, bringing informed perspectives on when to adopt new approaches versus doubling down on established patterns.

Skills
10+ years of experience in software engineering with a focus on backend systems, distributed architectures, and platform/product engineering at scale.
Deep, practical experience designing and modeling complex distributed systems; you articulate trade-offs and make well-reasoned architectural choices under constraints.
Experience operating in a "you build it, you run it" engineering culture. You've been on call for systems you've built, responded to incidents, and used that experience to make better engineering decisions.
Build for scale and run at scale: you've handled high-throughput, high-availability systems and have the scars and lessons to show for it.
Expert-level understanding of observability: you can instrument a system from scratch, build meaningful dashboards, tune alerting, and use telemetry data as a primary tool for engineering decisions.
Troubleshoot with a systematic, data-driven approach to diagnosing production issues; you stay calm and lead others when systems are on fire.
Demonstrated experience decoupling tightly coupled systems, whether migrating a monolith, extracting a shared service, or replacing implicit temporal dependencies with well-defined async contracts.
Experience with event-driven architecture, domain-driven design, and modern API design patterns; you know where these patterns add value and where they add unnecessary complexity.
Mastery of CI/CD, automated testing, and DevOps practices; you view them as engineering fundamentals, not optional add-ons.
You can translate technical complexity for non-technical partners and write clearly for engineering audiences; your design docs, ADRs, incident reports, and code reviews all reflect your thinking.
Experience working with geographically distributed teams and navigating the complexities of collaboration across multiple time zones.
Experience with Order Management Systems (OMS), fulfillment pipelines, or commerce platforms is a meaningful plus; familiarity with the domain accelerates your impact but is not a prerequisite for the right engineer.

Benefits
Base pay
Incentive plans
401(k) matching
Paid leave
Health insurance
Product discounts

Company Overview
Levi Strauss & Co. is a brand-name apparel company that designs, markets, and sells jeans, casual and dress pants, jackets, skirts, and more. It was founded in 1853 and is headquartered in San Francisco, California, USA, with a workforce of 10001+ employees. Its website is http://levistrauss.com/.

Company H1B Sponsorship
Levi Strauss & Co. has a track record of offering H1B sponsorships, with 8 in 2026, 37 in 2025, 42 in 2024, 49 in 2023, 76 in 2022, 59 in 2021, and 39 in 2020. Please note that this does not guarantee sponsorship for this specific role.