[Remote] Site Reliability Engineer (SRE)
Note: This is a remote position open to candidates in the USA. EPAM Systems is seeking an experienced Site Reliability Engineer to support feature development on the newly built Trading Platform of a large Wealth Management firm. The role involves implementing DevOps and SRE best practices, managing monitoring solutions, and collaborating with application teams to ensure performance and availability.

Responsibilities
- Implement and champion DevOps and SRE best practices across the organization
- Drive technology roadmap discussions for the SRE team
- Define, craft, and maintain SLIs and SLOs, along with key metrics including MTTR, Lead Time for Change, Deployment Frequency, and Change Failure Rate
- Design, develop, and manage monitoring, alerting, and observability solutions using Dynatrace, Splunk, and Grafana
- Conduct performance assessments, identify bottlenecks, and recommend enhancements to improve system performance
- Partner with application teams to enforce performance and availability SLAs
- Collaborate with product owners to manage error budgets, prioritize toil backlogs, and validate against team, application, and incident metrics
- Participate in an on-call rotation to respond to production events and outages
- Continuously improve CI/CD pipelines and deployment processes
- Lead troubleshooting efforts, incident management, and root cause analysis
- Identify and build automated processes wherever possible
- Implement cybersecurity measures through ongoing vulnerability assessments and risk management
- Provide periodic progress reports to management and stakeholders
- Partner with application teams to support and ease their adoption of the platform
- Facilitate clear coordination and communication within the team and with customers
- Analyze existing systems and develop plans for enhancements and improvements

Skills
- Bachelor's degree in Computer Science or a related field, and/or equivalent work experience
- 5+ years of experience working within DevOps or SRE teams
- Proven experience supporting production infrastructure
- Strong knowledge of CI/CD principles and pipelines
- Solid understanding of observability concepts, including monitoring, logging, and tracing
- Hands-on experience with Dynatrace and Splunk
- Experience with at least one major cloud provider (AWS, Azure, or GCP)
- Demonstrated experience operating high-availability, fault-tolerant, scalable, and distributed systems in production

Company Overview
EPAM leverages its core engineering expertise as a leading global product development and digital platform engineering services company. It was founded in 1993 and is headquartered in Newtown, Pennsylvania, USA, with a workforce of 10,001+ employees. Its website is https://www.epam.com.

Company H-1B Sponsorship
EPAM Systems has a track record of offering H-1B sponsorships: 11 in 2026, 120 in 2025, 172 in 2024, 232 in 2023, 373 in 2022, 359 in 2021, and 502 in 2020. Please note that this does not guarantee sponsorship for this specific role.