[Remote] Site Reliability Engineer (SRE) - Azure | DevSecOps | IaC | Governance | Observability | Avaya
Note: This is a remote position open to candidates in the USA.

Avaya is an enterprise software leader that helps the world’s largest organizations and government agencies forge unbreakable connections. They are seeking a Site Reliability Engineer (SRE) to drive stability, reliability, and performance across their Azure and GCP-based platforms, focusing on operational excellence and proactive incident management.

Responsibilities
- Serve as a key member of the 24×7 on-call rotation, responding to and managing incidents across production and pre-production environments
- Lead incident bridges, coordinate root cause analysis (RCA), and ensure post-incident reviews drive systemic improvements
- Maintain clear communication with cross-functional teams and leadership during major incidents
- Build, tune, and maintain observability dashboards (Azure Monitor, GCP Operations Suite, Prometheus, Grafana, Datadog, Log Analytics)
- Perform deep-dive troubleshooting of application and service-level issues using distributed tracing and log analysis (Grafana, Datadog) to pinpoint root causes beyond infrastructure
- Define SLOs, SLIs, and error budgets to proactively identify and mitigate reliability risks before customer impact
- Integrate AI-Ops tools for anomaly detection, predictive alerting, and automated incident correlation
- Continuously enhance alert quality, reduce false positives, and automate runbooks for faster recovery
- Analyze trends to prevent recurring issues and support teams in resilience engineering

Skills
- 5+ years in Site Reliability, DevOps, Cloud Operations, or customer support roles
- Demonstrated experience in application-level troubleshooting by analyzing logs and traces to identify bugs, performance bottlenecks, and error conditions
- Expertise in Azure and GCP cloud operations and distributed system reliability
- Understanding of Terraform, Ansible, and CI/CD pipelines (Jenkins, GitHub Actions)
- Experience with observability and AI-Ops tools (Azure Monitor, GCP Operations Suite, Grafana, Prometheus, Datadog, etc.)
- Solid grasp of incident management frameworks (P1–P3 handling, RCA, PIRs, on-call rotations)
- Excellent analytical, troubleshooting, and communication skills

Benefits
- Performance-related bonus
- Annual bonus that aligns with individual and company performance

Company Overview
Avaya is a global leader in enterprise communications, hybrid cloud CCaaS and UC solutions for mission-critical, AI-agnostic workflows. It was founded in 2000 and is headquartered in Morristown, New Jersey, USA, with a workforce of 5001-10000 employees. Its website is http://www.avaya.com.