SRE Lead Platform Engineer Dynatrace & Azure - Fully remote

Remote Full-time
Job Title: SRE Lead Platform Engineer - Remote
Duration: 6 Months to Hire
Location: Fully remote, EST

The key skills for this Lead SRE Platform Engineer role are observability and monitoring (MELT data) using tools like Dynatrace, Datadog, and SCOM, strong Azure cloud and hybrid infrastructure knowledge, and DevOps automation with CI/CD, GitHub, and Terraform. The role also requires programming for automation (Python, C#, SQL) and strong experience with incident management, root cause analysis, and reliability engineering practices. At a lead level, the focus is on defining monitoring standards, improving system reliability, and guiding cross-team efforts to reduce outages and improve platform performance.

A typical day for this engineer would be a mix of monitoring system health, investigating reliability issues, improving observability, and leading automation and infrastructure improvements.

Role Summary
As a Lead SRE Platform Engineer, you will drive reliability engineering strategy and execution across critical IT Business Solutions platforms at Wegmans. This role focuses on improving uptime, performance, and operational efficiency through software enhancements, observability, automation, and data-driven root cause analysis (RCA).
You will serve as the technical lead for SRE practices: establishing monitoring standards, improving MELT (Metrics, Events, Logs, Traces) strategy, influencing tooling decisions, and partnering across infrastructure, development, operations, and vendor teams. This is a high-impact opportunity to build and mature reliability engineering capabilities from the ground up.

What You'll Do
Reliability & Observability Leadership
Define and mature SRE best practices across cloud and on-prem environments.
Design and implement comprehensive monitoring strategies using tools such as:
o Dynatrace
o Datadog
o Microsoft SCOM
Develop dashboards, alerts, synthetic testing, and proactive monitoring capabilities.
Establish and evolve a MELT data strategy to improve service reliability.
Provide data-driven RCA investigations and implement preventative solutions.

Platform & Application Reliability
Support and enhance reliability across:
Cloud & Infrastructure
o Microsoft Azure (software, storage, Azure Local)
o Hyper-V and legacy VMware environments
o NetApp and Pure storage platforms
o Azure Log Analytics
o Infrastructure as Code using Terraform
o Migration from Azure DevOps to GitHub (strong GitHub experience required)
Order Management Systems
o Azure-based, internally developed .NET/C# applications
o Internal message queuing systems
o Logging, analytics, and synthetic testing post-patching
o API-based integrations
Workforce & Payroll Platforms
o Workday (Payroll)
o ADP Vantage (Timekeeping)
Warehouse & Distribution Systems
o Blue Yonder Warehouse Management System (WMS)
o Vocollect handheld voice picking devices
o Network analytics for identifying dead zones and connectivity issues
o Barcode scanners and device connectivity troubleshooting

DevSecOps & Automation
Lead CI/CD reliability improvements (the Azure DevOps to GitHub transition is critical).
Enhance pipeline automation with embedded security controls.
Advance Infrastructure-as-Code standards (Terraform).
Improve configuration management and change governance.
Drive automation to reduce manual intervention and operational risk.

ITSM & Incident Management
Work within BMC ecosystem including:
o BMC Helix
o BMC Remedy
o BMC Server Automation
Optimize automated incident generation (SCOM-to-BMC workflows).
Improve triage, escalation, and impact modeling across services.
Monitor vendor performance and escalate appropriately.
Participate in off-hour escalation support when required.

Strategic Impact
Develop predictive reliability models using statistical techniques.
Identify systemic risk across production systems.
Guide tooling decisions (e.g., Dynatrace vs. Datadog or other observability platforms).
Ensure regulatory and operational compliance standards are met.
Facilitate cross-functional collaboration and document SRE procedures and planning artifacts.

Required Qualifications
5-7+ years of Software Engineering and Infrastructure/Database Engineering experience.
Deep expertise in:
o DevSecOps practices
o Observability platforms
o API integrations
o Performance management tools
o ITIL principles
o ITSM data analytics
o MELT data collection and analysis
Experience in Azure cloud environments.
Strong analytical and problem-solving skills.
Demonstrated ability to influence technical direction.
Excellent communication and cross-team collaboration skills.
Continuous improvement mindset focused on reliability engineering.

Preferred Qualifications
Strong programming experience in:
o .NET / C#
o Python
o SQL
Experience with MSSQL (primary) and Oracle (limited).
Experience with GitHub (critical for upcoming transition).
Agile/Scrum experience.
Knowledge of Reliability-Centered Engineering and maintenance strategies.
Experience with synthetic testing and proactive validation post-deployment.
Bachelor's degree in a related technical field.

Thank you,
Shiva Mittal

