Site Reliability/Platform Engineer (Linux / Kubernetes / Python) - $180-190K

Remote Full-time
Location: Reston, VA (hybrid; onsite up to 3 days a week, no more than 9 days a month)
Salary: $180-190K + 10% bonus

Must have: Kubernetes, Red Hat Linux, Python and Bash scripting, platform engineering, on-prem infrastructure (bare-metal or VM experience is fine), observability, and incident response.

OpenShift experience is a significant advantage.

Responsibilities:
• Own the stability, performance, and reliability of core platform infrastructure across on-premises environments and a future Azure cloud environment.
• Manage and optimize Kubernetes/OpenShift clusters, ensuring high availability and scalability.
• Lead incident response, root-cause analysis, and long-term remediation efforts in a high-stakes production environment.
• Drive continuous improvement of platform performance, reliability, and automation.
• Build and enhance observability frameworks using tools like Prometheus, Grafana, and Datadog.
• Develop and maintain automation scripts and tooling using Python and Bash to reduce manual intervention.
• Partner with engineering and development teams to troubleshoot deployment, configuration, and infrastructure issues.
• Support and enhance CI/CD pipelines and platform delivery processes.
• Administer and optimize Linux-based systems, primarily within Red Hat environments.
• Maintain documentation, runbooks, and operational procedures.
• Participate in on-call rotation supporting critical systems.

Requirements:
• Bachelor's degree in computer science or related field, or equivalent experience.
• 5+ years of experience in Site Reliability Engineering or Platform Engineering roles.
• Strong hands-on experience with Kubernetes in production environments (OpenShift preferred but not required).
• Solid experience with Red Hat Enterprise Linux system administration.
• Strong scripting experience with Python and Bash.
• Experience managing on-premises infrastructure environments.
• Experience with observability tools such as Prometheus, Grafana, or Datadog.
• Strong troubleshooting experience across distributed systems, logs, metrics, and traces.
• Experience working in high-performance, high-availability environments.
• Exposure to Azure cloud services is a plus.
• Strong communication and documentation skills.

Site Reliability Engineer, SRE, OpenShift engineer, Kubernetes engineer, Azure cloud engineer, platform engineer, DevOps engineer, observability, Grafana, Prometheus, Datadog, HashiCorp Vault, Kafka, AMQ, Redis, CI/CD, automation, Bash scripting, Python scripting, cloud infrastructure, hybrid cloud, data center, reliability engineering, incident response, root cause analysis, container platform, cluster management, Azure infrastructure, production support, platform reliability, DevOps, monitoring tools, automation engineer, enterprise infrastructure, platform services, site reliability, cloud platform, OpenShift administrator, Kubernetes troubleshooting

