Senior Site Reliability Engineer, Core Cloud Engineering

Remote, Full-time
Join Vultr

Our Engineering team at Vultr is seeking a Senior Site Reliability Engineer, Core Cloud Engineering to report to the Director of Core Cloud Engineering. This role demands deep expertise in large-scale distributed systems, infrastructure automation, and production operations of hypervisor platforms and the control plane. The ideal candidate will combine hands-on systems engineering with a focus on reliability, scalability, and observability, ensuring Vultr's cloud services remain performant and resilient for our 1.5 million users.

Key Responsibilities
• Production Control Plane Operations: Operate and scale Vultr's control plane, ensuring availability, correctness, and performance across global datacenters.
• Hypervisor & Infrastructure Reliability: Design, implement, and maintain automation to manage hypervisor fleets (KVM, QEMU, libvirt) and supporting infrastructure at scale.
• Networking & Systems Automation: Develop tooling and automation for Open vSwitch (OVS), BGP routing, and other networking components to ensure resilient and self-healing network operations.
• Performance & Reliability Tuning: Continuously analyze and improve system performance across compute, storage, and network layers, with an emphasis on reducing toil and eliminating single points of failure.
• Observability & Incident Response: Implement advanced monitoring, logging, and tracing solutions (Grafana, Sentry, SumoLogic) while leading incident response to minimize impact and drive postmortem culture.
• CI/CD & Configuration Management: Maintain and evolve infrastructure pipelines (GitLab CI/CD, Puppet) to enable safe, fast, and reliable changes to both control plane and hypervisor infrastructure.
• Collaboration: Work closely with Software Engineers, Network Engineers, and Product teams to align platform reliability with business and user needs.
• Documentation & Standards: Produce clear technical documentation for runbooks, operational procedures, and automation frameworks to improve team efficiency and reliability standards.
• Mentorship & Leadership: Coach and mentor team members in best practices for site reliability, incident handling, automation, and low-level Linux systems debugging.

Qualifications
• Proficiency in PHP with strong scripting and automation skills.
• Experience running large-scale distributed systems and control plane infrastructure in production.
• Strong background in hypervisor technologies (libvirt, QEMU, KVM) and Linux systems administration.
• Expertise in networking protocols and tools, particularly BGP and Open vSwitch (OVS), with automation experience.
• Deep knowledge of observability and monitoring frameworks (Grafana, Sentry, SumoLogic) and incident management.
• Advanced troubleshooting skills across compute, networking, and storage subsystems.
• Experience building and maintaining CI/CD pipelines (GitLab) and configuration management (Puppet).
• Familiarity with MySQL or similar databases, with an understanding of operational considerations for reliability and scale.
• Strong problem-solving abilities and the drive to tackle complex, low-level reliability challenges.
• Effective cross-team communication and collaboration skills.
• A commitment to continuous improvement and fostering a culture of operational excellence.

Compensation

$120,000 - $130,000

Final compensation will vary depending on years of experience, background/skill set, location, and applicable laws.

