[Remote] Lead Site Reliability Engineer

Remote Full-time
Note: This is a remote position open to candidates in the USA.

Gradle Technologies is an AI-native company focused on transforming software development through its Develocity platform. The company is seeking a Lead Site Reliability Engineer to define the SRE vision, set operational standards, and ensure reliability across production services while mentoring a growing team.

Responsibilities

- Operate and maintain all Develocity instances and supporting services in production
- Define and evolve SRE standards, practices, and operating models, including on-call, incident response, postmortems, and SLOs
- Participate in a follow-the-sun on-call rotation, acting as a technical escalation point for complex or high-severity incidents
- Lead incident response and blameless retrospectives, ensuring learnings result in measurable reliability improvements
- Set reliability priorities using risk, customer impact, business goals, SLOs, and error budgets
- Identify systemic reliability risks and continuously evolve Develocity's SaaS operations as the platform and customer base grow
- Lead and influence architectural and design reviews to ensure reliability, scalability, and operability
- Drive automation across deployment, upgrades, monitoring, self-healing, recovery, and operational workflows
- Build and maintain comprehensive observability for all managed services, including logging, metrics, tracing, and alerting
- Own disaster recovery, backups, and business continuity planning and execution
- Partner with engineering leadership to balance feature delivery with reliability and operational excellence
- Mentor and coach SREs, supporting technical growth and strong operational practices
- Help onboard new SREs and contribute to hiring by defining and assessing SRE excellence at Develocity
- Communicate clearly with customers during incidents and maintenance windows
- Optimize performance, resource utilization, and operational costs

Skills

- 7+ years in SRE, DevOps, or an equivalent role operating production services at scale
- Experience leading reliability initiatives across multiple teams or services
- Demonstrated ability to influence technical direction without direct authority
- Experience designing and operating systems with SLOs and error budgets, and exercising strong judgment in balancing reliability, velocity, and cost
- Strong Kubernetes experience in production environments
- Cloud infrastructure expertise, preferably AWS (EKS, RDS, S3, EC2)
- Proficiency with observability tools (Prometheus, Grafana) and Infrastructure as Code (Terraform)
- Track record of incident management and response in a 24/7 on-call environment
- Scripting proficiency (Python, Bash) for automation
- Strong written and verbal English communication skills
- Experience as a founding or early SRE establishing practices in a growing SaaS organization
- Familiarity with Develocity
- JVM language experience (Java, Kotlin)
- Experience with customer-facing and executive-level incident communications

Benefits

- A ground-floor role in a new SRE team - you'll shape how we do things, not inherit someone else's decisions.
- Real ownership of production systems used by engineers at companies you've heard of.
- Direct interaction with customers when things go wrong (and when they go right).
- A culture that values automation over heroics.
- In-person meetings, such as our annual company offsite and team meetings.
- Work from home in a remote-first environment.
- Competitive salaries and equity grants.

Company Overview

Gradle Technologies is the award-winning developer productivity company behind Gradle Build Tool (one of the most used build systems in the world) and Develocity®, the leading developer observability platform. It was founded in 2014 and is headquartered in San Francisco, California, USA, with a workforce of 51-200 employees. Its website is https://gradle.com/.

Company H1B Sponsorship

Gradle Technologies has a track record of offering H1B sponsorships, with 1 in 2025, 1 in 2024, and 2 in 2022. Please note that this does not guarantee sponsorship for this specific role.

