SRE (Kubernetes)

Remote Full-time
The Role

We are seeking a high-caliber Site Reliability Engineer (SRE) with a focus on Application and Platform stability. In this role, you will be the guardian of our global application ecosystem, ensuring 24x7 reliability and peak performance. You will bridge the gap between software engineering and systems operations, specifically within heavy Big Data and Streaming environments. Whether you prefer the collaboration of our Bloomfield office, the comfort of your home office, or a mix of both, we offer total flexibility to fit your lifestyle.
Key Responsibilities

Operational Excellence: Maintain 24x7 system reliability, incident response, and operational readiness for mission-critical global applications.

Incident Leadership: Lead troubleshooting efforts during high-pressure outages; perform deep-dive Root Cause Analysis (RCA) and automate preventive measures.

Reliability Engineering: Define and monitor SLIs/SLOs/SLAs (availability, latency, throughput, and resource utilization).

Big Data & Streaming Support: Manage and optimize distributed data frameworks, ensuring the health of Spark, Flink, and Kafka pipelines.

Infrastructure as Code: Support deployments across AWS Cloud and Kubernetes (EKS) environments.

Cluster Governance: Implement Kubernetes resource quotas, access controls (RBAC), and namespace management to ensure multi-tenant stability.
Technical Qualifications

Core SRE Skills: Proven expertise in monitoring, performance tuning, and capacity planning.

Distributed Systems: Strong hands-on experience with Spark, Flink, and Kafka.

Hadoop Ecosystem: Proficiency in Hadoop Cluster Administration and Operations.

Cloud & Containers: Deep understanding of AWS and Kubernetes (K8s) orchestration.

Automation Mindset: Experience replacing manual "toil" with automated scripts and tools.
Why Join Us?

True Flexibility: We trust our engineers. Choose the work mode that makes you most productive.

Scale: Work on massive distributed systems and global-scale data processing.

Culture: A collaborative environment where Root Cause Analysis is blameless and innovation is encouraged.

Note: We are currently only accepting applications from U.S. Citizens (USC) or Green Card (GC) holders.

For applications and inquiries, contact: [email protected]

