DevOps Engineer - Kubernetes (K8s) Expert

Remote, Full-time
DevOps Engineer – AI Infrastructure & Scaling Specialist

We are looking for an experienced DevOps engineer who specializes in optimizing, maintaining, and scaling infrastructure for AI-powered web applications. Our CI/CD pipelines and Kubernetes clusters are already built and running; we need someone who can take ownership of keeping them healthy, improving performance, and scaling them efficiently as our AI products and user base grow.

About the Role

You will take over the day-to-day management and optimization of our existing infrastructure for AI products. This includes fine-tuning our CI/CD workflows, maintaining and scaling our Kubernetes clusters, and ensuring our systems remain performant and cost-efficient under increasing load. The ideal candidate has hands-on experience optimizing infrastructure that supports large language models and high-throughput inference APIs.

Key Responsibilities

- Maintain, monitor, and optimize existing Kubernetes clusters (EKS/GKE/AKS) running AI workloads, ensuring high availability and efficient resource utilization

- Optimize and improve existing CI/CD pipelines for faster, more reliable deployments of AI services and model updates

- Implement and refine auto-scaling policies to handle unpredictable traffic spikes typical of AI-powered products

- Optimize compute utilization by right-sizing instances, leveraging spot/preemptible capacity, and managing resource scheduling to reduce costs

- Maintain and improve monitoring, alerting, and logging (Prometheus) with a focus on latency, throughput, and model performance metrics

- Implement and optimize caching layers (Redis, CDN) and load balancing strategies to reduce inference latency

- Manage and evolve infrastructure as code to keep configurations clean, versioned, and reproducible

- Troubleshoot production incidents, perform root cause analysis, and implement preventive measures

- Ensure ongoing security best practices including secrets management, network policies, IAM roles, and compliance standards

Required Skills & Experience

- 8+ years of DevOps / SRE / Infrastructure engineering experience

- Strong experience with AWS

- Proven track record of optimizing and scaling AI applications or high-traffic, compute-intensive services

- Deep knowledge of Kubernetes operations, troubleshooting, and performance tuning at scale

- Experience optimizing CI/CD pipelines

- Solid understanding of networking, load balancing, DNS, and CDN configuration

- Strong scripting skills

Nice to Have

- Experience with serverless AI inference (e.g., AWS Lambda, Modal, Replicate, or RunPod)

- Knowledge of vector databases and RAG pipeline infrastructure (Pinecone)

- Familiarity with LLM-specific infrastructure (token-based rate limiting, streaming responses, prompt caching)

- Experience with multi-region or edge deployments for low-latency AI serving

- Cost optimization experience managing $10K+/month cloud budgets

- SOC 2 or HIPAA compliance experience

What We Offer

- Flexible, fully remote work arrangement

- Opportunity to work on cutting-edge AI products at scale

- Long-term engagement with potential for ongoing collaboration

- Competitive hourly rate commensurate with experience

To Apply

Please include in your proposal:

1. A brief overview of your DevOps experience

2. Specific examples of optimizations you've implemented (performance gains, cost savings, uptime improvements)

3. Your preferred cloud platform and tooling stack

4. Your availability and hourly rate

