[Remote] Cloud Operations Engineer
Note: This is a remote position open to candidates in the USA.

O'Reilly Media is dedicated to sharing the knowledge of innovators and helping professionals develop expertise. As a Cloud Operations Engineer, you will work on the systems and tooling that power the learning platform, focusing on infrastructure-as-code and Kubernetes maintenance while collaborating with product engineering teams.

Responsibilities
- Maintain and update our Kubernetes cluster to ensure steady-state operations
- Write or extend Terraform modules to provision and manage cloud infrastructure
- Contribute features to the Python CLI tooling we use to manage infrastructure workflows
- Design, build, and maintain cloud infrastructure using infrastructure-as-code (Terraform) on GCP
- Manage and evolve our Kubernetes platform, including cluster operations, workload configuration, and service mesh (Istio)
- Develop and improve internal tooling that abstracts cloud complexity and improves the developer experience
- Collaborate with product engineering teams to understand service deployment needs and deliver infrastructure solutions
- Monitor platform health using Datadog; proactively identify and resolve performance, availability, and security issues
- Participate in the on-call rotation and incident response; drive blameless post-mortems and eliminate recurring issues at their root cause
- Define and track service-level indicators and objectives (SLIs/SLOs) for critical platform components
- Implement and refine alerting, dashboards, and runbooks that reduce mean time to resolution
- Embed security best practices into infrastructure workflows (DevSecOps), not as an afterthought but as a design principle
- Help maintain cloud security posture, IAM hygiene, and policy guardrails across our cloud environment
- Stay current with cloud security developments and proactively surface risks to the team
- Execute and maintain our automated disaster recovery processes
- Work closely with product engineering teams to understand their needs and remove infrastructure friction
- Document systems, processes, and architectural decisions clearly so knowledge is shared, not siloed
- Recommend improvements to tooling, architecture, and processes, and help drive them to completion
- Keep current with the evolving cloud-native ecosystem and bring relevant knowledge back to the team

Skills
- Bachelor's degree in Computer Science or a related field
- 5+ years of experience working in cloud infrastructure, platform engineering, or a related discipline
- In lieu of a degree, equivalent education and/or experience may be considered
- Hands-on experience with Kubernetes in production environments (cluster management, workloads, networking)
- Proficiency with infrastructure-as-code tools, particularly Terraform
- Experience with at least one major cloud provider (GCP, AWS, or Azure)
- Solid scripting and automation skills in Python, Bash, or a comparable language
- Experience with modern observability platforms (Datadog, Grafana, or similar)
- Strong understanding of Linux systems administration
- Working knowledge of CI/CD concepts and tools (GitHub Actions, ArgoCD, Jenkins, or similar)
- Excellent communication skills: you write clearly, ask good questions, and explain complex systems accessibly
- AI-augmented development: demonstrated ability to use AI-enabled development tools (e.g., Claude Code, Cursor) to streamline coding, debugging, and infrastructure-as-code authoring
- Experience with service mesh technologies such as Istio or Linkerd
- Familiarity with GitOps workflows and tools (ArgoCD, Flux)
- Experience with DevSecOps practices and tooling (Snyk, Trivy, OPA, or similar)
- Working knowledge of SQL databases (PostgreSQL or MySQL)
- Familiarity with FinOps practices and cloud cost optimization
- Experience building or consuming internal developer platforms (IDPs)
- Configuration management experience (Ansible, Chef, or similar)
- Relevant certifications (CKA, CKAD, AWS/GCP Professional, or similar)

Company Overview
Inspiring the future for more than 45 years. We share the knowledge and teach the skills people need to change their world. O'Reilly Media was founded in 1978 and is headquartered in Seattle, Washington, USA, with a workforce of 201-500 employees. Its website is http://dankaminsky.com.