[Remote] Senior Cloud Operations Engineer

Remote Full-time
Note: This is a remote position open to candidates in the USA. The Linux Foundation is a driving force in fostering open source collaboration and supporting communities across a range of projects, including PyTorch. It is seeking a Senior Cloud Operations Engineer to focus on infrastructure operations for the PyTorch project: automating processes, optimizing cloud-native tools, and ensuring a robust, scalable cloud environment.

Responsibilities

Manage multi-cloud environments, primarily focusing on AWS services (EKS, EC2, S3, IAM, ELB)
Contribute to architectural exercises with the open source community and technical leads to validate new cloud infrastructure
Implement and maintain infrastructure-as-code using Terraform via pytorch/ci-infra and pytorch/test-infra
Optimize cloud resource utilization and implement FinOps practices for cost management and reporting
Design, implement, and maintain CI/CD pipelines using GitHub Actions and ARC, including runner configurations and other elements of the CI ecosystem
Debug and triage issues in build and test pipelines, including unit tests
Develop monitoring and alerting solutions for CI/CD workflows and critical infrastructure
Manage and optimize Cloudflare CDN deployments for PyTorch assets (R2/S3)
Implement best practices for CDN and overall infrastructure security
Develop comprehensive monitoring and observability solutions using Datadog, AWS CloudWatch, and other telemetry data collection and processing tools
Review and recommend monitoring solutions as project and community needs evolve
Participate in on-call rotations supporting operations and incident response using incident.io
Establish and maintain escalation procedures and resolution processes
Participate in ci-infra and multi-cloud working groups and support architecture decisions
Collaborate with external contributors and promote DevOps best practices
Manage GitHub repositories, including user onboarding and access control
Attend and contribute to technical meetings, including Infrastructure, CI Workflow, and Technical Advisory Council sessions
Develop and maintain technical documentation for infrastructure and processes
Provide guidance on developer best practices and tooling
Create and update runbooks for common operational tasks and incident response

Skills

Ability to work with communities made up of industry specialists and to collaborate outside of the Linux Foundation
Bachelor's degree in Computer Science, Engineering, or a related field
7+ years of experience in cloud operations with significant AWS expertise
Strong knowledge of infrastructure-as-code principles and tools, particularly Terraform
Proficiency in scripting languages (Python, TypeScript, Bash) and containerization technologies (Docker, Kubernetes)
Experience with Cloudflare CDN management and optimization
Expertise in implementing and managing monitoring solutions, specifically Datadog and AWS CloudWatch
Familiarity with incident management tools and processes, particularly incident.io
Demonstrated experience in CI/CD pipeline design and implementation
Strong problem-solving skills and the ability to troubleshoot complex systems
Excellent communication skills and experience collaborating with open source communities
Experience with PyTorch or other open source communities
Multi-cloud expertise across AWS, GCP, and Azure
GitHub ARC experience
Knowledge of FinOps principles and cloud cost optimization strategies
Contributions to open source projects, especially in infrastructure management roles
Familiarity with the Linux Foundation or similar open source foundations
Experience mentoring other engineers and fostering a collaborative team environment

Benefits

The Linux Foundation maintains a predominantly remote workforce
Committed to hiring top-notch talent
Provides a flexible and supportive work culture
Collaboration is embedded in our DNA
Work closely together without being confined to a traditional office space

Company Overview

The Linux Foundation is the organization of choice for the world's top developers and companies to build ecosystems that accelerate open technology development and commercial adoption. It was founded in 2000 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is http://www.linuxfoundation.org.


Similar Jobs

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote

USPS Office Helper

Remote

VP, Compliance Officer

Remote

REMOTE Math/English Tutors

Remote

Transportation Representative, AV

Remote

**Experienced Customer Service Representative – Night Shift Work From Home Opportunity**

Remote

Urgently Hiring: Virtual Entry Level Sales Rep (Must Live in NY)

Remote

Experienced Remote Customer Service Representative – Delivering Magical Experiences for blithequark Enthusiasts from the Comfort of Your Own Home

Remote

Senior Manager, Risk Advisor, Technology and Data Risk Management

Remote

US Carrier Sales Account Executive

Remote

Job Title: Part-Time Remote Data Entry Specialist – Flexible Hours, Professional Development & Career Growth Opportunities

Remote

**Experienced Inbound Customer Service Representative – Healthcare Equipment and Services (Work from Home)**

Remote