[Remote] Site Reliability Engineer II, tvScientific
Note: This is a remote job open to candidates in the USA.

Pinterest is a platform that inspires creativity and helps users plan for memorable experiences. They are seeking a Site Reliability Engineer to operate, scale, and enhance a cloud-native platform on AWS, focusing on improving infrastructure reliability and operational maturity.

Responsibilities
- Ensuring the reliability, availability, and performance of production infrastructure and platform services
- Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads
- Managing GitOps-based deployment workflows using ArgoCD and Helm
- Supporting infrastructure provisioning and change management through Terraform/Terragrunt
- Building and supporting CI/CD automation and deployment workflows using GitHub Actions
- Participating in incident response, root cause analysis, and post-incident improvement initiatives
- Reducing operational toil through scripting, tooling, and process automation
- Advancing observability practices across logs, metrics, traces, dashboards, and alerting
- Supporting secure secrets integration, IAM-aware operations, and platform guardrails
- Partnering closely with application, security, and platform teams to improve reliability and delivery outcomes

Skills
- 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure
- Strong hands-on experience operating AWS in production environments
- Good expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform administration
- Experience with Kubernetes multi-tenancy, including namespaces, RBAC, quotas, policies, and tenant isolation patterns
- Experience implementing and operating ArgoCD within a GitOps delivery model
- Strong hands-on experience with Helm
- Experience with Terraform/Terragrunt for infrastructure provisioning and environment management
- Solid scripting and automation skills using Bash and/or Python
- Experience building, maintaining, or supporting CI/CD pipelines, ideally using GitHub Actions
- Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems
- Experience with monitoring, alerting, and observability in production environments
- Demonstrated ownership mindset with experience handling incidents and resolving production issues
- Strong collaboration and communication skills, with the ability to work effectively across engineering, security, and platform teams
- Bachelor's degree in computer science, engineering, or a related field, or equivalent experience
- Demonstrated ability to use AI to improve speed and quality in your day-to-day workflow for relevant outputs
- Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
- High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables

Benefits
- The position is also eligible for equity.
- Information regarding the culture at Pinterest and benefits available for this position can be found here.

Company Overview
Pinterest is a visual bookmarking tool for saving and discovering creative ideas. It was founded in 2010 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.pinterest.com/.