[Remote] Site Reliability Engineer
Note: This is a remote position open to candidates in the USA.

Vynca is dedicated to transforming care for individuals with complex needs. The company is seeking a Site Reliability Engineer to build and operate the infrastructure for its healthcare technology platform, with a focus on the reliability, scalability, and security of its systems.

Responsibilities
- Design, provision, and manage AWS infrastructure using Terraform as the source of truth
- Operate, maintain, and scale production workloads running on Kubernetes
- Package, deploy, and manage applications using Helm and infrastructure automation tools
- Build, operate, and improve distributed and event-driven systems, including event sourcing, partitioning, event ordering, replay, and failure recovery mechanisms
- Define, monitor, and maintain Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to balance reliability and engineering velocity
- Develop automation for deployment, scaling, monitoring, incident response, and operational workflows to reduce manual effort and improve system resilience
- Own platform observability by implementing and maintaining metrics, logging, tracing, monitoring, and alerting solutions
- Lead incident response efforts, facilitate blameless postmortems, and drive long-term corrective actions that improve system reliability
- Partner with Product and Engineering teams on capacity planning, performance optimization, and resilient system design
- Implement and maintain security best practices to support HIPAA, SOC 2, and other compliance requirements
- Participate in an on-call rotation and provide operational support for production systems

Skills
- Three to five (3–5) years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, Cloud Infrastructure Engineering, or similar infrastructure-focused roles, preferably within healthcare, SaaS, or high-growth technology environments
- Bachelor's degree in Computer Science, Information Systems, Software Engineering, or a related technical field; equivalent professional experience will also be considered
- Strong hands-on experience operating production workloads within AWS environments
- Proven experience managing infrastructure as code using Terraform, including module development, state management, and deployment automation
- Experience operating and supporting production Kubernetes environments
- Hands-on experience deploying and managing applications using Helm
- Experience working with distributed systems, event-driven architectures, or event-sourcing platforms, including concepts such as partitioning, event ordering, replay, and fault tolerance
- Experience establishing and managing observability practices, including monitoring, logging, tracing, alerting, and incident response
- Strong understanding of Linux systems administration, networking, cloud architecture, and distributed systems fundamentals
- Experience designing, implementing, and maintaining CI/CD pipelines and deployment automation
- Strong problem-solving skills, with the ability to troubleshoot complex infrastructure and application issues
- Excellent written and verbal communication skills, with the ability to collaborate effectively across technical and non-technical teams
- High level of ownership, accountability, and initiative, with a proactive approach to reliability and operational excellence
- Ability and willingness to participate in an on-call rotation supporting production systems
- Strong programming or scripting experience with Python, Go, or similar languages
- Experience with observability platforms such as Prometheus, Grafana, Datadog, CloudWatch, SigNoz, or OpenTelemetry
- Experience with GitOps tools such as ArgoCD or Flux
- Experience managing databases such as PostgreSQL, MySQL, Redshift, or ClickHouse
- Experience implementing secrets management solutions such as AWS Secrets Manager or HashiCorp Vault
- Experience supporting healthcare technology platforms or other highly regulated environments
- Familiarity with data infrastructure technologies, including Snowflake, Redshift, and ETL/ELT pipelines
- Experience with database performance tuning and optimization

Company Overview
Vynca is a health technology and services company transforming care for people living with serious illness and complex needs. It was founded in 2013 and is headquartered in Palo Alto, California, USA, with a workforce of 51–200 employees. Its website is https://www.vyncahealth.com.