Site Reliability Engineer
Thales is a leader in digital security, providing identity management and data protection solutions. They are seeking a Site Reliability Engineer to ensure high service levels for their Telecommunication solution deployed in the public cloud, focusing on automation, reliability engineering, and incident management.

Responsibilities
Design, build, and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes
Develop automated CI/CD pipelines with GitLab to reduce manual toil
Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Manage 'Error Budgets' to balance the velocity of new features with the stability of the platform
Participate in 24/7 on-call rotations to provide emergency response and perform deep-dive troubleshooting of production issues
Conduct system performance analysis, identify bottlenecks, and perform capacity planning to ensure the infrastructure can handle growth and peak loads
Implement and refine symptom-based alerting and comprehensive monitoring strategies using platforms like Datadog to ensure high visibility into system health
Lead blameless postmortems after incidents to identify root causes and implement long-term technical fixes that prevent recurrence
Partner with Cloud Security teams to implement security best practices, manage access controls, and respond to security breaches or vulnerabilities
Work with other stakeholders to define the solution improvement plan
You will own the availability of the solution service

Skills
Engineering degree or equivalent
At least 1 year of experience
Java development skills are required
You are familiar with public cloud (GCP, AWS), containers and microservices (Docker, Kubernetes, Java), CI/CD and automation (Jenkins, GitLab, Helm), and NoSQL databases
Must have U.S. or dual citizenship and be able to obtain post-hire clearance from the Committee on Foreign Investment in the United States (CFIUS) and the Department of the Treasury
GCP cloud architect certification is a plus
You have already set up product monitoring and the underlying infrastructure
You have development experience in a distributed-systems and/or high-availability context
You are familiar with microservices development
You have participated in defining architectures, data structures, and algorithms under performance, security, and reliability constraints
Public cloud architect certification
You are interested in aspects of Site Reliability Engineering: CI/CD, automation, monitoring and observability, and continuous improvement
You are an accomplished, versatile, multi-tasking development engineer

Benefits
Elective Health, Dental, Vision, FSA/HSA, Voluntary Life and AD&D, Whole Group Life w/LTC, Critical Illness, Hospital Indemnity, Accident Insurance, Legal Plan, Identity Theft, and Pet Insurance
Retirement Savings Plan, available after 30 days of employment, with a company contribution and match and no vesting period
Company-paid holidays and Paid Time Off
Company-provided Life Insurance, AD&D, Disability, Employee Assistance Plan, and Well-being Program

Company Overview
Thales (Euronext Paris: HO) is a global leader in advanced technologies for the Defence, Aerospace, and Cyber & Digital sectors. It was founded in 1893 and is headquartered in Paris, Ile-de-France, FRA, with a workforce of 10,001+ employees. Its website is http://www.thalesgroup.com.