Site reliability engineer
About this role

We are looking for a foundational member of the Cloud infrastructure team at Writer. This role will involve contributing to the development and implementation of our Site Reliability Engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of Writer’s critical systems, taking a proactive approach to guarantee that our high-ROI products reach our customers seamlessly.

Your responsibilities:

- Lead the design, implementation, and maintenance of Writer, Inc.’s cloud infrastructure to ensure high availability and performance
- Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers
- Automate infrastructure provisioning and management using Terraform & Python
- Collaborate with development teams to optimize cloud resources and enhance system reliability
- Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions
- Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures
- Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency
- Ensure the security and compliance of our systems, adhering to industry standards and regulations
- Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement
- Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

⭐ Is this you?

- Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience
- Deep understanding of system architecture and infrastructure design to ensure high availability and performance
- Bachelor’s degree in Computer Science, Engineering, or a related technical field
- Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring
- Experience with cloud platforms like AWS, Azure, or GCP, and their respective services for scalable and resilient systems
- Expertise in containerization technologies (e.g., Docker, Kubernetes) and orchestration tools
- Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance
- Ability to lead and mentor junior engineers in best practices for reliability and system optimization
- Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders
- Proactive approach to identifying and mitigating potential system failures and performance bottlenecks

Preferred skills & experience:

- Software engineering expertise
- Terraform
- Python
- Kubernetes
- Scala
- AWS/GCP

Benefits & perks (UK full-time employees):

- Generous PTO, plus company holidays
- Comprehensive medical and dental insurance
- Paid parental leave for all parents (12 weeks)
- Fertility and family planning support
- Early-detection cancer testing through Galleri
- Competitive pension scheme and company contribution
- Annual work-life stipends for:
  - Home office setup, cell phone, internet
  - Wellness stipend for gym, massage/chiropractor, personal training, etc.
  - Learning and development stipend
- Company-wide off-sites and team off-sites
- Competitive compensation and company stock options

Apply Now