[Remote] Site Reliability Engineer | $70/hr Remote
Note: This is a remote position open to candidates in the USA.

Crossing Hurdles is seeking a Site Reliability Engineer to enhance its AI training environments. The role focuses on deploying and managing containerized systems while optimizing performance and maintaining system stability.

Responsibilities
Deploy, monitor, and recover containerized AI training environments
Troubleshoot infrastructure bottlenecks and resolve system failures in real time
Build and manage resilient systems for stability and performance
Collaborate with engineering teams to improve CI/CD pipelines and automation
Manage filesystem structures, storage, and process scheduling in containerized environments
Replan dynamically in response to runtime issues and system failures
Document system processes, solutions, and best practices

Skills
Strong experience with terminal-based system administration and troubleshooting
Expertise in containerized environments such as Docker or Kubernetes
Strong Python skills for scripting, automation, and debugging
Proficiency in Bash and familiarity with additional programming languages
Strong understanding of infrastructure, build systems, and version control
Ability to manage dynamic infrastructure recovery in high-pressure scenarios
Excellent written and verbal communication skills

Company Overview
Crossing Hurdles connects skilled professionals with opportunities across AI training platforms and high-growth companies. Founded in 2019 and headquartered in Gurgaon, Haryana, India, the company has a workforce of 51-200 employees. Its website is https://www.crossinghurdles.com.