[Remote] AI Systems Administrator
Note: This is a remote position open to candidates in the USA.

MCI is one of the fastest-growing tech-enabled business services companies in the USA, specializing in customer experience and business process outsourcing. They are seeking a technically skilled AI Systems Administrator to support, maintain, and optimize the infrastructure for their artificial intelligence and machine learning environments, ensuring the reliability, scalability, and security of AI systems.

Responsibilities
Oversee, configure, and monitor AI and ML systems, servers, and cloud environments to ensure optimal performance and uptime
Manage GPU/CPU clusters and ensure efficient resource allocation for training and inference workloads
Implement and maintain scalable infrastructure to support large language models (LLMs), data processing pipelines, and model deployment
Optimize system performance through tuning, automation, and proactive maintenance
Apply best practices for securing AI systems, ensuring data integrity, confidentiality, and compliance with company and industry standards
Manage user access, permissions, and security configurations across AI platforms
Support the deployment and integration of AI models and APIs into production environments
Collaborate with developers, data scientists, and prompt engineers to ensure seamless system functionality and workflow automation
Monitor system health, usage, and performance metrics; diagnose and resolve infrastructure or software issues
Maintain logs, conduct root cause analysis, and implement corrective actions to prevent recurrence
Develop scripts and tools to automate system tasks, data transfers, and performance checks
Support CI/CD pipelines for AI model updates and system maintenance
Create and maintain detailed documentation of system configurations, procedures, and troubleshooting guides
Provide technical support to AI teams, ensuring smooth operation of all AI systems and tools
Stay up to date with advancements in AI infrastructure, cloud technologies, and MLOps practices
Recommend and implement improvements to enhance system reliability and scalability

Skills
Bachelor's degree in Computer Science, Information Technology, Data Engineering, or a related field
2+ years of experience in systems administration, DevOps, or infrastructure management (AI/ML environment experience preferred)
Strong understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes)
Experience with Linux/Unix administration, Python/Bash scripting, and automation tools (Terraform, Ansible, Jenkins)
Familiarity with machine learning frameworks (TensorFlow, PyTorch) and AI model deployment pipelines
Understanding of networking, security, and storage in distributed computing environments
Experience with GPU-based computing and performance optimization for AI workloads
Excellent problem-solving, troubleshooting, and documentation skills
Strong collaboration and communication abilities to work with cross-functional AI and engineering teams
Must be authorized to work in the country where the job is based
Must be willing to submit to up to a LEVEL II background and/or security investigation with a fingerprint
Must be willing to submit to drug screening

Company Overview
MCI is a leading America-based business process outsourcing (BPO) provider specializing in customer service, inside sales, and back-office services. It was founded in 2015 and is headquartered in Beverly, Massachusetts, USA, with a workforce of 501-1000 employees. Its website is https://www.massmarkets.com/.