[Remote] Platform Engineer
Note: This is a remote job open to candidates in the USA.

Hyrhub is seeking a Senior Infrastructure Architect / Platform Engineer for their AI/ML platform to provide technical leadership for cloud platforms that support enterprise-scale generative AI applications. The role involves defining infrastructure architecture, leading platform standards, and collaborating with various engineering teams to enhance operational maturity across AI platforms.

Responsibilities

- Define and drive the technical strategy for AI/ML platform infrastructure supporting generative AI applications, LLM integrations, model routing, and enterprise AI services
- Architect, build, and operate scalable cloud platforms using AWS services such as EKS, ECS Fargate, Lambda, DynamoDB, S3, OpenSearch, Secrets Manager, CloudWatch, ALB, and MWAA
- Establish reusable infrastructure patterns using CloudFormation, Helm, and Terraform to support reliable multi-environment and multi-region deployments
- Lead CI/CD architecture using GitHub Actions, reusable workflows, OIDC-based AWS authentication, automated quality gates, deployment promotion, and environment approvals
- Design and improve observability across AI platforms, including CloudWatch dashboards, logs, alarms, Prometheus/Grafana, OpenSearch, Langfuse, and LLM-specific operational metrics
- Build platform capabilities for GenAI workloads, including model availability monitoring
- Partner with software engineering teams to improve deployment reliability, rollback strategies, health checks, autoscaling, load testing, and runtime performance
- Define and enforce security and compliance practices for infrastructure, including IAM permission boundaries, Secrets Manager usage, secret scanning, audit logging, tagging standards, and change-management controls
- Provide technical leadership for cost optimization, capacity planning, environment standardization, and operational resilience across development, test, production, and sandbox environments
- Mentor engineers, review architecture and infrastructure designs, and influence platform engineering practices across teams
- Troubleshoot complex production issues across cloud infrastructure, networking, containers, serverless workloads, CI/CD systems, and observability platforms
- Translate enterprise requirements for security, compliance, reliability, and governance into pragmatic engineering standards and automation

Skills

- Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical field, or equivalent practical experience
- 7+ years of experience in DevOps, platform engineering, cloud infrastructure, site reliability engineering, or software engineering roles
- Strong hands-on experience with AWS/Azure/GCP infrastructure and services, including container, serverless, networking, storage, observability, and security services
- Experience designing and operating production systems on Kubernetes, ECS/Fargate, or comparable container orchestration platforms
- Proficiency with infrastructure-as-code, especially CloudFormation, Terraform, Helm, or similar tooling
- Strong CI/CD experience with GitHub Actions or similar platforms, including reusable workflows, automated testing, deployment gates, and cloud authentication
- Experience building and operating observability solutions using CloudWatch, Prometheus/Grafana, OpenSearch, or similar tools
- Strong understanding of cloud security practices, IAM, secrets management, least-privilege access, audit logging, and compliance requirements
- Experience supporting distributed systems, microservices, APIs, asynchronous workloads, and multi-environment deployments
- Demonstrated ability to lead technical design, mentor engineers, and influence engineering practices across teams
- Experience supporting AI/ML or generative AI platforms, including LLM gateways, model routing, prompt observability, token metering, or model failover
- Experience operating platforms in regulated enterprise environments, ideally healthcare, pharmaceutical, finance, or life sciences
- Experience with multi-account, multi-region AWS architectures and enterprise governance patterns
- Experience with cost optimization, autoscaling strategies, capacity planning, and cloud budget monitoring
- Experience with load testing and performance validation using tools such as Locust or comparable frameworks
- Strong Python or scripting skills for platform automation, operational tooling, and CI/CD extensions
- Ability to communicate complex technical decisions clearly to engineering, security, operations, and leadership audiences

Company Overview

Hyrhub was founded in 2014. Hiring niche talent is still a problem faced by many companies. It was founded in 2018 and is headquartered in Bangalore, Karnataka, IN, with a workforce of 2-10 employees.