[Remote] Platform Engineer
Note: This is a remote position open to candidates in the USA.

HHAeXchange is the leading technology platform for home and community-based care, founded in 2008. They are seeking a Platform Engineer to join their Data & AI Engineering team, focusing on platform reliability and delivery automation to ensure the infrastructure for their AI platform and data pipelines is stable and scalable.

Responsibilities
- Own availability, latency, and performance targets for AI platform services and data infrastructure running on AWS
- Design and implement monitoring, alerting, and observability frameworks across the platform stack
- Lead incident response, root cause analysis, and post-mortem processes for platform-level outages or degradations
- Define and track SLOs/SLAs for core platform primitives including RAG pipelines, agent orchestration services, and model access layers
- Proactively identify reliability risks and drive engineering improvements before they become production issues
- Build and maintain runbooks, disaster recovery procedures, and operational documentation
- Design, build, and maintain CI/CD pipelines for AI platform components, data pipelines, and internal applications
- Own infrastructure-as-code (IaC) practices across the team using tools such as Terraform or AWS CDK
- Manage and optimize AWS environments including ECS, Lambda, S3, RDS, Redshift, API Gateway, and related services
- Implement and enforce security, compliance, and cost optimization best practices across AWS infrastructure
- Automate deployment, scaling, and configuration management to reduce manual operational overhead
- Partner with AI Platform Engineers to containerize and operationalize AI services and agent frameworks
- Support Data & AI Engineers with environment management, access controls, and deployment tooling for Polaris and data pipeline infrastructure
- Serve as the operational backbone for the AI platform team, ensuring engineers can ship and iterate quickly without being blocked by infrastructure concerns
- Contribute to our 'factory model' vision by making deployment and reliability a repeatable, scalable capability rather than an ad hoc function
- Other duties as assigned by supervisor or HHAeXchange leader

Skills
- 3+ years of professional experience in a DevOps, SRE, or platform engineering role
- Hands-on AWS experience required: AgentCore, Bedrock, ECS, Lambda, S3, RDS, Redshift, CloudWatch, IAM, VPC, and related services
- Experience with infrastructure-as-code tools such as Terraform or AWS CDK
- Strong CI/CD experience with tools such as GitHub Actions
- Experience with containerization and orchestration (Docker, ECS, or Kubernetes)
- Familiarity with AI/ML infrastructure patterns such as model serving, vector databases, and pipeline orchestration (strongly preferred)
- Experience with observability and monitoring tooling (Datadog, CloudWatch)
- Prior experience in a SaaS environment
- Strong verbal and written communication skills, with the ability to collaborate across technical and non-technical stakeholders
- Self-starter with a proactive approach to identifying and resolving infrastructure risk before it impacts delivery
- Willingness to explore and adopt AI tools responsibly to enhance productivity and innovation in your role

Benefits
This is a benefits-eligible position. HHAeXchange offers:
- Competitive health plans
- Paid time off
- Company-paid holidays
- 401(k) retirement program with a Company-elected match
- Other company-sponsored programs

Company Overview
At HHAeXchange, we believe that healthcare should be simple, effective, and transparent. The company was founded in 2008 and is headquartered in Long Island City, New York, USA, with a workforce of 501-1000 employees. Its website is https://hhaexchange.com.