[Remote] Machine Learning Engineer (Platform)
Note: This is a remote job open to candidates in the USA.

ArteraAI is an AI startup focused on developing medical artificial intelligence tests to personalize therapy for cancer patients. As a Machine Learning Engineer, you will work on the AI Platform team to establish scalable pipelines for model training and data processing, collaborating closely with model developers and platform engineers.

Responsibilities
- Be accountable for Artera’s ML compute infrastructure, including scaling up Artera’s Foundation Model development by building distributed training infrastructure and developer libraries
- Build and evolve the core libraries used by AI scientists to develop, launch, and monitor AI products
- Work with model developers to optimize GPU and CPU efficiency and data throughput of large-scale foundation model and downstream model training runs
- Optimize Artera’s ability to store and serve terabytes of digital pathology data efficiently for large-scale training regimes
- Ensure that Artera’s observability infrastructure provides a clear picture of how to continue optimizing performance across our model landscape

Skills
- 5+ years of industry software engineering experience
- 4+ years of industry experience using one of PyTorch, TensorFlow, or JAX in Python
- 3+ years of industry experience building with AWS, Docker, and Kubernetes
- 1+ years of industry experience optimizing large-scale, high data-throughput, distributed machine learning training pipelines
- This is a remote role open to candidates who are currently authorized to work in either the United States or Canada without the need for current or future employment-based visa sponsorship
- Experience using ML orchestration frameworks such as Flyte, Ray, Kubeflow, Metaflow, MLflow, Dagster, Argo Workflows, or Prefect
- Experience using Terraform and SQLAlchemy
- Experience in multi-node and multi-GPU training
- Experience deploying and maintaining infrastructure for machine learning training and production inference
- Familiarity with TorchScript, ONNX Runtime, DeepSpeed, AWS Neuron, or similar approaches to inference optimization

Company Overview
Our AI can prognosticate patient outcomes and personalize treatment in localized prostate cancer. ArteraAI was founded in 2021 and is headquartered in Los Altos, California, USA, with a workforce of 51-200 employees. Its website is https://artera.ai.