AI Data Engineer – Scientific Data Platforms (Remote)
Note: This is a remote position open to candidates in the USA.

Astrix is a leading global biotechnology and pharmaceutical organization focused on innovation and access to healthcare. They are seeking an AI Data Engineer to scale AI models for drug discovery by building automated data ingestion and curation pipelines for genomics data.

Responsibilities
- Build an agentic data ingestion pipeline, moving beyond bespoke steps toward agents that teams can reliably use as a shared, deployed service.
- Triage and prioritize incoming requests to ingest specific datasets. Clean and organize data, building the first-pass cleaning and organization steps into the agentic flow.
- Validate cross-modal linkage. Add automated checks that catch when ingested data does not connect correctly, and flag low-quality or mismatched records.
- Version every dataset, retaining prior versions and keeping them addressable. Preserve raw data and provenance, ensuring agent workflows log validation and transformation steps so lineage is fully traceable.
- Partner with AI, software engineering, and computational biology groups to co-define data standards and conventions.

Skills
- Demonstrated experience building multi-agent workflows or LLM workflows using tools/frameworks such as LangGraph or LlamaIndex, including tool/function calling and asynchronous task execution.
- Strong Python skills for data manipulation, working with APIs and databases, and handling heterogeneous data formats.
- Familiarity with dataset versioning approaches (e.g., DVC, lakeFS, or equivalent).
- Comfortable with, or showing a strong willingness to learn, common omics data formats like AnnData, H5AD, and TileDB.
- No deep bioinformatics expertise required; just a basic conceptual understanding of different modalities (e.g., RNA-seq vs. scRNA-seq vs. WES; genomics vs. transcriptomics vs. proteomics vs. metabolomics).
- Comfortable writing unit and functional tests to ensure data processing workflows are reliable and reproducible.
- Degree in a technical field or equivalent practical experience.
- Must be authorized to work in the United States without sponsorship.
- Experience deploying agent workflows as a shared service (e.g., FastAPI or MCP endpoints).
- Exposure to cloud platforms (AWS, GCP) and containerization (Docker).
- Familiarity with scientific workflow managers such as Nextflow or Snakemake.

Benefits
- Plus benefits

Company Overview
Astrix is the global leader in delivering innovative strategies and solutions to the life sciences industry. It was founded in 1995 and is headquartered in Red Bank, New Jersey, USA, with a workforce of 501-1000 employees. Its website is http://astrixinc.com.