[Remote] Principal Data Scientist, Health Informatics
Note: This is a remote position open to candidates in the USA.

Waymark is a team of healthcare providers, technologists, and builders whose mission is to bring the best healthcare to people with Medicaid benefits. They are seeking a Principal Data Scientist to own clinical data quality and bring senior ML/AI and health economics judgment to their core data science products.

Responsibilities

Own clinical data quality across claims, EHR, and ADT: Define standards for how clinical data is structured, normalized, and validated as modeling inputs across payer claims (medical, pharmacy, eligibility), EHR data (Epic, Cerner, Athena), and real-time ADT feeds. Bring deep familiarity with EHR data formats (FHIR, HL7, C-CDA) and how data from systems like Epic, Cerner, and Athena maps to clinical reality. Hold the bar for clinical accuracy and completeness across all three sources.
Build and ship production ML/AI models: Develop, validate, and deploy risk stratification, care gap prediction, treatment effect estimation, and LLM/foundation model applications, with rigor around leakage, calibration, fairness, and clinical face validity.
Apply health economics and outcomes methods: Translate raw clinical and claims data into decision-grade evidence through risk adjustment, utilization measurement, cost attribution, quasi-experimental evaluation, and outcomes measurement aligned with CMS, NCQA, and MCO reporting standards.
Advance machine learning and AI products: Bring senior modeling judgment to the product roadmap, owning the clinical and methodological soundness of what ships.
Set standards and mentor: Make architectural trade-offs, drive alignment across data science, engineering, product, and clinical stakeholders, and mentor junior data scientists to raise the technical bar of the team.

Skills

Healthcare Data Expertise: Deep, hands-on fluency with claims, EHR, and ADT data, and strong command of clinical terminologies (ICD-10, SNOMED CT, LOINC, RxNorm, CPT/HCPCS) and value set curation.
Standards Fluency: Working experience with healthcare data standards and exchange formats: FHIR, HL7v2, and C-CDA.
Education: Master's degree in Data Science, Biostatistics, Health Informatics, Computer Science, or a related field.
Python Proficiency: 7-8+ years of hands-on experience in Python, including data science and ML libraries.
Applied ML/AI Experience: Demonstrated ability to build, validate, and deploy production ML models on healthcare data, with end-to-end ownership from development through deployment and maintenance in a live environment. Experience with ML pipelines, model versioning, and reproducible workflows at scale.
Project Ownership: Proven ability to manage complex technical projects independently, align multiple stakeholders, and deliver on timelines.
PhD in health informatics, statistics, data science, or computer science.
Experience integrating EHR/HIE data via TEFCA, CommonWell, or comparable networks.
Health Economics & Outcomes Methods: Experience with risk adjustment, utilization and cost measurement, and quasi-experimental evaluation.
Familiarity with MLOps best practices, including experiment tracking and model registry (e.g. MLflow), CI/CD for ML pipelines, feature stores, and workflow orchestration tools such as SageMaker Pipelines.
Prior experience building on Medicaid or dual-eligible populations.
Peer-reviewed publications in healthcare ML, AI, biostatistics, or health economics.

Benefits

Stock Options: Opportunity to invest in the company’s growth.
Work-from-Home Stipend: A dedicated stipend for your first year to help set up your home office.
Medical, Vision, and Dental Coverage: Comprehensive plans to keep you and your family healthy.
Life Insurance: