[Remote] New Grad Data Engineer (for Health Tech Startup)🤓
Note: This is a remote position open to candidates in the USA.

1Phi Health is a health tech startup focused on making healthcare more accessible. They are seeking a New Grad Data Engineer to build and maintain data pipelines, ensure data quality, and collaborate with data scientists and product engineers in the healthcare data domain.

Responsibilities
- Build and maintain data pipelines that ingest, transform, and validate large-scale Medicare claims data using SQL, Python, and Databricks (Spark). You'll work with patient-level records across billions of claim lines.
- Write and optimize complex SQL: multi-step transformations, window functions, joins across large datasets, and aggregations with suppression rules. SQL is the primary language of the work.
- Automate and operationalize recurring data workflows, building reliable, repeatable pipelines that process CMS data extracts, dimension tables, and derived provider metrics.
- Ensure data quality by designing validation checks, reconciling source data against expected schemas, and investigating anomalies when numbers don't add up.
- Collaborate with data scientists and product engineers to define output schemas, deliver clean datasets, and support downstream analytics and application features.
- Work in cloud infrastructure, primarily Databricks on AWS, with exposure to S3, Unity Catalog, and related services.
- Learn the healthcare data domain. You'll develop working knowledge of claims data structures, medical coding systems (ICD-10, HCPCS, DRG), and CMS data programs.

Skills
- You have strong SQL skills. Coursework, internships, or projects where you wrote non-trivial queries: joins, CTEs, window functions, aggregations. You can reason about query performance.
- You're comfortable with Python. You've used it for data manipulation (pandas, PySpark, or similar). You don't need to be a software engineer, but you can write clean, functional code.
- You understand data pipeline concepts: ETL/ELT, idempotency, schema management, data validation. Exposure through coursework, capstone projects, or internships counts.
- You're detail-oriented and methodical. Healthcare data has strict rules around suppression, privacy, and accuracy. You care about getting the numbers right.
- You're a fast learner who's comfortable ramping up on unfamiliar domains. You'll be learning Medicare claims data, CMS programs, and healthcare coding systems on the job.
- You have a BS or MS in Computer Science, Data Science, Information Systems, Statistics, or a related field.
- You've worked with Spark, Databricks, or other distributed compute environments (even in a class or personal project).
- You have exposure to cloud platforms (AWS, GCP, or Azure): S3, IAM, or managed database services.
- You've touched healthcare data in any capacity: claims, EHR, public health datasets, MIMIC, CMS public use files.
- You're familiar with version control (Git) and collaborative development workflows.
- You've built a data project end-to-end, from ingestion through delivery, even if it was small.

Benefits
- Health insurance within 3 months of starting
- Generous vacation policy + company holidays
- 401K + profit share contributions
- Quarterly evals and performance bonus (~10% at start, ~20% after 4 years)

Company Overview
1Phi Health has a workforce of 2-10 employees. Its website is https://1phi.com/.