[Remote] Staff Data Engineer - Emerald

Remote Full-time
Note: This is a remote job open to candidates in the USA. H1 is dedicated to providing optimal healthcare information access and is seeking a Staff Data Engineer for its Emerald team. This role involves leading the architecture and scalability of H1’s healthcare entity resolution platform while managing a small team and collaborating with various stakeholders to enhance the platform's efficiency and accuracy.

Responsibilities
- Lead the design, optimization, and scalability of distributed Spark/PySpark pipelines powering entity resolution and large-scale healthcare data processing
- Own systems supporting automatching, identity mapping, grouping logic, deduplication, enrichment, and auto-approval workflows across healthcare provider and organization datasets
- Build and maintain scalable processing frameworks for PubMed, clinical trial, ct.gov, conference, and other healthcare data sources
- Drive infrastructure optimization initiatives focused on improving throughput, runtime, observability, and cloud compute cost efficiency
- Partner closely with AI/ML teams to integrate matching and resolution models into EMERALD and improve matching precision and recall
- Lead complex technical initiatives from architecture and design through deployment, monitoring, and long-term production support
- Serve as a technical leader and mentor across the team through code reviews, technical guidance, and engineering best practices
- Collaborate directly with Product and business stakeholders to align technical solutions with operational and customer needs
- Support production operations, incident response, troubleshooting, and ongoing platform reliability

Skills
- 8+ years of experience building and maintaining large-scale distributed data systems and pipelines
- Demonstrated technical leadership experience mentoring engineers and driving complex technical initiatives
- Extensive experience with Apache Spark and AWS-based big data technologies including EMR, S3, and distributed compute environments
- Strong coding experience in Python (PySpark), Scala, Java, or equivalent languages used for distributed processing systems
- Experience optimizing large-scale Spark workloads for performance, scalability, and infrastructure cost efficiency
- Experience with streaming and event-driven architectures using technologies such as Kafka or Spark Streaming
- Experience with orchestration and lakehouse technologies such as Argo and Hudi or comparable platforms
- Experience with containerization and infrastructure technologies such as Docker, Kubernetes, and Terraform
- Experience working with relational or distributed databases such as PostgreSQL or Redshift
- Proven ability to operate effectively within highly scalable, production-grade distributed systems
- Deep expertise with distributed data processing frameworks such as Apache Spark and Hadoop, particularly within AWS environments
- Strong proficiency in Python (PySpark), Scala, Java, or other modern programming languages used for large-scale distributed processing
- Experience building scalable ETL/ELT frameworks across both batch and streaming architectures
- Strong understanding of distributed file formats including Apache Parquet and Apache Avro
- Experience with streaming technologies such as Kafka, Spark Streaming, or KSQL
- Strong grasp of software engineering fundamentals including distributed systems, data structures, concurrency, and system design
- Experience performing root cause analysis across large-scale distributed systems and complex data pipelines
- Ability to write clean, maintainable, modular, and production-grade code
- Experience improving performance, scalability, observability, and infrastructure efficiency within distributed systems
- Strong communication and collaboration skills across both technical and non-technical stakeholders
- Familiarity with modern development and infrastructure tooling including Git, CI/CD pipelines, Docker, Kubernetes, Terraform, Argo, Hudi, and JIRA
- Experience with entity resolution, identity mapping, automatching, deduplication, or large-scale matching systems is strongly preferred
- Experience working with healthcare, life sciences, Real World Evidence (RWE), or large-scale healthcare datasets is strongly preferred

Benefits
- Stock options
- Full suite of health insurance options
- Generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe

Company Overview
H1 is on a mission to connect the world with the right doctors. It was founded in 2017 and is headquartered in New York, New York, USA, with a workforce of 201-500 employees. Its website is https://www.h1.co.

Company H1B Sponsorship
H1 has a track record of offering H1B sponsorships, with 5 in 2025, 6 in 2024, 4 in 2023, 9 in 2022, and 7 in 2021. Please note that this does not guarantee sponsorship for this specific role.
