Lead Data Engineer - Scalable Data Pipelines - Contract to Hire

Remote Full-time
Lead Data Engineer (PySpark, Airflow, Azure) – Scalable Data Pipelines We’re looking for an experienced Senior Data Engineer to design, build, and optimize large-scale data pipelines powering analytics and machine learning workloads. This role is ideal for someone who is hands-on, performance-oriented, and comfortable leading other engineers while owning end-to-end data workflows. You’ll work on both batch and real-time processing, take ownership of Spark performance tuning, and help enforce best practices around data quality, governance, and reliability. ⸻ Responsibilities • Design, develop, and optimize scalable data pipelines using Python, PySpark, Apache Spark, and Airflow • Build and maintain batch and streaming data processing systems on Spark • Design and manage Airflow DAGs to orchestrate complex, dependency-heavy workflows • Implement data partitioning, caching, and Spark performance tuning to handle large datasets efficiently • Ensure data quality, governance, security, and reliability across the data lifecycle • Monitor, troubleshoot, and optimize data jobs, SLAs, and pipeline dependencies • Manage cloud infrastructure (Azure) for data workloads, including cost optimization • Implement CI/CD pipelines for data workflows using Git, Docker, and Infrastructure-as-Code tools • Support analytics and ML use cases by working with structured and unstructured data • Lead and mentor other data engineers, providing architectural guidance and code reviews • Promote best practices in coding standards, documentation, and version control • Collaborate effectively with distributed, remote teams in an Agile environment ⸻ ✅ Requirements • 8+ years of hands-on experience in Data Engineering • Strong expertise with Apache Spark / PySpark, including internals such as: • RDDs, DataFrames, DAG execution, partitioning, shuffles, and caching • Proven experience building and operating Airflow DAGs (scheduling, dependencies, retries, SLAs) • Advanced Python and SQL skills with a focus on performance and maintainability • Solid experience with Azure data and compute infrastructure • Working knowledge of Docker, Kubernetes, Terraform, and CI/CD best practices • Strong problem-solving skills and ability to optimize large-scale data processing systems • Prior experience leading or mentoring engineers • Comfortable working in Agile/Scrum environments • Excellent communication skills and ability to collaborate with remote teams ⸻ ⭐ Nice to Have • Experience with streaming frameworks (Spark Structured Streaming, Kafka, Event Hubs) • Familiarity with data governance, lineage, and observability tools • Experience supporting ML or advanced analytics pipelines • Background in cost-efficient Spark optimization at scale Apply tot his job
Apply Now →

Similar Jobs

Experienced Registered Behavior Technician for In-Home ABA Therapy - Atlanta, GA

Remote

Immediate Hiring: Experienced Registered Behavioral Technician (RBT) for Clinic-Based ABA Therapy Services

Remote

Experienced Registered Behavioral Technician (RBT) - ABA Therapy for Children with Autism Spectrum Disorder

Remote

Experienced Registered Nurse - Telehealth: Providing Remote Care Coordination and Patient Support

Remote

Experienced Substitute Teacher for Riverside County Schools - Join Scoot Education's Innovative Team

Remote

Experienced Substitute Teacher for San Bernardino County - Flexible Schedules & Competitive Pay

Remote

Experienced School Year Instructional Coach for High-Dosage Tutoring Programs in Edgewater Park, NJ

Remote

Experienced School Year Tutor for K-8 Students in Math and Literacy - Mickleton, NJ

Remote

Experienced Secondary Social Studies Teacher for Kansas - Flexible Hybrid Remote Arrangement

Remote

USPS Office Helper

Remote

Analyst - Model Build

Remote

[Remote] Cyber Security Specialist (SOC / Incident Response)

Remote

Senior Lead Counsel & Group Director - International Employment Law & Immigration

Remote

Experienced Remote Data Entry Specialist – Flexible Home-Based Opportunity for Detail-Oriented Professionals with arenaflex

Remote

Client Services Associate

Remote

Private Jet Charter Sales Specialist (Remote)

Remote

[Remote] Digital Project Manager (WordPress projects, Remote)

Remote

**Experienced Full Stack Data Entry Specialist – Cloud-Based Data Pipelines and ETL Process Management**

Remote

Software Development Lead- MuleSoft

Remote

Communications and External Affairs Manager

Remote
← Back