Senior Data Engineer

Remote Full-time
Location: 100% Remote
Years' Experience: 10 years...
Education: Bachelor's in IT related field
Work Authorization: Must show that applicant is legally permitted to work in the United States.
Clearance: Applicants must be able to meet the requirements to obtain a Public Trust security clearance. NOTE: United States citizenship is required to be eligible for this security clearance.
Key Skills:
• 10 years of IT experience focusing on enterprise data architecture and management
• Experience with Databricks required
• 8 years of experience in conceptual, logical, and physical data modeling, with expertise in relational and dimensional data modeling
• Experience with Great Expectations or other data quality validation frameworks
• Experience with ETL and ELT tools such as SSIS, Pentaho, and/or Data Migration Services
• Advanced-level SQL experience (joins, aggregation, window functions, common table expressions, RDBMS schema design, Postgres performance optimization)
• Experience with AWS environments, CI/CD pipelines, and Python (Python 3) is a bonus
Responsibilities
• Plan, create, and maintain data architectures, ensuring alignment with business requirements
• Obtain data, formulate dataset processes, and store optimized data
• Identify problems and inefficiencies and apply solutions
• Determine tasks where manual participation can be eliminated with automation
• Identify and optimize data bottlenecks, leveraging automation where possible
• Create and manage data lifecycle policies (retention, backups/restore, etc.)
• Apply in-depth knowledge to create, maintain, and manage ETL/ELT pipelines
• Create, maintain, and manage data transformations
• Maintain and update documentation
• Create, maintain, and manage data pipeline schedules
• Monitor data pipelines
• Create, maintain, and manage data quality gates (Great Expectations) to ensure high data quality
• Support AI/ML teams with optimizing feature engineering code
• Apply expertise in Spark, Python, Databricks, data lakes, and SQL
• Create, maintain, and manage Spark Structured Streaming jobs, including using the newer Delta Live Tables and/or dbt
• Research existing data in the data lake to determine the best sources for data
• Create, manage, and maintain ksqlDB and Kafka Streams queries and code
• Perform data-driven testing for data quality
• Maintain and update Python-based data processing scripts executed on AWS Lambda
• Write unit tests for all Spark, Python data processing, and Lambda code
• Maintain and optimize the PCIS Reporting Database data lake (performance tuning, etc.)
• Streamline data processing, including formalizing how to handle late data, how to define windows, and how window definitions impact data freshness
Qualifications
• 10 years of IT experience focusing on enterprise data architecture and management
• Experience in conceptual, logical, and physical data modeling, with expertise in relational and dimensional data modeling
• Experience with Databricks, Structured Streaming, Delta Lake concepts, and Delta Live Tables required
• Additional experience with Spark, Spark SQL, Spark DataFrames and Datasets, and PySpark
• Data lake concepts such as time travel, schema evolution, and optimization
• Structured Streaming and Delta Live Tables with Databricks a bonus
• Experience leading and architecting enterprise-wide initiatives, specifically system integration, data migration, transformation, data warehouse builds, data mart builds, and data lake implementation and support
• Advanced-level understanding of streaming data pipelines and how they differ from batch systems
• Ability to formalize concepts of how to handle late data, define windows, and manage data freshness
• Advanced understanding of ETL and ELT and of ETL/ELT tools such as SSIS, Pentaho, Data Migration Service, etc.
• Understanding of concepts and implementation strategies for different incremental data loads such as tumbling window, sliding window, high watermark, etc.
• Familiarity and/or expertise with Great Expectations or other data quality/data validation frameworks is a bonus
• Understanding of streaming data pipelines and batch systems
• Familiarity with concepts such as late data, defining windows, and how window definitions impact data freshness
• Advanced-level SQL experience (joins, aggregation, window functions, common table expressions, RDBMS schema design, Postgres performance optimization)
• Indexing and partitioning strategy experience
• Ability to debug, troubleshoot, design, and implement solutions to complex technical issues
• Experience with large-scale, high-performance enterprise big data application deployments and solutions
• Understanding of how to create DAGs to define workflows
• Familiarity with CI/CD pipelines, containerization, and pipeline orchestration tools such as Airflow, Prefect, etc., is a bonus but not required
• Architecture experience in an AWS environment is a bonus
• Familiarity working with Kinesis and/or Lambda, specifically how to push and pull data, how to use AWS tools to view data in Kinesis streams, and how to process massive data at scale, is a bonus
• Experience with Docker, Jenkins, and CloudWatch
• Ability to write and maintain Jenkinsfiles for supporting CI/CD pipelines
• Experience configuring and optimizing AWS Lambda functions
• Experience working with DynamoDB to query and write data
• Experience with S3
• Knowledge of Python (Python 3 desired) for CI/CD pipelines is a bonus
• Familiarity with pytest and unittest is a bonus
• Experience working with JSON and defining JSON Schemas is a bonus
• Experience setting up and managing Confluent/Kafka topics and ensuring performance using Kafka is a bonus
• Familiarity with Schema Registry and message formats such as Avro, ORC, etc.
• Understanding of how to manage ksqlDB SQL files and migrations and Kafka Streams
• Ability to thrive in a team-based environment
• Experience briefing the benefits and constraints of technology solutions to technology partners, stakeholders, team members, and senior-level management
