[Remote] Data Engineer
Note: This is a remote job open to candidates in the USA.

Pavago is seeking a Data Engineer to design, build, and maintain scalable data infrastructure and reliable data pipelines for analytics and operational decision-making. The role involves ensuring seamless data flow from source systems into warehouses while maintaining high standards for quality and governance.

Responsibilities
- Build, maintain, and optimize ETL/ELT pipelines using Python, SQL, or Scala
- Orchestrate workflows using Airflow, Prefect, Dagster, or similar orchestration tools
- Ingest structured and unstructured data from APIs, SaaS platforms, databases, files, and streaming systems
- Develop scalable connectors and automated ingestion workflows
- Manage and optimize cloud data warehouses such as Snowflake, BigQuery, or Redshift
- Design scalable schemas using star and snowflake modeling techniques
- Implement partitioning, clustering, indexing, and performance optimization strategies
- Build clean, analytics-ready datasets for business intelligence and reporting use cases
- Implement validation checks, anomaly detection, logging, and monitoring to ensure data integrity
- Enforce naming conventions, lineage tracking, and documentation standards using tools such as dbt or Great Expectations
- Maintain audit-ready data processes and ensure compliance with GDPR, HIPAA, or industry-specific requirements
- Monitor pipeline health and proactively resolve failures or inconsistencies
- Build and manage real-time data pipelines using Kafka, Kinesis, Pub/Sub, or similar platforms
- Support low-latency ingestion and event-driven architectures for time-sensitive applications
- Monitor streaming infrastructure and optimize throughput and reliability
- Partner closely with analysts, data scientists, and business stakeholders to deliver reliable datasets
- Support dashboard and reporting initiatives across Tableau, Looker, or Power BI
- Translate business requirements into scalable data solutions and models
- Maintain clear technical documentation for pipelines, schemas, and workflows
- Containerize data services using Docker and manage deployments through Kubernetes when applicable
- Automate deployments using CI/CD pipelines such as GitHub Actions, Jenkins, or GitLab CI
- Manage cloud infrastructure using Terraform, CloudFormation, or similar Infrastructure-as-Code tools
- Continuously optimize performance, scalability, reliability, and cloud costs

Skills
- 3+ years of experience in Data Engineering, Back-End Engineering, or Data Infrastructure roles
- Strong proficiency in Python and SQL
- Experience with at least one modern data warehouse (Snowflake, Redshift, BigQuery)
- Hands-on experience with orchestration tools such as Airflow or Prefect
- Strong understanding of ETL/ELT pipelines, data modeling, and data transformation workflows
- Familiarity with cloud platforms such as AWS, GCP, or Azure
- Experience with dbt for data modeling and transformation management
- Streaming and event-driven data pipeline experience (Kafka, Kinesis, Pub/Sub)
- Experience with cloud-native data services such as AWS Glue, GCP Dataflow, or Azure Data Factory
- Familiarity with Docker, Kubernetes, Terraform, or CI/CD workflows
- Background in regulated industries such as healthcare, fintech, or enterprise SaaS
- Experience optimizing warehouse costs and query performance at scale

Company Overview
Pavago - Thinking Globally to Grow Locally 🌍 Welcome to Pavago, where the world is your talent pool. The company was founded in 2022 and is headquartered in Meridian, Idaho, US, with a workforce of 11-50 employees. Its website is https://pavago.co.