[Remote] AI-Enabled Data Engineer
Note: This is a remote job open to candidates in the USA.

TechTorch is building the future of intelligent work by helping companies design, build, and deploy AI agents to automate complex workflows. The AI-Enabled Data Engineer will focus on creating scalable data pipelines, managing data quality, and integrating AI capabilities into data engineering processes.

Responsibilities
- Design, build, and maintain scalable data pipelines and ETL/ELT workflows across cloud and on-prem environments
- Work with Snowflake, Databricks, and Delta Lake as primary data platforms, handling ingestion, transformation, storage optimization, and access patterns
- Model data with dbt: write modular SQL transformations, manage dependencies, enforce data contracts, and maintain documentation
- Build and maintain semantic layers that serve consistent, governed metrics to downstream consumers
- Design data warehouse schemas and data lake structures that balance performance, cost, and queryability
- Implement data quality frameworks (testing, validation, alerting, and lineage) as first-class citizens in every pipeline
- Orchestrate workflows across Airflow, Dagster/Prefect, Azure Data Factory, and Databricks Workflows, choosing the right tool for each job
- Apply DataOps practices: CI/CD for data pipelines, environment promotion, infrastructure as code, and observability
- Own the reliability of data products end to end: monitoring, alerting, incident response, and root cause analysis
- Work across AWS and Azure cloud services (S3, Glue, ADLS, ADF, Synapse, Redshift) to design cost-effective, scalable architectures
- Build data pipelines that feed AI systems, including RAG ingestion workflows, vector store loading, document chunking, and embedding pipelines
- Use LLMs as active components in ETL logic: classification, entity extraction, enrichment, and data quality remediation in-flight
- Expose data infrastructure as consumable tools for AI agents via MCP or similar agent-integration patterns
- Use AI-paired programming (Claude Code or equivalent) as a daily productivity layer: not just autocomplete, but genuine workflow acceleration
- Stay current on how AI tooling changes the data engineering workflow and bring those patterns back to the team

Skills
- ETL/ELT Design
- Data Modeling
- Data Quality & Testing
- Data Lineage
- Batch & Incremental Loads
- Snowflake
- Databricks
- Apache Spark / PySpark
- Delta Lake
- Data Warehouses
- Data Lakes
- dbt Core / dbt Cloud
- SQL (advanced)
- Semantic Layer
- Dimensional Modeling
- Apache Airflow
- Dagster / Prefect
- Azure Data Factory
- Databricks Workflows
- RAG & Vector Store Pipelines
- AI-Augmented ETL
- MCP / Agent Data Tools
- AI-Paired Programming
- LLM Integration in Pipelines
- AWS (S3, Glue, Redshift)
- Azure (ADLS, ADF, Synapse)
- CI/CD for Data
- Infrastructure as Code
- Python
- Experience with streaming architectures: Kafka, Spark Streaming, or Flink
- Exposure to feature stores (Feast, Tecton) or ML platform data pipelines
- Hands-on with vector databases: Pinecone, Weaviate, Qdrant, or pgvector
- Familiarity with data mesh or data product ownership models
- Experience with Snowpark or Databricks AI/BI tooling
- Building or contributing to internal data tooling, frameworks, or accelerators

Company Overview
TechTorch is an AI-powered tech consulting company. It was founded in 2021 and is headquartered in San Mateo, California, USA, with a workforce of 51-200 employees. Its website is https://www.techtorch.io/.

Company H1B Sponsorship
TechTorch has a track record of offering H1B sponsorships, with 4 in 2025. Please note that this does not guarantee sponsorship for this specific role.