[Remote] Data Infrastructure Engineer
Note: This is a remote position open to candidates in the USA. TEKsystems is a leading provider of business and technology services, and they are seeking a Data Infrastructure Engineer to build and operate the data platform that powers AI/ML analytics modules. The role involves designing and implementing scalable data ingestion pipelines and robust ETL/ELT processes, and ensuring data governance while collaborating with various teams.

Responsibilities

Build & Operate Data Pipelines (Batch + Streaming)
- Design and implement batch and streaming ingestion from APIs, relational databases, file drops, event streams, and external partners
- Build and optimize ETL/ELT pipelines to produce curated, analytics-ready datasets for reporting and ML consumption
- Implement incremental processing patterns, change data capture (CDC) approaches where appropriate, and data contract standards

Deliver a Modern Lakehouse (Data Lake / Delta Lake)
- Build and manage a scalable lakehouse on AWS object storage (e.g., S3) using open table/file formats and delta/lakehouse concepts (e.g., ACID tables, schema evolution, time travel patterns)
- Optimize performance and cost through partitioning, compaction, lifecycle policies, and efficient compute/storage usage
- Establish environment standards for dev/test/prod and consistent promotion across stages

Metadata, Governance, Lineage & Quality (Trust Layer)
- Implement a managed metadata repository for dataset cataloging, ownership, glossary/definitions, tagging, and discoverability
- Enable end-to-end lineage (source → transformations → consumption) to support auditability and impact analysis
- Implement governance controls including policy-based access, data classification, retention, and secure data handling
- Build operational data quality checks (freshness, completeness, validity, anomaly detection) and publish SLAs/SLOs

AWS Automation + CI/CD for Data Pipelines
- Implement automated cloud provisioning in AWS using Infrastructure as Code (IaC) for consistent environments and secure-by-default baselines
- Build and enhance CI/CD for data pipelines, including automated tests, validation gates, promotion workflows, and rollback strategies
- Improve observability with metrics/logs/alerts, dashboards, runbooks, and incident response readiness

Cross-Team Collaboration & Documentation
- Work closely with engineering, security, networking, and application teams to support mission needs and delivery timelines
- Maintain high-quality engineering documentation including SOPs, system diagrams, and secure configuration baselines
- Summarize and present findings and recommendations, both written and verbal, to technical and non-technical stakeholders

Skills
- Must be able to OBTAIN and MAINTAIN a Federal or DoD 'PUBLIC TRUST'; candidates must obtain approved adjudication of their PUBLIC TRUST prior to onboarding with Guidehouse. Candidates with an ACTIVE PUBLIC TRUST or SUITABILITY are preferred
- Bachelor's degree in Engineering, IT, Computer Science, or related field (or equivalent experience)
- Minimum of FOUR (4) years of experience building production data pipelines and/or data platforms
- Strong experience implementing data ingestion and ETL/ELT workflows, including data modeling and transformation best practices
- Hands-on experience building a data lake / delta lake (lakehouse) on AWS (or equivalent cloud) using object storage and modern table formats/patterns
- Proficiency in SQL and one programming language commonly used for data engineering (Python preferred; Scala/Java acceptable)
- Experience with metadata management and governance: cataloging, lineage, ownership, access controls, classification, and policy enforcement
- Experience implementing automated AWS provisioning using IaC and operating across multiple environments
- Experience building or operating CI/CD pipelines for data workflows (testing, packaging, deployment automation, environment promotion)
- Solid security fundamentals: IAM/least privilege, encryption, secrets management, secure SDLC practices
- Hands-on experience with Databricks
- Hands-on experience utilizing modern DevOps practices, including tools like Git, Terraform, Jenkins, AWS CodePipeline, and Docker
- Experience utilizing AI-assisted coding tools (e.g., GitHub Copilot, ChatGPT, Cursor, Kiro) to safely accelerate implementation while maintaining strict code quality through testing, code reviews, and security practices
- Knowledge graph and Graph RAG experience, including:
  - Graph modeling and ontology/taxonomy alignment
  - Entity resolution and relationship extraction
  - Hybrid retrieval approaches combining gra

Benefits
- Medical, dental & vision
- Critical Illness, Accident, and Hospital
- 401(k) Retirement Plan: pre-tax and Roth post-tax contributions available
- Life Insurance (Voluntary Life & AD&D for the employee and dependents)
- Short and long-term disability
- Health Spending Account (HSA)
- Transportation benefits
- Employee Assistance Program
- Time Off/Leave (PTO, Vacation or Sick Leave)

Company Overview

At TEKsystems, they understand people. Every year they deploy over 80,000 IT professionals at 6,000 client sites across North America. The company was founded in 1994 and is headquartered in Hanover, Maryland, USA, with a workforce of 10,001+ employees. Its website is http://www.teksystems.com.