[Remote] Data Engineer

Remote Full-time
Note: This is a remote position open to candidates in the USA.

Fusemachines is a global provider of enterprise AI products and services, dedicated to democratizing AI. They are seeking a skilled Senior Data Engineer responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics.

Responsibilities

Architect, design, develop, test and maintain high-performance, large-scale, complex data architectures that support data integration (batch and real-time, ETL and ELT patterns from heterogeneous data systems: APIs and platforms), storage (data lakes, warehouses, data lakehouses, etc.), processing, orchestration and infrastructure, ensuring the scalability, reliability, and performance of data systems, with a focus on Databricks and Azure.
Contribute to detailed design, architectural discussions, and customer requirements sessions.
Actively participate in the design, development, and testing of big data products.
Construct and fine-tune Apache Spark jobs and clusters within the Databricks platform.
Migrate out of Azure Synapse to Azure Data Lake or other technologies.
Assess best practices and design schemas that match business needs for delivering a modern analytics solution (descriptive, diagnostic, predictive, prescriptive).
Design and implement data models and schemas that support efficient data processing and analytics.
Design and develop clear, maintainable code with automated testing using Pytest, unittest, integration tests, performance tests, regression tests, etc.
Collaborate with cross-functional teams, including Product, Engineering, Data Scientists and Analysts, to understand data requirements and develop data solutions, including reusable components that meet product deliverables.
Evaluate and implement new technologies and tools to improve data integration, data processing, storage and analysis.
Evaluate, design, implement and maintain data governance solutions (cataloging, lineage, data quality and data governance frameworks) that are suitable for a modern analytics solution, considering industry-standard best practices and patterns.
Continuously monitor and fine-tune workloads and clusters to achieve optimal performance.
Provide guidance and mentorship to junior team members, sharing knowledge and best practices.
Maintain clear and comprehensive documentation of the solutions, configurations, and best practices implemented.
Promote and enforce best practices in data engineering, data governance, and data quality.
Ensure data quality and accuracy.
Design, implement and maintain data security and privacy measures.
Be an active member of an Agile team, participating in all ceremonies and continuous improvement activities, and able to work independently as well as collaboratively.

Skills

Must have a full-time Bachelor's degree in Computer Science or similar.
At least 3 years of experience as a data engineer with strong expertise in Databricks, Azure, DevOps, or other hyperscalers.
3+ years of experience with Azure DevOps and GitHub.
Proven experience delivering large-scale projects and products for Data and Analytics as a data engineer, including migrations.
Strong programming skills in one or more languages such as Python (must have) and Scala, and proficiency in writing efficient and optimized code for data integration, migration, storage, processing and manipulation.
Strong understanding of and experience with SQL, including writing advanced SQL queries.
Thorough understanding of big data principles, techniques, and best practices.
Strong experience with scalable and distributed data processing technologies such as Spark/PySpark (must have: experience with Azure Databricks), DBT and Kafka, to be able to handle large volumes of data.
Solid Databricks development experience with significant Python, PySpark, Spark SQL, Pandas and NumPy work in an Azure environment.
Strong experience in designing and implementing efficient ELT/ETL processes in Azure and Databricks and with open source solutions, with the ability to develop custom integration solutions as needed.
Skilled in data integration from different sources such as APIs, databases, flat files and event streaming.
Expertise in data cleansing, transformation, and validation.
Proficiency with relational databases (Oracle, SQL Server, MySQL, Postgres, or similar) and NoSQL databases (MongoDB or Table).
Good understanding of data modeling and database design principles, with the ability to design and implement efficient database schemas that meet the requirements of the data architecture to support data solutions.
Strong experience in designing and implementing data warehousing, data lake and data lakehouse solutions in Azure and Databricks.
Good experience with Delta Lake, Unity Catalog, Delta Sharing and Delta Live Tables (DLT).
Strong understanding of the software development lifecycle (SDLC), especially Agile methodologies.
Strong knowledge of SDLC tools and technologies, Azure DevOps and GitHub, including project management software (Jira, Azure Boards or similar), source code management (GitHub, Azure Repos or similar), CI/CD systems (GitHub Actions, Azure Pipelines, Jenkins or similar) and binary repository managers (Azure Artifacts or similar).
Strong understanding of DevOps principles, including continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC: Terraform, ARM, including hands-on experience), configuration management, automated testing, performance tuning, and cost management and optimization.
Strong knowledge of cloud computing, specifically Microsoft Azure services related to data and analytics, such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, Azure Stream Analytics, SQL Server, Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database, etc.
Experience in orchestration using technologies like Databricks Workflows and Apache Airflow.
Strong knowledge of data structures and algorithms and good software engineering practices.
Proven experience migrating from Azure Synapse to Azure Data Lake or other technologies.
Strong analytical skills to identify and address technical issues, performance bottlenecks, and system failures.
Proficiency in debugging and troubleshooting issues in complex data and analytics environments and pipelines.
Good understanding of data quality and governance, including implementation of data quality checks and monitoring processes to ensure that data is accurate, complete, and consistent.
Strong written and verbal communication skills to collaborate and articulate complex situations concisely with cross-functional teams, including business users, data architects, DevOps engineers, data analysts, data scientists, developers, and operations teams.
Ability to document processes, procedures, and deployment configurations.
Understanding of security practices, including network security groups, Azure Active Directory, encryption, and compliance standards.
Ability to implement security controls and best practices within data and analytics solutions, including proficient knowledge of and working experience with various cloud security vulnerabilities and ways to mitigate them.
Self-motivated, with the ability to work well in a team, and experienced in mentoring and coaching different members of the team.
A willingness to stay updated with the latest services, data engineering trends, and best practices in the field.
Comfortable with picking up new technologies independently and working in a rapidly changing environment with ambiguous requirements.
Care about architecture, observability, testing, and building reliable infrastructure and data pipelines.
Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have).
Experience with BI solutions, including Power BI, is a plus.

Company Overview

Fusemachines is an enterprise AI services and solutions provider that brings AI education, products, and jobs to underserved communities. It was founded in 2013 and is headquartered in New York, New York, USA, with a workforce of 201-500 employees. Its website is http://fusemachines.com/.
