Staff Data Engineer

Remote Full-time

WHO WE ARE

Zeta Global (NYSE: ZETA) is the AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to make it easier for marketers to acquire, grow, and retain customers more efficiently. Through the Zeta Marketing Platform (ZMP), our vision is to make sophisticated marketing simple by unifying identity, intelligence, and omnichannel activation into a single platform, powered by one of the industry’s largest proprietary databases and AI. Our enterprise customers across multiple verticals are empowered to personalize experiences with consumers at an individual level across every channel, delivering better results for marketing programs. Zeta was founded in 2007 by David A. Steinberg and John Sculley and is headquartered in New York City with offices around the world. To learn more, go to www.zetaglobal.com.

The Opportunity

We are looking for a Staff Data Engineer to lead the design and implementation of a unified semantic data layer that spans all of Zeta’s data sources, both data at rest and data in motion. This role sits at the intersection of data engineering, platform architecture, and AI enablement. You will be responsible for building a middleware semantic layer (using Cube Core or similar technologies) that exposes clean, governed, multi-tenant data via standardized APIs and tool interfaces, enabling AI agents and LLMs to query, reason over, and act on Zeta’s data with high performance, security, and compliance. This is a high-impact, high-visibility role that will shape how Zeta’s AI systems consume and interact with data across the organization.

What You’ll Do

Semantic Layer Architecture & Development

- Design and build a centralized semantic data layer using Cube Core (or equivalent technology such as Headless BI, dbt Metrics Layer, or Metriql) that provides a unified, governed abstraction over all company data sources.
- Define semantic models, metrics, dimensions, and relationships that map to business domains across marketing, advertising, identity resolution, and customer analytics.
- Expose the semantic layer via REST/GraphQL APIs and MCP-compatible tool interfaces purpose-built for consumption by AI agents and LLMs.

Data Source Integration & Unification

- Integrate and unify data from heterogeneous systems including MySQL, DynamoDB, Aerospike, Snowflake, Amazon S3 (data lakes), Apache Kafka, Amazon SQS, and other internal data stores.
- Build connectors, adapters, and federation layers to query across both operational (OLTP) and analytical (OLAP) data sources in a performant, cost-efficient manner.
- Ensure seamless handling of both data at rest (warehouses, lakes, databases) and data in motion (streaming platforms, event buses, message queues).

AI & LLM Enablement

- Design tool interfaces and API contracts that allow AI agents to discover available data, understand schema semantics, and generate accurate queries autonomously.
- Collaborate with AI/ML teams to optimize the semantic layer for LLM-generated SQL, natural language querying, retrieval-augmented generation (RAG), and agentic workflows.
- Implement guardrails, query validation, and cost controls to prevent runaway queries from AI-generated workloads.

Multi-Tenancy, Security & Compliance

- Architect the semantic layer with native multi-tenant isolation, ensuring strict data segregation and tenant-scoped access controls.
- Implement row-level security, column-level masking, and attribute-based access controls (ABAC) to enforce data governance policies.
- Ensure compliance with SOC 2, GDPR, CCPA, and industry-specific regulations governing data access, PII handling, and cross-border data flows.

Performance, Scalability & Reliability

- Design for horizontal scalability to support thousands of concurrent queries from AI agents, internal dashboards, and customer-facing products.
- Implement intelligent caching (pre-aggregation, materialized views, query result caching) to deliver sub-second response times for common query patterns.
- Build observability into the semantic layer with comprehensive metrics, logging, alerting, and query performance profiling.

Technical Leadership & Collaboration

- Serve as the technical authority on data architecture decisions, authoring ADRs (Architecture Decision Records) and reference architectures.
- Mentor and guide senior engineers on best practices for semantic modeling, data governance, and API design.
- Partner cross-functionally with Product, Data Science, Platform Engineering, InfoSec, and Compliance teams to align the data layer with business objectives.

What We’re Looking For

Required Qualifications

- 10+ years of experience in data engineering, data architecture, or platform engineering, with at least 3 years operating at a Staff/Principal level.
- Deep hands-on expertise with multiple data stores: relational (MySQL/PostgreSQL), NoSQL (DynamoDB, Aerospike, MongoDB), cloud data warehouses (Snowflake, BigQuery, Redshift), and data lakes (S3, Delta Lake, Iceberg).
- Strong experience with streaming and messaging systems: Apache Kafka, Amazon SQS/SNS, Kinesis, or equivalent.
- Proven experience building or operating semantic/metrics layers using Cube.js/Cube Core, dbt Metrics, LookML, or similar technologies.
- Expert-level SQL skills and experience with query optimization across distributed systems.
- Production experience designing multi-tenant data platforms with strict security and isolation requirements.
- Strong understanding of data governance, access control models (RBAC, ABAC), and compliance frameworks (SOC 2, GDPR, CCPA).
- Experience designing and exposing APIs (REST, GraphQL) for data consumption at scale.
- BS/MS in Computer Science, Data Engineering, or equivalent practical experience.

Preferred Qualifications

- Experience building data interfaces specifically for AI/ML consumption, including tool-use APIs for LLM agents, MCP (Model Context Protocol), or function-calling patterns.
- Familiarity with AI orchestration frameworks (LangChain, LlamaIndex, Semantic Kernel) and how they interact with external data tools.
- Experience with infrastructure-as-code (Terraform, Pulumi), container orchestration (Kubernetes, ECS), and CI/CD pipelines for data platform deployments.
- Background in MarTech/AdTech data domains: identity graphs, audience segmentation, campaign analytics, attribution modeling, or real-time bidding data.
- Contributions to open-source data tools or published thought leadership on semantic layers, data mesh, or AI-enabled data architectures.

BENEFITS & PERKS

- Unlimited PTO
- Excellent medical, dental, and vision coverage
- Employee Equity
- Employee Discounts, Virtual Wellness Classes, and Pet Insurance
- And more!!

SALARY RANGE

The salary range for this role is $170,000 - $200,000, depending on location and experience.

PEOPLE & CULTURE AT ZETA

Zeta considers applicants for employment without regard to, and does not discriminate on the basis of, an individual’s sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity or expression. We’re committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support and advocate for one another. Learn more about our commitment to diversity, equity and inclusion here:
