Data Platform & Visualization Engineer (ML experience required) - Los Altos, CA (Hybrid/Remote)

Remote Full-time
Position: Data Platform & Visualization Engineer

Location: Los Altos, CA (Hybrid/Remote)

Duration: Contract

Job Description:

We are seeking a contractor to help build and evolve our internal data platform that supports vehicle testing, experimentation, and machine learning workflows.

This role focuses on implementing and extending data ingestion pipelines, automated processing workflows, metrics tracking systems, and web-based visualization tools under the guidance of the team.

You will work with existing systems and well-defined components, contributing features and improvements that are used directly by researchers.

What You’ll Do

- Implement and extend data ingestion and processing workflows for large, heterogeneous datasets collected from vehicle tests and ML pipelines.

- Contribute to improving orchestration, scheduling, and reliability of long-running data workflows operating under real-world constraints.

- Integrate downstream automation such as metric computation, plotting, and LLM-based post-processing tooling.

- Implement backend services and APIs that support data indexing, metadata management, and experiment tracking.

- Build user-facing, web-based tools and dashboards that allow users to browse datasets, inspect results, and understand experimental progress over time.

- Work with a SQL-backed database to store metrics, experiment metadata, and summaries, ensuring the data can be queried and accessed consistently across systems.

- Contribute to data traceability and provenance mechanisms that capture how datasets are generated, transformed, and consumed in ML workflows.

What We’re Looking For

- Experience with Python for backend services, data pipelines, and automation.

- Working knowledge of SQL, including writing queries and understanding database schemas.

- Experience building web-based tools, including:

  - Backend APIs (e.g., FastAPI, Flask, or similar)

  - Frontend applications using React or other modern frameworks

- Familiarity with AWS and cloud-based storage or services.

- Comfort working in Linux environments.

Bonus Points

- Interest in autonomous racing and vehicle dynamics research.

- Prior internship or project experience involving data pipelines, dashboards, or analytics tools.

- Exposure to data visualization libraries, ML workflows, or experiment tracking systems.

Statement of Work
1. Scope of Work

The Contractor will provide engineering services to support the development and extension of internal data platform tooling supporting vehicle testing, experimentation, and machine learning workflows.

The scope includes ownership and extension of existing systems, implementation of automated pipelines, development of web-based visualization tools, and delivery of data traceability mechanisms.
2. Key Responsibilities

2.1 Data Ingestion Platform (pokedex / evdc_ingest)
• Own and extend an existing data ingestion system responsible for uploading vehicle test data to Amazon S3.
• Improve ingestion orchestration to support:
  ○ Upload prioritization for small datasets
  ○ Deferred upload scheduling for large datasets during off-hours
  ○ Automatic discarding of data explicitly marked as trash
  ○ Persistent queueing and resumability across server restarts or failures
• Maintain ingestion reliability under constrained network bandwidth.
• Extend the current web interface for clarity, reliability, and extensibility.

System-level architecture decisions will be guided by the team.
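
As a rough illustration of the orchestration rules in 2.1, the sketch below keeps the upload queue in SQLite so that pending work survives a restart, skips data marked as trash, uploads small datasets first, and holds large datasets until off-hours. It is not the existing pokedex / evdc_ingest implementation: the table layout, the 1 GiB size threshold, the off-hours window, and all function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical policy values; the real thresholds would be set by the team.
SMALL_BYTES = 1 * 1024**3      # datasets at or below this size count as "small"
OFF_HOURS = range(0, 6)        # local hours during which large uploads may run


def open_queue(path=":memory:"):
    """Open (or create) the queue. Use a file path for persistence across restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " path TEXT PRIMARY KEY,"
        " size_bytes INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn, path, size_bytes, is_trash=False):
    """Queue a dataset for upload. Trash is discarded and never queued."""
    if is_trash:
        return False
    conn.execute(
        "INSERT OR IGNORE INTO uploads (path, size_bytes) VALUES (?, ?)",
        (path, size_bytes),
    )
    conn.commit()
    return True


def next_upload(conn, hour):
    """Return the next dataset path to upload at the given hour, or None."""
    if hour in OFF_HOURS:
        # Off-hours: everything pending is eligible, smallest first.
        row = conn.execute(
            "SELECT path FROM uploads WHERE status = 'pending' "
            "ORDER BY size_bytes LIMIT 1"
        ).fetchone()
    else:
        # Daytime: only small datasets are eligible; large ones stay deferred.
        row = conn.execute(
            "SELECT path FROM uploads WHERE status = 'pending' "
            "AND size_bytes <= ? ORDER BY size_bytes LIMIT 1",
            (SMALL_BYTES,),
        ).fetchone()
    return row[0] if row else None


def mark_done(conn, path):
    """Record a completed upload so it is not retried after a restart."""
    conn.execute("UPDATE uploads SET status = 'done' WHERE path = ?", (path,))
    conn.commit()
```

Because the queue state lives in the database rather than in memory, a restarted worker can call `next_upload` again and pick up where it left off. The actual S3 transfer is deliberately left out of the sketch.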

2.2 Post-Ingestion Automation, Annotation and Storage
• Integrate ingestion workflows with post-processors, such as:
  ○ The existing LLM-based automatic annotation module
  ○ Automated plot generation (you come back to automatically generated plots as soon as data hits S3 - imagine that!)
  ○ Metric computation pipelines
• Package and deploy the annotation system as a service (e.g., EC2-based).
• Implement orchestration logic to trigger annotation jobs opportunistically when ingestion resources are idle.
• Store metrics, experiment metadata, plots, and summaries in a SQL-backed database layer.

2.3 Metrics Platform & Leaderboards
• Implement and extend a SQL-backed metrics database using schemas defined by the team.
• Define schemas to support:
  ○ Multiple projects
  ○ Baselines vs. experimental runs
  ○ Historical comparisons
• Build automated pipelines to compute and register metrics after ingestion.
• Implement project-level leaderboard functionality to track:
  ○ Best performance per metric
  ○ Accepted baselines vs. rejected experiments
• Develop a web-based visualization interface to:
  ○ Display time-series progress
  ○ Visualize metric tradeoffs
  ○ Summarize experimental outcomes
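
To make the leaderboard idea in 2.3 concrete, here is a minimal sketch of a two-table layout and a "best value per metric" query using Python's built-in `sqlite3` module. The actual schemas are defined by the team; the table names, the `status` values, the `lap_time_s` metric used in the usage note, and the assumption that lower metric values are better are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per run, one row per (run, metric) pair.
SCHEMA = """
CREATE TABLE runs (
    run_id  TEXT PRIMARY KEY,
    project TEXT NOT NULL,
    status  TEXT NOT NULL CHECK (status IN ('accepted', 'rejected'))
);
CREATE TABLE metrics (
    run_id TEXT NOT NULL REFERENCES runs(run_id),
    name   TEXT NOT NULL,
    value  REAL NOT NULL,
    PRIMARY KEY (run_id, name)
);
"""


def make_db(path=":memory:"):
    """Create a fresh database with the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_run(conn, run_id, project, status, metrics):
    """Store a run and its metrics; `metrics` maps metric name to value."""
    conn.execute(
        "INSERT INTO runs (run_id, project, status) VALUES (?, ?, ?)",
        (run_id, project, status),
    )
    conn.executemany(
        "INSERT INTO metrics (run_id, name, value) VALUES (?, ?, ?)",
        [(run_id, name, value) for name, value in metrics.items()],
    )
    conn.commit()


def leaderboard(conn, project):
    """Best (lowest) value per metric among accepted runs of a project.

    Relies on SQLite's documented behavior that, with a single MIN()
    aggregate, bare columns such as run_id come from the row holding the
    minimum. Other SQL engines would need a subquery or window function.
    """
    return conn.execute(
        """
        SELECT m.name, MIN(m.value), r.run_id
        FROM metrics AS m
        JOIN runs AS r ON r.run_id = m.run_id
        WHERE r.project = ? AND r.status = 'accepted'
        GROUP BY m.name
        ORDER BY m.name
        """,
        (project,),
    ).fetchall()
```

For example, after registering runs with `register_run(conn, "r1", "demo", "accepted", {"lap_time_s": 92.5})`, calling `leaderboard(conn, "demo")` returns one `(metric, best_value, run_id)` tuple per metric, ignoring rejected runs.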

2.4 Data Traceability & Provenance
• Design and implement a data provenance system for ML datasets.
• Track:
  ○ Source S3 URIs
  ○ Post-processing operations applied to datasets
• Implement a registry of post-processing functions with support for:
  ○ Easy addition and removal
  ○ Versioning and configuration tracking
• Generate human-read
