Sr. Software Engineer, Observability and Telemetry

Remote Full-time
About the position

Tenstorrent is leading the industry in cutting-edge AI technology, revolutionizing performance expectations, ease of use, and cost efficiency. With AI redefining the computing paradigm, solutions must evolve to unify innovations in software models, compilers, platforms, networking, and semiconductors. Our diverse team of technologists has developed a high-performance RISC-V CPU from scratch and shares a passion for AI and a deep desire to build the best AI platform possible. We value collaboration, curiosity, and a commitment to solving hard problems. We are growing our team and looking for contributors at all levels of seniority.
Tenstorrent is building the world’s fastest, most efficient AI compute clusters. Our modular RISC-V and AI processors can snap together into a single, massively parallel distributed supercomputer consisting of thousands of compute nodes. As we scale, the volume and complexity of operational data grow by orders of magnitude. Observability and telemetry are key to ensuring our customers can resolve problems in minutes rather than hours. The telemetry team owns our proprietary telemetry infrastructure, spanning from the device level to the infrastructure needed to drive dashboards, monitoring systems, and orchestration.
This role is hybrid, based out of Santa Clara, CA; Austin, TX; or Toronto, ON.
We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.

Responsibilities
• Architect, implement, and maintain TT-Telemetry, our C++-based service for collecting and exporting device-level metrics.
• Interface with internal engineering teams to build a deep understanding of Tenstorrent’s architecture and identify and surface useful metrics.
• Design efficient built-in web GUIs for observing device- and cluster-level state, diagnosing problems, and monitoring utilization.
• Design ingestion pipelines for industry-standard telemetry systems (e.g., Prometheus).
• Help define the long-term architecture of Tenstorrent’s distributed telemetry stack.

Requirements
• Strong C++ engineer, comfortable working in both low-level environments and distributed systems design.
• Experience building atop observability platforms such as Prometheus, OpenTelemetry, Grafana, ClickHouse, or similar technologies.
• Solid understanding of data structures for manipulating large volumes of data.
• Familiarity with SQL databases, with time-series databases a plus.
• Curious about networking and communication across large clusters and comfortable reasoning from first principles while challenging industry conventions.

Benefits
• Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
