[Remote] Principal Software Engineer – Backend
Note: This is a remote position open to candidates in the USA.

DeepHow is a Physical AI platform serving industrial manufacturing, pharmaceuticals, and utilities. They are seeking an experienced Principal Software Engineer – Backend to lead the architecture, development, and optimization of their backend systems, focusing on building scalable, cloud-native SaaS platforms. The role involves mentoring engineers and delivering high-quality solutions that support AI-powered products and large-scale, data-intensive applications.

Responsibilities
- Lead the architecture, design, development, and optimization of scalable, high-performance backend systems that support business growth and product innovation
- Define technical roadmaps, architectural standards, and engineering best practices while providing technical leadership and mentorship to development teams
- Develop and maintain backend applications, APIs, microservices, and automation solutions using Node.js and Python
- Design, deploy, and manage cloud-native infrastructure on Google Cloud Platform (GCP), including BigQuery, Cloud Run, Cloud Functions, App Engine, Compute Engine, and Google Kubernetes Engine (GKE)
- Implement and manage Infrastructure as Code (IaC) using Terraform and Helm to ensure scalable and repeatable deployments
- Build and maintain observability frameworks, including monitoring, logging, tracing, and alerting, using tools such as Datadog, New Relic, and Google Cloud Monitoring
- Monitor and optimize production machine learning workloads, including model performance, operational health, and data drift detection
- Design and manage scalable data architectures using PostgreSQL, MongoDB, Redis, and Firestore, while developing large-scale data pipelines and supporting dataset versioning practices with tools such as DVC and LakeFS
- Deploy, manage, and optimize containerized applications using Docker and Kubernetes (GKE), including multi-tenant architectures, RBAC, namespace isolation, and resource management
- Design secure cloud networking solutions involving VPCs, load balancers, and network security controls while implementing secure authentication and authorization using OAuth and SAML
- Establish and maintain infrastructure security best practices, including encryption, secrets management, service account governance, and credential rotation
- Build and enhance CI/CD pipelines using Jenkins and support GitOps workflows with tools such as ArgoCD and Flux
- Improve application performance, scalability, reliability, and fault tolerance while implementing asynchronous processing frameworks such as Temporal and Celery
- Integrate ML frameworks, model lifecycle tools, and model-serving platforms, including PyTorch, Ray, Hugging Face, MLflow, Weights & Biases, BentoML, Triton, and TorchServe, within scalable Kubernetes environments

Skills
- Bachelor's or Master's degree in Computer Science, Engineering, or a related technical discipline. Equivalent practical experience will also be considered
- 10+ years of backend software engineering experience
- Proven track record of designing, building, and scaling production-grade systems
- Prior experience in a SaaS company is required
- Strong experience in cloud-native environments and distributed systems
- Previous experience in a Principal Engineer, Staff Engineer, or Senior Lead Engineer role with ownership of architecture and system design
- Demonstrated success leading complex technical initiatives and mentoring engineering teams
- Experience working in startup or high-growth environments is strongly preferred

Company Overview
DeepHow develops an AI-powered learning platform for manufacturing, service, and repair. Founded in 2018, it is headquartered in Royal Oak, Michigan, USA, and has a workforce of 51-200 employees. Its website is https://www.deephow.com.