[Remote] System Software Engineer - AI
Note: This is a remote position open to candidates in the USA.

Delos-data is a stealth-mode startup focused on building foundational technology for large-scale AI data center clusters. They are seeking a talented System Software Engineer to design and implement communication and execution primitives for efficient AI model operations across thousands of GPUs.

Responsibilities
Collaborate across the stack to influence the design of our foundational technology, ensuring it meets the needs of next-generation AI models
Identify and resolve performance bottlenecks in distributed training and inference workloads through deep-dive analysis of the software-hardware interface
Conduct rigorous performance benchmarking and characterization on multi-node clusters

Skills
Strong proficiency in C++ and Python, with a deep understanding of systems programming fundamentals (memory management, concurrency, OS internals)
Proficient in a Linux development environment
Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field
Experience with GPU programming (CUDA) and performance optimization for parallel architectures
Familiarity with distributed AI frameworks (PyTorch, JAX, or DeepSpeed) and/or inference engines (vLLM, SGLang, Dynamo/TRT-LLM)
Hands-on experience with large-scale cluster orchestration and telemetry tools

Benefits
Meaningful equity
Benefits
401k

Company Overview
We are a mixture of software, system, and silicon experts using AI every day to deliver the world's most capable and responsive intelligence. The company has a workforce of 2-10 employees. Its website is https://delosdata.com.