[Remote] Staff Software Engineer, Anywhere Cloud - AI Systems & Runtimes
Note: This is a remote position open to candidates in the USA.

Cloudera is a leading company in data management and cloud innovation, seeking a Staff Software Engineer to lead the architecture and delivery of its cloud-native AI platform. The role involves bridging AI research and production-grade Kubernetes environments while optimizing the management of open-source models and designing integration patterns for seamless AI capabilities.

Responsibilities

- Design and implement elegant, scalable application services (Go/Node.js) that wrap AI capabilities for enterprise use
- Lead the deployment of inference servers (vLLM, Triton) using KServe, KubeRay, or Knative to ensure serverless-style scaling for AI workloads
- Build internal tooling, SDKs, and "AI Gateways" that enhance team agility and simplify the integration of foundation models (Llama, GPT) into product features
- Architect robust Retrieval-Augmented Generation (RAG) pipelines and prompt management services that integrate seamlessly with vector databases and enterprise data sources
- Partner with UI engineers, UX designers, and Product Management to ensure the AI platform is not just powerful, but highly usable for internal developers
- Ensure AI workloads are secure, multi-tenant, and optimized for GPU resource scheduling (MIG, fractional GPUs) within Kubernetes

Skills

- Bachelor's degree with 6+ years of software engineering experience (or equivalent Master's/PhD tenure), including at least 2 years focused on AI/ML systems
- Expert proficiency in Python (for the AI ecosystem) and strong competence in a systems language such as Go, Rust, or C++ (for high-performance serving layers)
- Deep understanding of LLM deployment challenges and runtimes (e.g., vLLM, ONNX, TorchServe, Triton)
- Familiarity with quantization techniques (AWQ, GPTQ) to optimize model size and speed
- Experience building complex workflows using tools like LangChain or LlamaIndex, and deploying them on containerized infrastructure (Docker/Kubernetes)
- Ability to navigate the rapidly changing AI landscape, filtering hype from practical engineering solutions, and driving technical alignment across teams
- Model Fine-Tuning: Experience with efficient fine-tuning techniques (PEFT, LoRA/QLoRA) on custom datasets
- GPU Optimization: Familiarity with CUDA programming or profiling GPU performance (Nsight Systems)
- Open Source: Contributions to open-source AI projects (Hugging Face Transformers, vLLM, etc.)

Benefits

- Generous PTO Policy
- Support for work-life balance with [Unplugged Days](https://www.youtube.com/watch?v=eXBMXiUHG8c)
- Flexible WFH Policy
- Mental & Physical Wellness programs
- Phone and Internet Reimbursement program
- Access to Continued Career Development
- Comprehensive Benefits and Competitive Packages
- [Paid Volunteer Time](https://www.youtube.com/watch?v=EHPK_ZRVRHA)
- Employee Resource Groups

Company Overview

Cloudera is a software development company that offers data management and cloud-native data analytics solutions. It was founded in 2008 and is headquartered in Santa Clara, California, USA, with a workforce of 1001-5000 employees. Its website is http://www.cloudera.com.