[Remote] AI Solutions Architect - FS or CI Polygraph Required
Note: This is a remote position open to candidates in the USA.

Cloudera is a leading data partner for top companies across various industries, focusing on transforming complex data into actionable insights. They are seeking an AI Solutions Engineer to lead the technical architecture and execution for government organizations, helping them design and deploy mission-critical AI applications on the Cloudera Data Platform.

Responsibilities
- Evaluate and select optimal model architectures (LLMs, SLMs, or traditional ML) based on mission requirements, considering tradeoffs between accuracy, latency, and cost
- Guide customers on "Build vs. Buy vs. Fine-tune" decisions, prioritizing open-source models (Llama, Mistral, Falcon) that can run securely within a sovereign data perimeter
- Experience building agentic workflows (AI agents that can execute API calls and multi-step tasks)
- Design and implement robust data pipelines within CDP to transform "messy" legacy data into AI-ready formats
- Develop and optimize vector databases and Retrieval-Augmented Generation (RAG) architectures to ground AI responses in verified agency facts
- Build data pipelines with Spark, NiFi, Kafka, or other ETL tools
- Optimize model inference for production environments using quantization, pruning, and hardware acceleration (NVIDIA GPU orchestration)
- Implement LLMOps to monitor model performance, detect hallucination rates, and manage model versioning and drift
- Collaborate with the customer's AI Center of Excellence (CoE) to establish automated guardrails for ethics, bias mitigation, and FedRAMP/IL5 compliance
- Translate complex technical AI concepts into mission-value briefings for GS-level stakeholders and agency leadership

Skills
- Experience: 5+ years in Data Engineering, Machine Learning, or Software Engineering, with at least 2 years focused on Generative AI or Deep Learning
- Technical Stack: Expertise in Python and deep learning frameworks (PyTorch, TensorFlow, Hugging Face)
- Hands-on experience with Cloudera (CDP), Spark, or similar big data ecosystems
- Proficiency in orchestration tools like LangChain, LlamaIndex, or Haystack
- Experience developing visual data representations and dashboards (Django, React, or Angular)
- Experience using a compiled programming language, preferably one that runs on the JVM (Java, Scala, etc.)
- Data Expertise: Proven ability to build ETL/ELT pipelines and work with both SQL and NoSQL/vector databases (e.g., Pinecone, Milvus, or PGVector)
- Public Sector Knowledge: Understanding of government security frameworks (NIST AI RMF, FedRAMP, SRGs, STIGs)
- Active Top Secret Security Clearance
- Experience fine-tuning foundational models using techniques such as PEFT (Parameter-Efficient Fine-Tuning) and LoRA to adapt AI to domain-specific government nomenclature
- Experience training specialized models on proprietary datasets while ensuring strict adherence to data privacy and sensitivity labels
- Experience installing and operating Cloudera Data Platform
- Experience installing and operating Kubernetes
- Experience in air-gapped deployments and managing AI workloads in disconnected environments
- Advanced degree (MS or PhD) in Computer Science, Data Science, or a related field
- Active Counterintelligence (CI) or Full Scope (FS) Polygraph is required

Benefits
- Generous PTO policy
- Support for work-life balance with [Unplugged Days](https://www.youtube.com/watch?v=eXBMXiUHG8c)
- Flexible WFH policy
- Mental and physical wellness programs
- Phone and internet reimbursement program
- Access to continued career development
- Comprehensive benefits and competitive packages
- [Paid Volunteer Time](https://www.youtube.com/watch?v=EHPK_ZRVRHA)
- Employee Resource Groups

Company Overview
Cloudera is a software development company that offers data management and cloud-native data analytics solutions. It was founded in 2008 and is headquartered in Santa Clara, California, USA, with a workforce of 1001-5000 employees. Its website is http://www.cloudera.com.