[Remote] Decision Intelligence Engineer - Next Best Action
Note: This is a remote job open to candidates in the USA. Humana Inc. is a leading U.S. healthcare company seeking a skilled Decision Intelligence Engineer to enhance its Next Best Action platform. The role involves designing and evaluating decision-making algorithms, ensuring system compliance with clinical rules, and collaborating with data and platform engineers.

Responsibilities
- Design, implement, and evaluate algorithms suited to long-horizon, sparse-reward sequential decision-making in healthcare. These include reinforcement learning methods (PPO, A3C, DQN, CQL, and Decision Transformer) as well as dynamic programming formulations and constrained optimization approaches.
- Frame member decisioning problems as Markov Decision Processes (MDPs) or Partially Observable MDPs, defining state representations, action spaces, transition dynamics, and reward structures that encode clinical and program-specific goals.
- Apply Bellman-equation-based value estimation, reward shaping, and constraint formulations to encode clinical eligibility, suppression rules, and program-specific objectives directly into the learning or optimization objective.
- Manage exploration-exploitation tradeoffs (or equivalent uncertainty handling in simulation and stochastic optimization) appropriate for a production healthcare environment where suboptimal actions have member impact.
- Model member journey dynamics using tools from stochastic processes, simulation, or probabilistic graphical models to inform policy design and evaluation.
- Build simulation and backtesting environments, including discrete-event simulation and Monte Carlo methods, to evaluate policy or decision quality against historical member journey data before production promotion.
- Diagnose and remediate failure modes specific to learned or optimized policies, including policy collapse, credit assignment errors across long member journeys, distributional shift between training and serving populations, and constraint violations under out-of-distribution inputs.
- Define performance threshold criteria and automated evaluation gates within the nightly Databricks training workflow; block promotion of underperforming policies to MLflow production.
- Instrument training and optimization runs with MLflow tracking covering hyperparameters, objective curves, action distributions, and feature importance for every training cycle.
- Own the nightly Databricks training workflow: feature engineering from upstream clinical and operational data sources, state vector normalization, distributed training with Ray RLlib (or equivalent optimization solvers), and batch scoring of all eligible members.
- Collaborate with the Data Engineering team to ensure that training inputs are correctly joined, reward signals are computed from disposition outcomes, and the feature pipeline is reproducible and auditable.
- Write production-quality PySpark feature engineering jobs; maintain data lineage through Databricks Unity Catalog.
- Manage model artifacts, versioning, and lifecycle in the MLflow Model Registry; ensure rollback capability is maintained at all times.
- Apply multi-agent decision-making concepts (MARL via PettingZoo, or game-theoretic or cooperative optimization approaches) where member household or population-level coordination is required.
- Implement constraint handling to enforce hard business rules, including member caps, cooldown periods, and clinical eligibility, directly within the optimization objective. Use constrained MDP formulations, Lagrangian relaxation, or mixed-integer programming as appropriate, rather than relying on downstream filters.
- Collaborate with rules engine stakeholders to ensure eligibility guards and policy priorities are correctly aligned and do not conflict.
- Partner with decision engine and rules engine teams to ensure that model outputs integrate cleanly with the real-time decisioning hot path and that scored recommendations are correctly structured and interpreted.
- Collaborate with platform architects to define feedback loop contracts: how disposition outcomes flow back through the data pipeline into the next training cycle.
- Document model behavior, known limitations, and failure modes for clinical and compliance stakeholders; support explainability requirements for member-facing decisions.
- Use AI-assisted engineering tools for scaffolding, testing, and documentation; ensure all core model logic and objective design remain human-authored and subject to rigorous peer review.

Skills
- 8+ years of software engineering or quantitative research experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, optimization engines, or simulation frameworks serving millions of users.
- 3+ years of hands-on experience implementing reinforcement learning, operations research methods, or simulation-driven decision systems in production. Relevant backgrounds include policy gradient and value-based RL (PPO, A3C, DQN, CQL), stochastic dynamic programming, discrete-event simulation, or large-scale combinatorial or constrained optimization.
- Deep familiarity with Markov Decision Processes, Bellman-equation-based value estimation, reward or objective shaping, exploration-exploitation tradeoffs, and constraint formulation in real-world decision systems.
- Demonstrated ability to diagnose failure modes in learned or optimized policies: instability, poor credit assignment across long horizons, and distributional shift across large populations.
- Proficiency in Python 3.x; experience with PyTorch or TensorFlow for policy network or learned model implementation.
- Experience with Ray RLlib or equivalent distributed computation frameworks for large-scale training or optimization.
- Experience with Databricks, PySpark, and Delta Lake for large-scale ML or data pipelines processing tens of millions of records.
- Experience with MLflow for experiment tracking, model registry, and artifact management.
- Experience shipping systems that operate reliably under production load, not just research or prototype work.
- Experience with multi-agent RL frameworks (PettingZoo or equivalent) or multi-agent simulation and coordination methods.
- Familiarity with operations research methods applicable to constrained sequential decisioning: linear programming, mixed-integer programming, Lagrangian relaxation, or constraint programming.
- Experience operating decision or optimization systems in regulated domains (healthcare, finance, or insurance) where member safety, auditability, and explainability are requirements.
- Experience building simulation environments using Gymnasium, SimPy, AnyLogic, or equivalent frameworks for policy evaluation and backtesting.
- Familiarity with event-driven feedback loops and how disposition signals feed retraining or re-optimization pipelines.
- OpenTelemetry instrumentation experience for ML or optimization pipeline observability.

Benefits
This job is eligible for a bonus incentive plan. This incentive opportunity is based upon company and/or individual performance.
Humana, Inc. and its affiliated subsidiaries (collectively, "Humana") offer competitive benefits that support whole-person well-being.
- Medical, dental and vision benefits
- 401(k) retirement savings plan
- Time off (including paid time off, company and personal holidays, volunteer time off, paid parental and caregiver leave)
- Short-term and long-term disability
- Life insurance
Employees who live and work from Home in the state of California, Illinois, Montana, or South Dakota will be provided a bi-weekly payment for their internet expense.
Humana will provide Home or Hybrid Home/Office employees with telephone equipment appropriate to meet the business requirements for their position/job.
Work from a dedicated space lacking ongoing interruptions to protect member PHI / HIPAA information.

Company Overview
Humana is a health insurance provider for individuals, families, and businesses. It was founded in 1964 and is headquartered in Louisville, Kentucky, USA, with a workforce of 10001+ employees. Its website is http://www.humana.com.