Applied Computing is building Orbital, a physics-informed foundation model for energy operations. The Reinforcement Learning Researcher will own the development of learning-based optimisation systems and ensure that policy learning frameworks for industrial processes are safe and efficient.
Responsibilities:
- Design & Implement RL-Based Decision Systems
  - Process optimisation (yield, efficiency, cost reduction)
  - Control policy learning (setpoint optimisation, constraint handling)
  - Sequential decision-making under uncertainty
  - Work across:
    - Model-free RL (policy gradients, actor-critic, offline RL)
    - Model-based RL (world models, planning-based methods)
    - Hybrid approaches combining RL with optimisation / MPC
- Build Physics-Constrained RL Systems
  - Embed domain knowledge into policy learning:
    - Hard constraints (safety, operating limits, regulatory bounds)
    - Soft constraints (efficiency, degradation, economic trade-offs)
    - Physics-informed reward shaping and transition models
  - Ensure policies:
    - Respect physical feasibility
    - Generalise across operating regimes
    - Remain stable under real-world disturbances
- Offline RL, Simulation & Digital Twin Integration
  - Develop RL systems that work in data-scarce and risk-sensitive environments:
    - Offline RL from historical plant data
    - Simulation-based training via digital twins
    - Sim-to-real transfer strategies
  - Handle:
    - Distribution shift
    - Partial observability
    - Sparse / delayed rewards
- Safety, Robustness & Interpretability
  - Design safe RL systems for production environments:
    - Constrained RL / safe exploration
    - Policy validation before deployment
    - Fail-safe mechanisms and fallback strategies
  - Ensure outputs are:
    - Interpretable to engineers and operators
    - Auditable and explainable
    - Reliable under sensor faults and regime changes
- Production-Grade Deployment
  - Deploy RL systems into real-world infrastructure:
    - Containerised deployment (Docker, AWS / Azure)
    - Integration with control systems (APC, DCS, advisory layers)
    - Real-time inference and monitoring
  - Build pipelines for:
    - Continuous policy evaluation
    - Safe rollout and rollback
    - Online / batch policy updates
- Benchmarking & Validation
  - Define evaluation standards for RL systems:
    - Offline policy evaluation
    - Counterfactual analysis
    - Comparison vs MPC, heuristics, and operator baselines
  - Ensure:
    - Measurable economic impact
    - Reproducible results
    - Defensible performance claims