Analyze cluster, queue, workload, and license telemetry to identify optimization opportunities across throughput, fairness, efficiency, and reliability
Drive scheduler and policy tuning for job placement, queue design, fairshare behavior, right-sizing, and license-aware execution
Profile critical EDA and HPC workloads to analyze CPU utilization, GPU acceleration characteristics, memory and I/O behavior, runtime performance, and scaling efficiency, and translate findings into actionable tuning and optimization recommendations
Partner with Engineering Enablement, IT, Cloud, workflow owners, and developers to resolve systemic issues and prevent repeat incidents
Develop dashboards, KPIs, and data-driven reporting for queue health, Compute/License utilization, capacity forecasting, and performance and cost optimization outcomes
Apply automation and AI/ML-driven approaches where appropriate for anomaly detection, demand forecasting, and proactive congestion avoidance
Document best practices, operational playbooks, and repeatable methods to improve resilience and reduce reliance on a small set of experts.
Requirements
Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Electronics, Data Engineering, or a related technical field
Strong experience in HPC environments, batch schedulers (LSF/Slurm/PBS etc), EDA compute workflows, GPU Enabled Infrastructure or large-scale infrastructure operations
Hands-on knowledge of Linux systems, scripting/automation, telemetry analysis, and troubleshooting of distributed compute environments
Experience with workload profiling, performance analysis, or resource optimization for compute and license-constrained environments
Experience driving GPU porting and enablement including GPU-aware scheduler configuration and resource allocation policies
Ability to translate operational data into recommendations, influence cross-functional stakeholders, and drive actions to closure
Strong communication skills with the ability to work effectively across engineering, support, workflow, and infrastructure teams.
Tech Stack
Cloud
Linux
Benefits
supportive environment focused on professional growth