LMI is a digital solutions provider focused on enhancing government impact through innovation. The company is seeking a Health DevOps Engineer with expertise in observability and reliability to support public health systems management, particularly in Medicare and related areas, and to ensure the stability and performance of healthcare technology infrastructure.
Responsibilities:
- Design, implement, and maintain monitoring and alerting systems for production and development environments to ensure high availability and reliability
- Leverage tools like Prometheus, Grafana, DataDog, Elastic Stack, or equivalent to track system performance and application health
- Proactively detect and troubleshoot performance bottlenecks, infrastructure issues, and failures
- Optimize the performance and reliability of highly available systems supporting healthcare payment applications
- Lead incident response efforts, including root cause analysis, and implement measures to prevent recurrence
- Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure system reliability and availability
- Ensure the consistent delivery of high-quality health-related data in compliance with industry standards such as HIPAA
- Create and maintain automated deployment pipelines (CI/CD) to reduce release cycle times and improve workflows for developers
- Automate infrastructure provisioning and management through tools such as Terraform, Helm, Ansible, or CloudFormation
- Improve the operational efficiency of development and deployment processes
- Deploy, maintain, and scale cloud infrastructure on platforms such as AWS, Azure, or Google Cloud, ensuring compliance with healthcare-sector security and privacy requirements
- Implement and manage containerization and orchestration platforms such as Docker and Kubernetes to ensure efficient resource usage and scalability
- Optimize cloud resources for cost efficiency and streamline infrastructure provisioning
- Partner with development and product teams to ensure seamless integration of observability tools and reliability practices through every stage of the software delivery lifecycle
- Document architecture, processes, metrics, and troubleshooting guides to support scalability and knowledge sharing across the organization
- Actively contribute to improving engineering workflows, reliability processes, and operational excellence
Requirements:
- Bachelor's degree in Computer Science, Software Engineering, or a related field
- Minimum of 2 years of professional experience in DevOps, Site Reliability Engineering (SRE), or a related role, preferably focused on observability and reliability
- Hands-on experience with monitoring tools such as Prometheus, Grafana, ELK stack, DataDog, New Relic, or similar platforms
- Experience with containerization and orchestration technologies like Docker and Kubernetes
- Proficiency with cloud platforms such as AWS
- Solid understanding of infrastructure as code (IaC) with tools like Terraform, Ansible, or CloudFormation
- Knowledge of scripting languages such as Python, Bash, or PowerShell for automation tasks
- Familiarity with CI/CD tools such as Jenkins, GitLab CI/CD, CircleCI, or similar frameworks
- Strong understanding of network protocols, monitoring, and troubleshooting best practices
- Strong problem-solving and analytical abilities, with close attention to detail and a commitment to reliability and excellence
- Excellent written and verbal communication skills, with the ability to work effectively with cross-functional teams
- Ability to work under pressure and prioritize tasks in fast-paced environments
- Experience with healthcare-focused projects or systems
- Familiarity with healthcare compliance requirements such as HIPAA
- Certifications such as AWS Certified DevOps Engineer, Microsoft Certified: DevOps Engineer Expert, or Certified Kubernetes Administrator (CKA)
- Experience in federal consulting
- Demonstrated experience with healthcare data projects (e.g., claims processing or payment systems) or with financial or banking systems