Azure, Cloud, Grafana, Python, Terraform, Bash, PowerShell, AI, Azure DevOps, Datadog, CI/CD, Communication, Collaboration, Remote Work
About this role
Role Overview
Own end-to-end monitoring and alerting for critical platform components using Datadog, Application Insights, and Grafana.
Define and track SLOs/SLIs and drive proactive incident prevention.
Triage and resolve day-to-day DevOps tickets through scripting and automation (PowerShell, Bash, Azure CLI) to reduce operational toil and accelerate engineering velocity.
Manage and evolve cloud infrastructure using Terraform.
Build reusable modules and enforce infrastructure best practices across test and production environments.
Lead incident response, conduct blameless post-mortems, and implement follow-up actions to prevent recurrence.
Partner with engineering teams to improve CI/CD pipelines in Azure DevOps, reducing deployment risk and streamlining delivery.
Support SOC 2-compliant infrastructure design and assist with security hardening, vulnerability management, and regulatory requirements.
Monitor industry trends — especially around AI-assisted operations — and evaluate new tools to continuously improve infrastructure reliability and performance.
Requirements
5+ years of experience in a DevOps or SRE role, supporting business-critical, highly available systems.
Strong proficiency in Terraform (IaC) for provisioning and managing cloud infrastructure at scale.
Hands-on experience with Azure Cloud, including compute, networking, storage, and managed services.
Deep expertise in monitoring, alerting, and observability tooling using Datadog, Azure Insights, Azure Application Insights, Grafana, or equivalent.
Proficiency in scripting languages such as PowerShell, Python, or Bash for automation and incident remediation.
Experience with Azure DevOps for CI/CD automation, including Repos, Pipelines, and Releases.
Willingness to participate in an early-morning on-call rotation and respond to production incidents as they arise.
Strong communication and collaboration skills, with the ability to thrive in a small, fast-moving team environment.
Tech Stack
Azure
Cloud
Grafana
Python
Terraform
Benefits
Competitive salary
Remote work options
Senior Site Reliability Engineer at NIR-YU | JobVerse