Diverse Lynx is seeking a Data Engineer to design and implement frameworks for automated incident detection and remediation. The role involves building DataOps automation frameworks, enhancing platform observability, and creating self-service operational tools for engineering teams.
Responsibilities:
- Automated Incident Detection & Remediation
  - Design and implement automated incident detection and remediation frameworks for data platforms and pipelines
  - Shift incident response from reactive manual processes to proactive, policy-driven workflows
  - Improve mean time to detect (MTTD) and mean time to remediate (MTTR) through automation
- DataOps Automation & AIOps Frameworks
  - Build and operationalize DataOps automation frameworks, incorporating AIOps where applicable for anomaly detection and predictive alerting
  - Reduce operational noise and repetitive manual tasks through intelligent automation
  - Establish standardized operational runbooks and automated recovery patterns
- Platform Observability & Telemetry Standardization
  - Define and implement platform-wide observability and telemetry standards for data workloads
  - Enable consistent metrics, logging, alerting, and health signals across platforms and pipelines
  - Ensure operational visibility is actionable and supports automated response
- Self-Service Operational Tooling
  - Build self-service operational tooling that enables engineering teams to debug, monitor, and resolve issues independently
  - Reduce operational dependencies on centralized teams and vendors
  - Empower teams with standardized dashboards, insights, and tooling
- Automated Cost Governance & Optimization
  - Establish automated cost governance and optimization frameworks for data platforms
  - Improve cost visibility through telemetry, alerts, and policy-based controls
  - Drive proactive cost management rather than reactive spend reviews
Requirements:
- Strong experience in Data Operations, Data Platform Operations, or SRE/DataOps roles
- Proven experience reducing operational toil through automation-first approaches
- Hands-on experience with incident management, observability, and operational tooling
- Experience operating within or transforming managed services or vendor-led operating models
- Strong understanding of cost governance and optimization in data platforms
- Hands-on experience with Microsoft Azure data platform services such as Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Microsoft Fabric
- Experience implementing CI/CD and release automation using Azure DevOps, including Azure Pipelines, repository management, deployment workflows, and environment-based promotion controls
- Experience with Microsoft Entra ID, Azure Key Vault, and Microsoft Purview for identity, secrets management, governance, data lineage, and compliance controls
- Strong scripting and engineering skills in Python, SQL, and PowerShell to build automation, operational runbooks, telemetry pipelines, and remediation workflows
- 5+ years of relevant experience