Dice is seeking a Senior Data Engineer with expertise in Databricks. The role involves responding to alerts, debugging data pipelines, and improving platform stability while coordinating with engineering teams during incidents.
Responsibilities:
- Respond promptly to PagerDuty and Datadog alerts and triage incidents efficiently
- Debug and resolve failures across streaming and batch data pipelines
- Troubleshoot Databricks/Spark job failures, Kafka lag and connectivity issues, Delta Lake checkpoint failures, PostgreSQL sink errors, and schema-related failures
- Restart jobs, apply configuration fixes, and escalate issues with detailed root cause analysis as needed
- Execute operational runbooks and maintain thorough incident documentation and postmortems
- Improve platform stability by reducing recurring incidents and alert noise
- Coordinate effectively with downstream consumers and engineering teams during incidents
Requirements:
- Strong hands-on experience with Databricks, Spark/PySpark, Kafka, Delta Lake, Python, SQL, and PostgreSQL
- Experience with AWS services including S3, IAM, Secrets Manager, and KMS
- Knowledge of Datadog, PagerDuty, and production observability practices
- Strong troubleshooting and debugging skills across distributed systems and streaming pipelines
- Ability to distinguish between transient infrastructure issues, configuration fixes, and application/code defects
- Experience with Apache Flink and stateful streaming systems
- Healthcare or HIPAA domain exposure
- Experience with dbt, Great Expectations, or data quality frameworks
- Terraform-managed cloud infrastructure experience