Optum is a global organization that delivers care aided by technology to help millions of people live healthier lives. In this role, you will design, develop, test, and deploy data pipelines and architectures to support Advisory Board's data and analytics projects, ensuring data quality and collaborating with data scientists to optimize analytics infrastructure.
Responsibilities:
- Pipeline Development: Design, build, and maintain scalable, reliable, and efficient ETL/ELT pipelines using AWS Glue, Python, and SQL. Automate manual processes and optimize PySpark jobs for big data
- Data Lake/Warehouse Management: Architect and manage data lakes (AWS S3), data warehouses (Redshift, S3 Tables) and relational databases (PostgreSQL, SQL Server). Use data modeling best practices to ensure data is accurate, accessible, and organized for efficient reporting and analysis
- Cloud Infrastructure: Use AWS services such as ECS and Lambda for data ingestion and orchestration tasks involving a variety of external systems (Kafka, Snowflake, Databricks, custom APIs, etc.)
- Data Quality & Security: Implement monitoring and troubleshooting measures to ensure data integrity and security, including IAM policies and CloudWatch logging
- Collaboration: Work closely with data scientists and analysts to understand healthcare data and business requirements, and support data-driven decision-making
- Analytics Development: Build datasets and dashboards, configure row-level security (RLS) and user permissions, and optimize dashboard performance on Amazon Quick Suite. Configure parameters for interoperability with web app embedding
- CI/CD: Leverage GitHub for version control, code review, and automated deployment of pipelines across environments
Requirements:
- 8+ years of solid experience in data engineering using Python or PySpark
- 5+ years of experience with AWS including Glue, Lambda, IAM, Redshift
- 3+ years developing and optimizing PySpark jobs with proven big data experience
- 3+ years of experience applying best practices in data analysis and modeling
- Experience optimizing the architecture for big data, query performance, ease of use, and data governance
- Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
- Experience working with cloud technologies (AWS preferred)
- Proven solid expertise in SQL and data modeling with relational databases such as SQL Server and PostgreSQL
- Experience developing datasets and dashboards in Amazon Quick Suite, including support for web app embedding
- Understanding of health care claims data, including Medicare and commercial datasets
- Proven eagerness and willingness to learn new technologies
- Proven solid analytical, problem-solving, and decision-making skills
- Demonstrated depth of health care knowledge and expertise
- Proven written and oral communication skills