NTT DATA North America is a leader in business and technology services, committed to innovation and client success. The company is seeking a Data Engineer - Security to build and operate AWS data lakes, design Glue jobs, and manage data orchestration using Airflow and Snowflake.
Responsibilities:
- API-first data ingestion. Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks; strong Python skills to normalize/enrich data and land it cleanly into S3 (schema, partitioning, Parquet)
- AWS data lake, end to end. Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled), Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/perf best practices (file sizing, compaction)
- AWS Glue + PySpark expert. Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests; knows how to tune jobs for scale and cost
- Airflow orchestration. Writes clean, parameterized, idempotent DAGs (sensors, SLAs, retries, alerts), manages dependencies across pipelines, and uses Git-based CI/CD to promote changes safely
- Snowflake proficiency. Builds ELT models (staging/ODS/marts), tunes performance (warehouse sizing, clustering, micro-partitions, caching), and uses Streams/Tasks/Snowpipe for CDC
Requirements:
- Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks
- Strong Python skills to normalize/enrich data and land it cleanly into S3 (schema, partitioning, Parquet)
- Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled)
- Comfortable with Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/perf best practices (file sizing, compaction)
- Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests
- Knows how to tune jobs for scale and cost
- Writes clean, parameterized, idempotent DAGs (sensors, SLAs, retries, alerts)
- Manages dependencies across pipelines, and uses Git-based CI/CD to promote changes safely
- Builds ELT models (staging/ODS/marts)
- Tunes performance (warehouse sizing, clustering, micro-partitions, caching)
- Uses Streams/Tasks/Snowpipe for CDC