NTT DATA North America is a leader in business and technology services, committed to innovation and client success. They are seeking a Data Engineer - Security to build and operate AWS data lakes, design Glue jobs, and manage data orchestration using Airflow and Snowflake.

Responsibilities:

API-first data ingestion. Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks; strong Python skills to normalize/enrich data and land it cleanly into S3 (schema, partitioning, Parquet)
AWS data lake, end to end. Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled), Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/perf best practices (file sizing, compaction)
AWS Glue + PySpark expert. Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests; knows how to tune jobs for scale and cost
Airflow orchestration. Writes clean, parameterized, idempotent DAGs (sensors, SLAs, retries, alerts), manages dependencies across pipelines, and uses Git-based CI/CD to promote changes safely
Snowflake proficiency. Builds ELT models (staging/ODS/marts), tunes performance (warehouse sizing, clustering, micro-partitions, caching), and uses Streams/Tasks/Snowpipe for CDC

Requirements:

Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks
Strong Python skills to normalize/enrich data and land it cleanly into S3 (schema, partitioning, Parquet)
Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled)
Comfortable with Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/perf best practices (file sizing, compaction)
Designs and optimizes Glue jobs using PySpark/DynamicFrames, bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests
Knows how to tune Glue jobs for scale and cost
Writes clean, parameterized, idempotent Airflow DAGs (sensors, SLAs, retries, alerts)
Manages dependencies across pipelines and uses Git-based CI/CD to promote changes safely
Builds Snowflake ELT models (staging/ODS/marts)
Tunes Snowflake performance (warehouse sizing, clustering, micro-partitions, caching)
Uses Snowflake Streams/Tasks/Snowpipe for CDC

Data Engineer - Security (without Kafka Experience)
