Design conceptual, logical, and physical data models for complex federal environments.
Lead the transition from legacy on-premises systems to modern, cloud-native (AWS/GCP) data platforms.
Architect and oversee the build of automated ETL/ELT pipelines using Python, SQL, and PySpark to ingest and transform unstructured and structured data.
Implement and optimize enterprise data warehouses using tools like AWS Redshift, Google BigQuery, AWS Glue, and Databricks.
Establish data governance frameworks, metadata management, and data lineage in alignment with federal standards (HIPAA, FHIR, NIST).
Perform index and partition design, query tuning, and sharding to ensure high availability and scalability for real-time analytics.
Design data architectures that facilitate AI/ML initiatives, including model training pipelines and real-time inference in production environments.
Mentor a team of data engineers and enforce software engineering best practices (CI/CD, unit testing, documentation).
Requirements
Must be a U.S. Citizen.
Master's degree or higher in Systems Engineering, Computer Science, or a related field.
An active security clearance or the ability to obtain one is required.
Minimum of 6 years of experience, including:
Experience in data management using advanced analytics tools and platforms, including Python.
Experience with Data Warehousing consulting/engineering or related technologies (Redshift, Databricks, BigQuery, OADW, Apache Hive, Apache Lucene).
Experience in scripting, tooling, and automating large-scale computing environments.
Extensive experience with major tools such as Python, Pandas, PySpark, NumPy, SciPy, SQL, and Git; some experience with TensorFlow, PyTorch, and Scikit-learn.
Compliance: Deep understanding of data security and federal compliance requirements.