Defense Unicorns is a company focused on delivering secure solutions for continuous software integration and delivery. They are seeking a Data Engineer to work closely with mission heroes in the Department of Defense environments, deploying and integrating data capabilities while ensuring operational stability and compliance.
Responsibilities:
- Deploy and configure UDS Data Capability in the mission hero's environment. Stand up the UDS Store (Iceberg, Rook/Ceph, pgvector, Postgres), wire up UDS Transit for air-gap data movement, configure UDS Govern policies (Pepr/Lula), and integrate UDS Connect (Strimzi/Kafka) where streaming or legacy connectors are required
- Build and support integrations with existing mission systems. Connect UDS Data Capability to legacy databases, flat-file drops, SOAP/REST endpoints, message buses, existing object storage, and identity providers (Keycloak, mission-side SSO). Prior experience integrating with these types of systems is more relevant than experience building on them
- Contribute to mapping complex data landscapes. Some engagements involve hundreds of interconnected systems of record with overlapping schemas and deeply interdependent data flows. You'll help trace how data moves across systems and identify dependencies alongside senior engineers and government stakeholders
- Build pipelines that move data through classification boundaries, covering ingestion, transformation, catalog registration, model/dataset packaging via Zarf, and cross-domain transit, while maintaining eventual consistency under DDIL conditions
- Implement data provenance, lineage, and governance practices. Track where data came from, how it transformed, and who can access it
- Operate what you deploy. Day-2 ownership includes capacity, performance, backup/restore (Velero), observability (Vector/Loki), incident response, and upgrade paths. Hand off to the mission hero's ops team once it's stable
- Generate accreditation artifacts, including STIG evidence, cATO documentation, FIPS validation notes, and policy mappings. You produce the evidence the mission hero's ISSM/ISSO needs to run this in IL4/IL5
- Feed field observations back to product and engineering. File issues, write postmortems, and surface what's working and what's breaking in the mission environment
- Support training and knowledge transfer. Contribute to runbooks, architecture docs, and working sessions that leave the mission hero's team self-sufficient
Requirements:
- Unstructured data at scale. Experience storing and querying large unstructured datasets using data lake architectures. Spark experience preferred
- Streaming & integration. Experience with stream processing infrastructure (Kafka, Redpanda, Flink, or equivalent) and bridging data from heterogeneous sources into modern pipelines
- Data warehousing. Open-source data warehousing platform experience. These environments do not support proprietary platforms, so you need to be comfortable building without them
- Pipelines & orchestration. Airflow, Dagster, Argo Workflows, or similar. Comfort building, scheduling, monitoring, and recovering production data pipelines
- Data modeling & SQL. Fluent in SQL. Comfortable designing schemas for both analytical and operational workloads
- Open-source orientation. You are comfortable building on open-source tooling and contributing back to it
- U.S. citizenship and the ability to obtain and maintain a DoD security clearance. Clearance sponsorship available for the right candidate
- Comfort working directly with mission heroes and government stakeholders. Clear communication with both technical and non-technical audiences
- Comfort with periodic on-site work, sometimes for days at a stretch, and equal comfort working remotely
- Bias toward delivery. Preference for shipping a working integration over perfecting a design that hasn't met a real workload
- Self-direction. You will encounter environments and problems that are not yet documented and will need to work through them independently
- Data provenance, lineage, and governance. Experience with lineage tracking, data catalogs, provenance systems, or governance frameworks. Depth here will be weighted heavily
- DoD or defense program experience
- Active Secret clearance (or higher)
- Lakehouse & storage: Apache Iceberg (or Delta/Hudi), object storage (Ceph/S3-compatible), Postgres (including extensions like pgvector), columnar/OLAP engines (Trino, DuckDB, ClickHouse, Spark SQL)
- Change Data Capture: Debezium or similar CDC patterns
- Governance, catalog & access: REST catalogs (Iceberg REST, Polaris/Gravitino/Nessie family), ABAC/RBAC patterns, OIDC/OAuth, lineage and audit
- Kubernetes awareness. Deep K8s expertise is not required; general familiarity with deployments, operators, and how applications run on Kubernetes is valuable
- Linux fundamentals, container runtime behavior, networking, TLS, secrets management
- IaC (Terraform, Pulumi, or similar) and GitOps patterns (Flux, ArgoCD)
- Familiarity with the CNCF ecosystem, including the distinction between foundation projects and single-vendor projects
- AI/ML awareness. General understanding of how data infrastructure supports model training, versioning, provenance, and AI operations
- Familiarity with Air Force or Space Force systems of record (e.g., MILPDS, ARMS) and how data flows between them