Concepts Beyond is seeking a hands-on AI / Data Engineer & Analyst to power aviation safety and air traffic management programs through enterprise-scale data engineering, applied machine learning, and intelligent system design. This role involves architecting data pipelines, training models, and deploying analytics solutions that drive decisions in the National Airspace System.

Responsibilities:

Architect, build, and maintain scalable data pipelines for structured, semi-structured, and unstructured data using orchestration tools (e.g., Apache Airflow, Prefect, AWS Glue, or Azure Data Factory)
Design and implement robust ETL/ELT processes with strong error handling, idempotency, monitoring, and dependency management across cloud and hybrid environments
Integrate and manage data across enterprise platforms (e.g., Palantir Foundry, AWS, Azure, GCP); process high-volume data using distributed frameworks (Apache Spark, Flink)
Develop, train, and operationalize NLP/ML models for low-latency real-time voice pipelines using streaming speech-to-text and text-to-speech, diarization, classification, and named entity recognition over controller–pilot voice and text
Develop Retrieval-Augmented Generation (RAG) pipelines over enterprise vector stores with hybrid retrieval, re-ranking, and grounded evaluation
Custom-develop, fine-tune, and deploy large and small language models (LLMs and SLMs) for real-time operational analysis and decision support; build streaming NLP and agentic architectures that integrate with enterprise aviation platforms
Develop AI/ML solutions for predictive analytics in aviation safety, including probabilistic modeling, time-series and anomaly detection, and causal-factor analysis on ASIAS/FOQA/ASRS and related data
Analyze state-of-the-art technologies, drive AI strategies and architectures, and identify emerging trends, gaps, and innovation opportunities
Promote thought leadership through publications, conference presentations, and industry collaboration
Contribute ideas that support growth and new business opportunities

Requirements:

Must be a U.S. citizen
Bachelor's or Master's degree in Engineering, Computer Science, or a related field
5+ years of data engineering experience developing scalable pipelines and analytics systems; a Ph.D. may be substituted for experience
Proficient in Python with strong software engineering practices: OOP, testing frameworks (pytest), logging, error handling, and version control
Expertise in ETL/ELT orchestration (Apache Airflow, Prefect, Luigi, AWS Glue, or Azure Data Factory); deep SQL proficiency including query optimization and index tuning
Data modeling expertise: normalization, star/snowflake schemas, slowly changing dimensions (Type 1/2)
Experience with big data processing frameworks (Apache Spark, Flink) and cloud data ecosystems (AWS, Azure, GCP)
Hands-on experience custom-developing AI/ML solutions (LLMs/SLMs, real-time voice/speech) and predictive data analytics, including pipelines, preprocessing, embedding, grounding, and production deployment
Working knowledge of Generative AI and RAG architectures, vector databases, and enterprise data infrastructure integration with model versioning, monitoring, and rollback strategies
FAA domain or Aviation Safety systems exposure highly desirable (e.g., ASIAS, SWIM, Foundry)
DevSecOps in regulated or safety-critical environments; experience leading technical architecture discussions
LLM/SLM fine-tuning using PyTorch, TensorFlow, Hugging Face Transformers (LoRA, QLoRA); MLOps practices: model drift detection, retraining pipelines, deployment monitoring
Speech technologies: real-time STT/TTS, NLP, and streaming voice (Whisper, WhisperX, faster-whisper, Wispr Flow, Google Cloud Speech, Azure Speech Services) with custom voice models and accent/ATC phraseology adaptation; real-time inference and AI agents/agentic architectures
Probabilistic modeling and Bayesian inference (pgmpy, PyMC, Pyro, Stan); causal inference and graphical models applied to safety precursor analysis
Vector databases (Pinecone, Weaviate, ChromaDB, FAISS, pgvector); API integrations (RESTful/GraphQL) and streaming platforms (Kafka, Kinesis, Pulsar)
Containerization (Docker, Kubernetes), infrastructure-as-code (Terraform, CloudFormation); data visualization
Computer vision frameworks (torchvision, OpenCV, Detectron2, Ultralytics YOLO, SAM) and multimodal models (CLIP, LLaVA, GPT-4o Vision) for surveillance, surface, and document imagery

Principal AI Engineer & Data Analyst
