Concepts Beyond is seeking a hands-on AI / Data Engineer & Analyst to power aviation safety and air traffic management programs through enterprise-scale data engineering, applied machine learning, and intelligent system design. This role involves architecting data pipelines, training models, and deploying analytics solutions that drive decisions in the National Airspace System.
Responsibilities:
- Architect, build, and maintain scalable data pipelines for structured, semi-structured, and unstructured data using orchestration tools (e.g., Apache Airflow, Prefect, AWS Glue, or Azure Data Factory)
- Design and implement robust ETL/ELT processes with strong error handling, idempotency, monitoring, and dependency management across cloud and hybrid environments
- Integrate and manage data across enterprise platforms (e.g., Palantir Foundry, AWS, Azure, GCP); process high-volume data using distributed frameworks (Apache Spark, Flink)
- Develop, train, and operationalize NLP/ML models for low-latency real-time voice pipelines using streaming speech-to-text and text-to-speech, diarization, classification, and named entity recognition over controller–pilot voice and text
- Develop Retrieval-Augmented Generation (RAG) pipelines over enterprise vector stores with hybrid retrieval, re-ranking, and grounded evaluation
- Custom-develop, fine-tune, and deploy large and small language models (LLMs and SLMs) for real-time operational analysis and decision support; build streaming NLP and agentic architectures that integrate with enterprise aviation platforms
- Develop AI/ML solutions for predictive analytics in aviation safety, including probabilistic modeling, time-series and anomaly detection, and causal-factor analysis on ASIAS/FOQA/ASRS and related data
- Analyze state-of-the-art technologies, drive cutting-edge AI strategies and architectures, and identify emerging trends, gaps, and innovation opportunities
- Promote thought leadership through publications, conference presentations, and industry collaboration
- Contribute ideas that support growth and new business opportunities
Requirements:
- Must be a U.S. citizen
- Bachelor's or Master's degree in Engineering, Computer Science, or related field
- 5+ years of data engineering experience developing scalable pipelines and analytics systems; a Ph.D. may be substituted for experience
- Proficient in Python with strong software engineering practices: OOP, testing frameworks (pytest), logging, error handling, and version control
- Expertise in ETL/ELT orchestration (Apache Airflow, Prefect, Luigi, AWS Glue, or Azure Data Factory); deep SQL proficiency including query optimization and index tuning
- Data modeling expertise: normalization, star/snowflake schemas, slowly changing dimensions (Type 1/2)
- Experience with big data processing frameworks (Apache Spark, Flink) and cloud data ecosystems (AWS, Azure, GCP)
- Hands-on experience custom-developing AI/ML solutions (LLMs/SLMs, real-time voice/speech) and predictive data analytics, including pipelines, preprocessing, embedding, grounding, and production deployment
- Working knowledge of Generative AI and RAG architectures, vector databases, and enterprise data infrastructure integration with model versioning, monitoring, and rollback strategies
- FAA domain or Aviation Safety systems exposure highly desirable (e.g., ASIAS, SWIM, Foundry)
- DevSecOps in regulated or safety-critical environments; experience leading technical architecture discussions
- LLM/SLM fine-tuning using PyTorch, TensorFlow, Hugging Face Transformers (LoRA, QLoRA); MLOps practices: model drift detection, retraining pipelines, deployment monitoring
- Real-time STT/TTS, NLP, and streaming voice (Whisper, WhisperX, faster-whisper, Wispr Flow, Google Cloud Speech, Azure Speech Services) with custom voice models and accent/ATC phraseology adaptation; real-time inference and AI agents/agentic architectures
- Probabilistic modeling and Bayesian inference (pgmpy, PyMC, Pyro, Stan); causal inference and graphical models applied to safety precursor analysis
- Vector databases (Pinecone, Weaviate, ChromaDB, FAISS, pgvector); API integrations (RESTful/GraphQL) and streaming platforms (Kafka, Kinesis, Pulsar)
- Containerization (Docker, Kubernetes), infrastructure-as-code (Terraform, CloudFormation); data visualization
- Computer vision frameworks (torchvision, OpenCV, Detectron2, Ultralytics YOLO, SAM) and multimodal models (CLIP, LLaVA, GPT-4o) for surveillance, surface, and document imagery