Peraton is a next-generation national security company that drives missions of consequence spanning the globe. They are seeking a Data Scientist / ML Platform Engineer to contribute across the full ML development lifecycle, focusing on applied data science and MLOps while collaborating with dedicated infrastructure engineers.
Responsibilities:
- Develop, train, and evaluate ML models (classification, regression, clustering, anomaly detection) and contribute to LLM-based capabilities such as RAG pipelines and prompt evaluation
- Support model governance and deployment practices using MLflow, including experiment tracking, model versioning, registry promotion workflows, and automated testing across the ML lifecycle
- Contribute to production ML operations: model performance monitoring, drift detection, automated alerting, and incident escalation to maintain reliability and SLA compliance
- Build and improve model serving infrastructure, feature pipelines, and lifecycle automation to support reproducible, scalable model development and inference
- Apply explainability techniques (e.g., SHAP, LIME) and produce technical documentation to support stakeholder transparency and compliance requirements
- Contribute to data ingestion, ELT/ETL transformation, and pipeline reliability using Spark and SQL-based frameworks within Snowflake and Databricks environments
- Support pipeline orchestration, medallion architecture conventions, and data stewardship practices (metadata management, PII handling, lineage tracking in Unity Catalog)
- Perform occasional system administration tasks in collaboration with platform teams, including environment configuration, access management, compute troubleshooting, and secrets handling using platform-native tools
Requirements:
- Associate's degree with 6 years of relevant experience, Bachelor's degree with 4+ years, or Master's degree with 2+ years; a high school diploma with 8 years of experience is accepted in lieu of a degree
- Demonstrated experience with SQL and Python, including Python-based ML frameworks (e.g., scikit-learn, XGBoost, PyTorch, or TensorFlow)
- Hands-on experience with MLflow or equivalent tools for experiment tracking, model governance, and lifecycle management
- Strong understanding of SDLC fundamentals and experience with GitHub or equivalent version control
- Experience with distributed compute environments (e.g., Spark, Databricks) and cloud-native services
- Basic proficiency with Bash or shell scripting for automation and environment setup
- Ability to collaborate across multidisciplinary teams and communicate technical concepts to varied audiences
- Ability to obtain and maintain a Public Trust clearance
- US citizenship or Green Card holder status required, with residence in the USA for 3 of the last 5 years
Preferred Qualifications:
- Experience with MLOps practices including CI/CD for ML, containerization, feature pipeline automation, and model deployment frameworks
- Experience with Databricks E2 components (Unity Catalog, Feature Store, Delta Live Tables) and/or model serving and drift monitoring tools (e.g., Databricks Model Serving, Evidently)
- Experience with LLM frameworks (e.g., LangChain, LlamaIndex, Hugging Face Transformers) and familiarity with model explainability libraries (e.g., SHAP, LIME)
- Advanced Spark performance optimization experience and/or API development using Databricks REST APIs
- Experience with healthcare analytics data (preferably Medicare or Medicaid) and familiarity with HIPAA or FedRAMP compliance constraints
- Experience building data pipelines in a Snowflake or Databricks environment
- Familiarity with orchestration tools (Airflow, Databricks Workflows)
- Exposure to streaming data patterns using Spark Structured Streaming, Delta Live Tables, or Kafka
- Familiarity with environment reproducibility tooling (Docker, conda) and scripting (Python, Bash) to support automation and CI/CD tasks