Role Overview

Build strong partnerships with SPD experimentalists, process engineers, and analytical scientists to gather requirements for data solutions that will have direct pipeline impact
Design and implement robust, scalable data pipelines that ingest experimental and process data from SPD teams
Deliver analysis-ready datasets to support SPD digital initiatives, including process modeling and Bayesian optimization
Define and enforce data standards, metadata schemas, and ontologies that make SPD data interoperable and readily consumable by downstream modeling and optimization workflows
Automate data ingestion from laboratory instruments, electronic lab notebooks, PAT systems, and manufacturing systems, and integrate it with cloud-based storage and compute environments
Develop and apply data analysis and visualization workflows
Design and develop dashboards, reports, and data exports
Curate data and support definition of needs for automation of data ingestion
Influence digital data strategy for SPD by identifying opportunities to improve data capture practices at the source and reduce friction between experimentation and modeling
Demonstrate excellent interpersonal, communication, and collaboration skills
Embrace and model our core values including fostering a supportive culture where all can thrive
Collaborate effectively in a dynamic, integrated, and multidisciplinary team environment
Perform impactful scientific innovation in a team-oriented manner that builds trusted partnerships across vast stakeholder networks
Publish and present research, maintaining an established track record of engagement with the broader academic community

Requirements

Ph.D. in Computer Science, Data Science, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field with at least 3 years of industrial/pharmaceutical or relevant experience
M.S. in Computer Science, Data Science, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field with at least 5 years of industrial/pharmaceutical or relevant experience
B.S. in Computer Science, Data Science, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field with at least 7 years of industrial/pharmaceutical or relevant experience
Proficient in Python and/or another programming language (e.g., Java, R)
Comfortable working in development environments such as Posit/RStudio/Jupyter
Solid SQL skills with hands-on experience writing and optimizing queries against relational databases
Experience with ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes and building data pipelines in a scientific or pharmaceutical context
Familiarity with cloud platforms (AWS, Azure, or GCP) for data storage, processing, and integration
Prior hands-on experience in sterile drug product development, sterile DS and DP manufacturing processes, or closely related pharmaceutical development, with a demonstrated transition into a data engineering, data science, or computational role
Working knowledge of how mechanistic and data-driven models consume and depend on experimental data, sufficient to anticipate modeler needs and deliver appropriately structured datasets

Tech Stack

AWS
Azure
Cloud
ETL
Google Cloud Platform
Java
Python
SQL

Benefits

medical, dental, vision healthcare and other insurance benefits (for employee and family)
retirement benefits, including 401(k)
paid holidays
vacation
compassionate and sick days

Associate Principal Scientist, Data Engineer, Digital Insights
