Role Overview

Build and maintain robust, scalable data pipelines that ingest experimental and process data from upstream biologics source systems.
Deliver analysis-ready datasets to support upstream digital initiatives, including process characterization models, scale-up predictions, multivariate analytics, and high-throughput process development workflows.
Map instrument outputs and experimental results, ensuring ontology alignment and interoperability across upstream data sources.
Develop and maintain data visualizations, dashboards, and reports that enable upstream scientists to explore process data across runs, molecules, and scales.
Support system-of-record standards by ensuring consistent data entry practices.
Identify and flag data quality issues, gaps in metadata, and inconsistencies across source systems, contributing to continuous improvement of upstream data capture practices.
Collaborate with upstream process development scientists, analytical scientists, and engineers to understand evolving data needs and translate them into pipeline requirements.
Coordinate with adjacent domain engineers to ensure seamless data handoffs at domain boundaries.
Maintain and version all pipeline code in GitHub, following team standards for code review, documentation, and deployment.
Demonstrate excellent interpersonal, communication, and collaboration skills.
Embrace and model our core values of inclusion, including fostering a supportive culture where all can thrive.
Collaborate effectively in a dynamic, integrated, and multidisciplinary team environment.

Requirements

Ph.D. in Computer Science, Data Science, Molecular Modeling, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field
M.S. in Computer Science, Data Science, Molecular Modeling, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field with at least 2 years of industrial/pharmaceutical or relevant experience
B.S. in Computer Science, Data Science, Molecular Modeling, Engineering, Chemistry, Physics, Biology, Pharmaceutical Sciences, or a closely related field with at least 4 years of industrial/pharmaceutical or relevant experience
Proficient in Python and/or R programming
Comfortable working in development environments such as Jupyter, Posit/RStudio, or VS Code
Solid SQL skills with hands-on experience writing and optimizing queries against relational databases and data warehouses
Experience with ETL/ELT processes and building data pipelines in a scientific or pharmaceutical context
Familiarity with version control systems (Git/GitHub) and collaborative software development practices
Ability to work in a team environment with cross-functional interactions
Motivation to learn new skills, willingness to take on new challenges, and scientific curiosity

Tech Stack

ETL
Python
SQL

Benefits

Medical, dental, and vision healthcare, plus other insurance benefits (for employees and their families)
Retirement benefits, including a 401(k)
Paid holidays, vacation, compassionate leave, and sick days

Senior Data Engineer – Upstream Biologics