Bioptimus is building the first universal AI foundation model for biology to fuel breakthrough discoveries and accelerate innovation in biomedicine. They are seeking a meticulous Biology Data Quality Engineer to ensure the integrity and usability of complex biological datasets, working closely with the R&D team to maintain high data quality standards.

Responsibilities:

Develop and implement comprehensive data validation protocols for diverse biological datasets (histology, omics, clinical). Ensure data integrity, consistency, and accuracy through rigorous quality checks. Design and implement automated data quality pipelines to streamline data validation and identify potential issues early in the data processing workflow.
Establish and enforce data standardization practices to facilitate seamless integration and analysis across different data types. Curate datasets to enhance their usability for machine learning.
Work closely with the R&D team to understand data requirements and address data quality concerns. Communicate data quality findings and recommendations effectively to technical and non-technical stakeholders. Communicate and coordinate with external data providers.
Maintain detailed documentation of data quality assessment procedures, validation results, and data specifications. Generate regular reports on data quality metrics and trends.
Evaluate and validate external public data sources, ensuring they meet our quality standards and are suitable for inclusion in our foundation model training.
Stay up to date with the latest data quality best practices and tools in the biological domain. Propose and implement improvements to our data quality assessment processes and pipelines.

Requirements:

Deep understanding of transcriptomics data types (bulk, single-cell, spatial) and their specific quality considerations
Good knowledge of genomics and proteomics data
Proven experience in implementing data quality control procedures and pipelines
Familiarity with data validation tools and techniques
Strong analytical and problem-solving skills to identify and resolve data quality issues
Proficiency in Python
Good knowledge of data visualization libraries (e.g. matplotlib)
Excellent written and verbal communication skills to effectively convey data quality findings and recommendations
MSc in Biology, Computational Biology, or Bioinformatics
Experience in machine learning analysis of histology images
Experience working with AWS
Experience with developing and implementing data annotation guidelines and processes
Experience with data ontologies
Proven experience building or contributing to large-scale data collections (e.g. Human Cell Atlas)
Experience with spatial alignment of multimodal datasets (e.g. alignment between different imaging modalities)
