Trans Ova Genetics is a company focused on animal genetics and bioinformatics. It is seeking a Sr. Data Engineer to design, develop, and maintain data integration, analytics, and reporting solutions that support its workloads.

Responsibilities:

Design, develop, and maintain robust and efficient ETL/ELT pipelines and processes on Databricks for both operational and bioinformatics datasets (e.g., genomic markers, phenotypic data, laboratory outputs)
Ingest, transform, and harmonize structured and semi-structured biological data from lab systems, LIMS, sequencing platforms, and external partners into the enterprise data platform
Troubleshoot and resolve Databricks pipeline errors and performance issues
Optimize data flow performance and minimize data latency across scientific and business use cases
Implement data quality checks, validations, and reconciliation processes within ETL workflows, including domain-specific checks for genomic and phenotypic data
Develop and maintain Databricks pipelines, notebooks, and datasets using Python, Spark, and SQL
Optimize Databricks jobs for performance and cost-effectiveness, including large-scale bioinformatics and analytics workloads
Integrate Databricks with other data sources and systems, including lab instruments, genomic databases, and on-prem or cloud data stores
Participate in the design and implementation of data lake architectures that support both traditional analytics and bioinformatics pipelines
Participate in the design and implementation of data warehousing solutions to support reporting, analytics, and scientific modeling
Model and curate subject areas for genetics, reproduction, and bioinformatics (e.g., animals, pedigrees, genotypes, breeding values, trials)
Support data quality initiatives and implement data cleansing procedures across business and scientific domains
Collaborate with business users, scientists, geneticists, and bioinformaticians to understand data requirements for department-driven reporting and analytics needs
Maintain and extend the existing library of complex dashboards and visualizations to surface genetic, reproductive, and operational insights
Enable self-service analytics for R&D and product teams by exposing well-governed, documented data products
Troubleshoot and resolve report issues, including performance bottlenecks and data inconsistencies
Apply strong programming skills in Python, SQL, and Spark to build scalable data and bioinformatics workflows
Use CI/CD and IaC tools (Terraform, ARM, CloudFormation) to automate deployment of data platform components and analytics environments
Design and implement Databricks platform architecture on Azure and AWS infrastructure, including environments that support large-scale scientific computation
Contribute to cloud security, governance, and cost optimization practices for data and bioinformatics workloads
Partner with geneticists, biostatisticians, and bioinformaticians to translate scientific requirements into scalable data and platform architectures
Support or orchestrate bioinformatics pipelines (e.g., variant processing, quality control, annotation, genotype imputation, genomic evaluation) using cloud and Databricks capabilities
Ensure that data models, pipelines, and storage structures meet the needs of downstream analytics, predictive models, and genetic evaluations
Advocate for best practices in managing sensitive biological and genetic data, including data governance, access control, and compliance with relevant standards and regulations
Thrive in an entrepreneurial, self-starting, and fast-paced environment, working both independently and with our highly skilled teams
Collaborate effectively with business users, data analysts, scientists, and other IT teams
Communicate technical information clearly and concisely, both verbally and in writing, to technical and nontechnical stakeholders
Document all development work, data models, and procedures thoroughly, including bioinformatics and scientific data flows
Keep abreast of the latest advancements in data integration, cloud platforms, bioinformatics tooling, and data engineering technologies
Continuously improve skills and knowledge through training and self-learning in both data engineering and bioinformatics domains

Requirements:

Bachelor's degree in Computer Science, Information Systems, Bioinformatics, Computational Biology, or a related field; a Master's degree is an asset
7+ years of experience in data integration and reporting, with experience designing and operating cloud-based data platforms
Extensive experience with Databricks, including Python, Spark, and Delta Lake
Strong proficiency with relational databases (e.g., SQL Server, RDS), including T-SQL, stored procedures, and functions
Experience with data warehousing concepts and best practices
Experience with Microsoft Azure cloud platform; exposure to Microsoft Fabric is desirable
Hands-on experience working with biological, genomic, or other omics datasets in a bioinformatics or life sciences setting (e.g., sequence data, SNP arrays, GWAS outputs, phenotypic traits)
Strong analytical and problem-solving skills, with the ability to reason about complex data and scientific requirements
Excellent communication and interpersonal skills
Ability to work independently and as part of a cross-functional team across IT, science, and business
Experience with Agile methodologies
Demonstrated background in bioinformatics or computational biology, preferably supporting genetics, breeding, or life science research in an applied or commercial context
Must be legally authorized to work in the United States
Familiarity with common bioinformatics tools, data formats (e.g., FASTQ, VCF, PLINK), and workflows is highly desirable
