
Minimum Qualifications:
Associate’s degree or equivalent in a related field and one year of related experience.
PREFERRED QUALIFICATIONS:
Master's degree in bioinformatics, computational biology, computer science, genomics, physics, or a related field (or equivalent practical experience)
Strong programming skills
Background in machine learning or statistical modeling (e.g., clustering, representation learning, deep learning)
Experience with large language models (LLMs), including retrieval-augmented generation, prompt engineering, agentic workflows, or fine-tuning
Familiarity with Linux/HPC environments and version control (e.g., Git)
Experience working with NGS data and large biological datasets
Experience with tools for comparative genomics across multiple species (e.g., alignment software, comparative genome browsers)
Experience working with genome assemblies, liftover, or cross-species alignment
Familiarity with visualization of large genomic datasets (2D contact maps, embeddings, interactive browsers)
Experience integrating structured scientific datasets, scientific literature, or experimental metadata into computational workflows
Job Summary:
To provide technical skills in the preparation and use of programs for the solution of problems by electronic computers.
Job Duties and Responsibilities
Core duties:
Develops new programs.
Verifies solutions.
Prepares description of program, flow charts, and block diagrams.
Prepares program documentation and operating instructions for new programs.
Modifies existing programs.
Adheres to internal controls established for departments.
Performs related duties as required.
Project responsibilities will include:
1. System design. Develop architectures for AI systems that integrate biological datasets, scientific literature, experimental metadata, and large-language-model capabilities. These systems may include retrieval-augmented generation, agentic workflows, structured reasoning pipelines, or domain-specific interfaces for biological research.
2. Biological data integration. Identify, organize, and prepare relevant biological data sources for use in AI workflows. These may include genomic, epigenomic, proteomic, imaging, structural biology, or literature-derived datasets, depending on project needs.
3. LLM-based workflow development. Design and implement LLM-powered tools for scientific question answering, hypothesis generation, literature analysis, experimental planning, data interpretation, and automated report generation. The candidate will evaluate model outputs for scientific accuracy, traceability, and usability.
4. Prototype implementation. Build working prototypes, scripts, notebooks, APIs, or lightweight applications demonstrating the proposed AI systems. Prototypes should be sufficiently documented to enable the project team to review, test, and further develop them.
5. Evaluation and validation. Develop practical evaluation criteria for biological and scientific AI systems, including accuracy, reproducibility, citation grounding, failure modes, hallucination risk, and usefulness to researchers. The candidate will test systems on representative biological use cases and summarize results.
6. Documentation and recommendations. Prepare clear technical documentation that describes the system architecture, data inputs, model choices, workflows, limitations, and recommended next steps. Documentation should be suitable for internal scientific and technical review.
7. Collaboration. Meet periodically with project leadership and relevant scientific or computational collaborators to review progress, refine priorities, and incorporate feedback.
Deliverables
The candidate will provide one or more of the following, as requested by the project team:
Technical design documents for AI systems spanning biology and LLMs.
Prototype software, notebooks, scripts, or application components.
Curated or structured biological data inputs for AI workflows.
Evaluation reports describing system performance, limitations, and risks.
Written recommendations for future development, deployment, or publication.
Periodic progress summaries.
Expected Outcome
The work is expected to produce practical designs and early-stage prototypes for AI systems that support biological research with large language models, with an emphasis on scientific rigor, interpretability, reliable grounding in source material, and usability for researchers.
DEPARTMENT MARKETING STATEMENT:
We are seeking a highly motivated programmer to support computational analysis of Hi-C and other sequencing datasets, with a particular focus on comparative genome organization across multiple species under the umbrella of the DNA Zoo Consortium (dnazoo.org). This role is ideal for someone who enjoys working at the intersection of genomics, chromatin architecture (1D and 3D), and data science, and who wants hands-on involvement in large, cross-species projects.
In addition, this role offers the opportunity to help shape AI systems that bring large language models to biological research, building tools that combine curated scientific data with modern LLM-based reasoning and retrieval to accelerate discovery.
Salary Range:
Actual salary commensurate with experience or range, if discussed and approved by hiring authority.
EQUAL EMPLOYMENT OPPORTUNITY:
UTMB Health strives to provide equal opportunity employment without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, genetic information, disability, veteran status, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law. As a Federal Contractor, UTMB Health takes affirmative action to hire and advance protected veterans and individuals with disabilities.