Design and run experiments to test hypotheses on the path to foundation model development.
Engineer meaningful evals and metrics which enable rapid model iteration.
Design, build, and maintain scalable, reproducible libraries for training, experimentation, evaluation, and simulation, in service of large-scale research initiatives.
Implement model architectures that push the boundaries of molecular simulation, both from the literature and developed in collaboration with our in-house researchers.
Enable agent-driven research and workflows and maintain guardrails on agentic tooling.
Help prepare manuscripts, software artifacts, and datasets for public release.
Requirements
Strong software engineering fundamentals, with experience building not just one-off scripts but reproducible research pipelines, writing the necessary documentation, and following coding best practices.
Track record of observable artifacts (e.g., GitHub, papers) showing work in ML or scientific computing libraries.
Solid working knowledge of PyTorch and JAX and the modern ML research stack.
Comfortable with HPC or large-scale compute environments, and used to thinking at the scale of hundreds or thousands (or even more!) of fits running at once.
Sufficient scientific depth to engage with the research questions, whether developed through prior industry experience or during a PhD.
Experience with equivariant architectures, geometric deep learning, or GNNs (NequIP, MACE, SchNet, PaiNN, or similar).
Familiarity with generative modeling: diffusion models, flow matching, score-based methods.
Regular involvement in open-source ML or scientific computing libraries.
Experience building agent-driven research, active learning, and data curation pipelines.