Deepgram is the leading platform underpinning the emerging trillion-dollar Voice AI economy, providing real-time APIs for speech-to-text and text-to-speech. The company is seeking a highly skilled Machine Learning Engineer to join its Research team, where you will partner with research scientists to prototype and validate novel modeling ideas for speech technologies.
Responsibilities:
- Scalable Model Training: Architect and manage horizontally scalable systems that dramatically accelerate the end-to-end training lifecycle for Speech-to-Text (STT) and Text-to-Speech (TTS) models. The role goes well beyond automating training runs: it focuses on making model development significantly faster and more efficient through optimized data preparation and management, high-throughput training pipelines, distributed infrastructure, and automated evaluation tooling.
- Tooling & Accessibility: Design and implement internal UIs and tools that make ML systems and workflows accessible to non-technical stakeholders across the company. These UIs should give users transparency into, and flexible control over, internally built tooling.
- Infrastructure & Tools: Oversee and manage training tooling, job orchestration, experiment tracking, and data storage
Requirements:
- Strong experience with the machine learning research pipeline, particularly in STT or related speech domains. This includes experimenting with and evaluating new architectures and modeling approaches, as well as implementing large-scale training systems.
- Proficiency with orchestration and infrastructure tools like Kubernetes, Docker, and Prefect
- Familiarity with ML lifecycle tools such as MLflow
- Experience building internal tools or dashboards for non-technical users
- Hands-on experience with data engineering practices for unstructured audio and text data
- Comfort working in cross-functional teams that include researchers, engineers, and product stakeholders