Hark is an artificial intelligence company focused on creating advanced, personalized intelligence systems. The company is seeking a Model Distillation Engineer to compress large audio and multimodal models into student models that fit its hardware constraints, using distillation, quantization, and architecture-aware compression.

Responsibilities:

Design and execute distillation strategies (response, feature, and self-distillation) to compress teacher models into deployable students
Apply quantization (PTQ and QAT), pruning, and architecture search to hit per-product size, latency, and power budgets
Build a reusable distillation and compression toolchain that the broader audio ML team can adopt across model families
Partner with the broader audio ML team on training pipelines and with the runtime team on deployment targets
Define accuracy retention and resource KPIs per product and track them through the release cycle
Profile compressed models on target hardware and iterate with DSP and runtime engineers on bottlenecks
