Hark is an artificial intelligence company focused on creating advanced, personalized intelligence systems. The company is seeking a Model Distillation Engineer to compress large audio and multimodal models into student models that fit the constraints of its target hardware, and to manage the distillation, quantization, and architecture-aware compression processes involved.
Responsibilities:
- Design and execute distillation strategies (response-based, feature-based, and self-distillation) to compress teacher models into deployable student models
- Apply quantization (PTQ and QAT), pruning, and architecture search to hit per-product size, latency, and power budgets
- Build a reusable distillation and compression toolchain that the broader audio ML team can adopt across model families
- Partner with the audio ML team on training pipelines and with the runtime team on deployment targets
- Define accuracy retention and resource KPIs per product and track them through the release cycle
- Profile compressed models on target hardware and iterate with DSP and runtime engineers on bottlenecks