Etched is building AI chips that are hard-coded for individual model architectures. The Machine Learning Research Engineer will propose and conduct research to optimize performance on Sohu, collaborating with hardware architects to develop software solutions that leverage the unique capabilities of Etched's AI hardware.

Responsibilities:

Propose and conduct novel research to achieve results on Sohu that are not viable on GPUs
Translate core mathematical operations from the most popular Transformer-based models into maximally performant instruction sequences for Sohu
Develop deep knowledge of the Sohu hardware architecture to deliver best-in-the-world software performance, collaborating with hardware architects and designers
Co-design and fine-tune emerging model architectures for highest efficiency on Sohu
Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models in Python and Rust
Propose and implement a novel test-time compute algorithm that leverages Sohu's unique capabilities to unlock a product that could never be achieved on a typical GPU
Implement diffusion models on Sohu to achieve GPU-impossible latencies that allow for real-time image generation
Tune model instructions and scheduling algorithms for utilization, latency, throughput, or a combination of these metrics
Implement model-specific inference-time acceleration techniques such as speculative decoding, tree search, KV cache sharing, and priority scheduling, working with the rest of the inference serving stack
