Etched is building AI chips that are hard-coded for individual model architectures. The Machine Learning Research Engineer will propose and conduct research to optimize performance on Sohu, collaborating with hardware architects to develop software solutions that leverage the unique capabilities of the company's AI hardware.
Responsibilities:
- Propose and conduct novel research to achieve results on Sohu that are unviable on GPUs
- Translate core mathematical operations from the most popular Transformer-based models into maximally performant instruction sequences for Sohu
- Develop deep architectural knowledge that informs best-in-the-world software performance on Sohu hardware, collaborating with hardware architects and designers
- Co-design and fine-tune emerging model architectures for the highest efficiency on Sohu
- Guide and contribute to the Sohu software stack, performance characterization tools, and runtime abstractions by implementing frontier models using Python and Rust
- Propose and implement a novel test-time compute algorithm that leverages Sohu’s unique capabilities to unlock a product that could never be achieved on a typical GPU
- Implement diffusion models on Sohu to achieve GPU-impossible latencies that allow for real-time image generation
- Tune model instructions and scheduling algorithms to optimize utilization, latency, throughput, or a mix of these metrics
- Implement model-specific inference-time acceleration techniques, such as speculative decoding, tree search, KV cache sharing, and priority scheduling, by interacting with the rest of the inference serving stack