Hark is an artificial intelligence company focused on developing advanced, personalized intelligence systems. This role works on large-scale pretraining systems and foundation models, with responsibilities spanning data curation, training infrastructure development, and collaboration with research and engineering teams to enhance model capabilities.
Responsibilities:
- Drive research and development in large-scale LLM and multimodal pretraining, focusing on improving model capability through better data, scaling, and architecture
- Develop and optimize data pipelines for pretraining, including large-scale data curation, filtering, deduplication, and synthetic data generation
- Design and implement efficient training strategies for foundation models, including distributed training, scaling-law analysis, and optimization techniques
- Build and improve pretraining infrastructure, including training systems and data pipelines, with a focus on compute efficiency
- Develop evaluation frameworks and internal benchmarks to measure pretraining progress and model capability
- Collaborate with research and engineering teams to push the frontier of foundation model performance and scalability