Hark is an artificial intelligence company focused on developing advanced, personalized intelligence systems. The role involves working on large-scale pretraining systems and foundation models, with responsibilities including data curation, training infrastructure development, and collaboration with research and engineering teams to enhance model capabilities.

Responsibilities:

Drive research and development in large-scale LLM and multimodal pretraining, focusing on improving model capability through better data, scaling, and architecture
Develop and optimize data pipelines for pretraining, including large-scale data curation, filtering, deduplication, and synthetic data generation
Design and implement efficient training strategies for foundation models, including distributed training, scaling-law studies, and optimization techniques
Build and improve pretraining infrastructure, including training systems and data pipelines, and improve compute efficiency
Develop evaluation frameworks and internal benchmarks to measure pretraining progress and model capability
Collaborate with research and engineering teams to push the frontier of foundation model performance and scalability

Member of Technical Staff, Pretraining
