ProCogia is a data consulting firm that helps businesses transform data into real growth. It is seeking an LLM Research Intern to work with its data science and AI engineering teams on researching and evaluating large language models for client applications.
Responsibilities:
- Assess client-specific data assets and determine the appropriate adaptation strategy — continued pretraining, supervised fine-tuning, or a combination — based on the domain, data volume, and use case requirements
- Curate, clean, structure, and prepare domain-specific datasets from raw client data for use in model training pipelines
- Fine-tune large language models in the 70B–100B+ parameter range using techniques such as LoRA, QLoRA, and multi-adapter patterns
- Perform continued pretraining on open-weight models (Qwen, Llama, and related ecosystems) to embed domain knowledge directly into model weights
- Manage distributed training workflows across multi-node GPU clusters
- Design and execute evaluation frameworks to validate domain adaptation quality, factual grounding, and model behavior
- Support RAG system development where applicable, including vector database integration, chunking strategies, and reranking pipelines
- Contribute to inference optimization and deployment pipeline integration