ProCogia is a data consulting firm helping businesses transform data into real growth. They are seeking an LLM Research Intern to collaborate with data science and AI engineering teams on researching and evaluating large language models for client applications.

Responsibilities:

Assess client-specific data assets and determine the appropriate adaptation strategy — continued pretraining, supervised fine-tuning, or a combination — based on the domain, data volume, and use case requirements
Curate, clean, structure, and prepare domain-specific datasets from raw client data for use in model training pipelines
Fine-tune large language models in the 70B–100B+ parameter range using techniques such as LoRA, QLoRA, and multi-adapter patterns
Perform continued pretraining on open-weight models (Qwen, Llama, and related ecosystems) to embed domain knowledge directly into model weights
Manage distributed training workflows across multi-node GPU clusters
Design and execute evaluation frameworks to validate domain adaptation quality, factual grounding, and model behavior
Support RAG system development where applicable, including vector database integration, chunking strategies, and reranking pipelines
Contribute to inference optimization and deployment pipeline integration
