Summary
Develop analytics and AI solutions to transform raw data into meaningful insights using statistics, machine learning, and visualization software, with a strong focus on LLMs and generative AI in a public-sector context.
Responsibilities
- Collect, process, and analyze structured and unstructured data using data mining, modeling, NLP, and ML techniques.
- Develop predictive models and algorithms and design automated data pipelines and workflows.
- Build dashboards, reports, and visualizations; collaborate with multiple teams to refine requirements.
- Implement AI governance and safety guardrails to reduce hallucinations, bias, and security risks (e.g., prompt injection).
- Develop LLM evaluation benchmarks with automated metrics and human-in-the-loop feedback.
- Identify opportunities to apply parameter-efficient fine-tuning (PEFT, e.g., LoRA) to state government datasets.
Requirements
- Minimum 3 years of data science experience.
- Strong background in statistical analysis and ML; proficiency in SQL, Python, R, or similar.
- Experience with ML libraries/frameworks and methods such as regression, clustering, and classification.
- 2+ years of hands-on experience with models such as GPT-4, Claude, Llama, Gemini, or similar, and their APIs.
- Expert-level proficiency with orchestration tools such as LangChain, LlamaIndex, or Haystack.
- Experience with vector databases (Pinecone, Weaviate, Milvus, pgvector) and synthetic/instruction-tuning datasets.
- Preferred: Experience in regulated/public-sector environments with PII/PHI and ethical AI standards.
Education
- Master’s or PhD in computer science, statistics, mathematics, economics, or a related field.
- Three years of equivalent related experience may substitute for a Bachelor’s degree.