Neo.Tax builds software that automates R&D tax credits and software capitalization. The company is seeking a Senior Data Scientist + Machine Learning Engineer to build and deploy machine learning models that enhance its core product and automate complex workflows.
Responsibilities:
- Own ML/AI problem spaces end-to-end: Define success metrics, create baselines, iterate on approaches, and drive projects from prototype to production
- Model development: Build and improve models spanning classification, information extraction, entity resolution, clustering, ranking, anomaly detection, and forecasting
- LLM systems: Design and evaluate prompt + retrieval + tool-calling pipelines; improve quality through datasets, labeling, and systematic evaluation
- Data foundations: Define datasets, labeling strategies, and data quality checks; build features that generalize across customer contexts
- Experimentation and evaluation: Design offline evaluations and online experiments; build dashboards and monitoring to detect regressions
- Production ML engineering: Build and operate training/inference pipelines (batch and/or online), model serving, feature/data pipelines, and monitoring/alerting for quality, latency, and cost
- Partner with engineering: Collaborate on productionization, scalability, reliability, latency, and cost; contribute directly to model-serving or batch pipelines as needed
- Cross-functional collaboration: Work with product, engineering, and customer-facing teams to understand workflows and translate real customer pain into ML deliverables
- Technical communication: Write clear specs and postmortems, document trade-offs, and communicate progress, risks, and decisions
Requirements:
- MS/PhD in Computer Science, Statistics, Mathematics, or a related quantitative field, or equivalent practical experience
- 6+ years of industry experience as a Data Scientist / Applied Scientist / ML Engineer shipping ML to production (or equivalent)
- Strong proficiency in Python and the modern data/ML ecosystem (NumPy/Pandas, scikit-learn, PyTorch or TensorFlow)
- Strong understanding of statistical modeling, experimentation, and evaluation (metrics, confidence intervals, A/B testing, bias/variance, error analysis)
- Experience building data pipelines and working with SQL and relational databases
- Experience deploying and maintaining models in production (batch or real-time), including monitoring and iteration; comfortable owning operational concerns (reliability, latency, cost)
- Ability to operate with high ownership in ambiguous environments; strong communication and collaboration skills
- Ability to design and implement solutions effectively without relying on AI tools
- Experience with LLM evaluation, synthetic data generation, RAG, or tool-augmented agents
- Experience with information extraction and document understanding
- Experience with distributed data processing (e.g., Spark, Beam) and/or workflow engines
- Experience with GCP, AWS, or Azure
- Experience working at early-stage, venture-backed startups