Build and maintain end-to-end ML pipelines for training, evaluation, and batch inference across use cases such as identity resolution, audience segmentation, and content affinity modeling.
Implement and experiment with supervised, unsupervised, and ranking models in Python (scikit-learn, XGBoost/LightGBM, PyTorch).
Engineer features from first-party viewership, engagement, subscription, and behavioral signals, guarding against data leakage, collinearity, and training/serving skew.
Run structured offline experiments; evaluate with metrics appropriate to the task (precision/recall, F1, AUC-ROC, calibration, lift) and document findings in MLflow.
Develop and maintain data and feature pipelines on Databricks (PySpark, Delta, Workflows) that feed the feature store and model-training workflows, with attention to idempotency and reproducibility.
Write clean, tested, production-quality Python following engineering best practices (unit tests, code reviews, CI/CD).
Use MLflow for experiment tracking, model registration, and versioning under the guidance of senior engineers.
Support deployment and monitoring of batch inference jobs integrated with downstream activation platforms (e.g., Mosaic, FreeWheel, GAM) and data in Snowflake.
Use AI-assisted development tools (Cursor, GitHub Copilot, Amazon Q) to accelerate coding, debugging, and documentation under guidance.
Partner with Senior and Staff MLEs to understand system-design decisions and contribute meaningfully to technical discussions.
Work cross-functionally with Data Engineering, Feature Engineering, and Analytics to ensure data quality and pipeline reliability.
Document models, pipelines, and experiments clearly for team knowledge sharing.
Requirements
2–4 years of industry experience in machine learning, data science, or ML engineering (or 1–2 years with a relevant M.S.)
Strong Python proficiency; experience with pandas, NumPy, scikit-learn, and at least one deep-learning framework (PyTorch or TensorFlow)
Hands-on experience with Spark/PySpark or an equivalent large-scale data-processing framework
Proficiency in SQL and familiarity with cloud data warehouses/lakehouses (Snowflake or Databricks)
Experience with experiment-tracking tools (MLflow, Weights & Biases, or similar)
Solid grasp of core ML concepts: classification, regression, ranking, embeddings, and model evaluation; plus strong CS fundamentals (data structures, algorithms, clean code)
Bachelor’s degree in Computer Science, Statistics, Engineering, or a related quantitative field (or equivalent practical experience)
Ability to use AI tools independently to improve productivity across the ML lifecycle; clear written and verbal communication