Fusemachines is a global provider of enterprise AI products and services, on a mission to democratize AI. They are seeking a mid-to-senior Machine Learning Engineer / Data Scientist to build and deploy machine learning solutions that drive measurable business impact. The role spans the full ML lifecycle, from problem framing through deployment and monitoring, and involves close collaboration with stakeholders.
Responsibilities:
- Translate business questions into ML problem statements (classification, regression, time series forecasting, clustering, anomaly detection, recommendation, etc.)
- Collaborate with stakeholders to define success metrics, evaluation plans, and practical constraints (latency, interpretability, cost, data availability)
- Use SQL and Python to extract, join, and analyze data from relational databases and data warehouses
- Perform data profiling, missingness analysis, leakage checks, and exploratory analysis to guide modeling choices
- Build robust feature pipelines (aggregation, encoding, scaling, embeddings where appropriate) and document assumptions
- Train and tune supervised learning models for tabular data (e.g., logistic/linear models, tree-based methods, gradient boosting such as XGBoost/LightGBM/CatBoost, and neural nets for structured data)
- Apply strong tabular modeling practices: handling missing data, categorical encoding, leakage prevention, class imbalance strategies, calibration, and robust cross-validation
- Build time series models (statistical and ML/DL approaches) and validate with proper backtesting
- Apply clustering and segmentation techniques (k-means, hierarchical, DBSCAN, Gaussian mixtures) and evaluate stability and usefulness
- Apply statistics in practice (hypothesis testing, confidence intervals, sampling, experiment design) to support inference and decision-making
- Build and train deep learning models using PyTorch or TensorFlow/Keras
- Use best practices for training (regularization, calibration, class imbalance handling, reproducibility, sound train/val/test design)
- Choose appropriate metrics (AUC, F1, precision/recall, RMSE/MAE/MAPE, calibration, lift, and business KPIs) and create evaluation reports
- Perform error analysis and interpretation (feature importance/SHAP, cohort slicing) and iterate based on evidence
- Package models for deployment (batch scoring pipelines or real-time APIs) and collaborate with engineers on integration
- Implement practical MLOps: versioning, reproducible training, automated evaluation, monitoring for drift/performance, and retraining plans
- Communicate tradeoffs and recommendations clearly to technical and non-technical stakeholders
- Create documentation and lightweight demos that make results actionable
Requirements:
- 3–8 years of experience in data science, machine learning engineering, or applied ML (mid-to-senior)
- Strong Python skills for data analysis and modeling (pandas/numpy/scikit-learn or equivalent)
- Strong SQL skills (joins, window functions, aggregation, performance awareness)
- Solid foundation in statistics (hypothesis testing, uncertainty, bias/variance, sampling) and practical experimentation mindset
- Hands-on experience across multiple model types, including classification and regression, time series forecasting, and clustering/segmentation
- Experience with deep learning in PyTorch or TensorFlow/Keras
- Strong problem-solving skills: ability to work with ambiguous goals and messy data
- Clear communication skills and ability to translate analysis into decisions
- Experience with Databricks for applied ML (e.g., Spark, Delta Lake, MLflow, Databricks Jobs/Workflows)
- Experience deploying models to production (APIs, batch pipelines) and maintaining them over time (monitoring, retraining)
- Experience with orchestration tools (Airflow, Prefect, Dagster) and modern data stacks (Snowflake/BigQuery/Redshift/Databricks)
- Experience with cloud platforms (AWS/GCP/Azure/IBM) and containerization (Docker)
- Experience with responsible AI and governance best practices (privacy/PII handling, auditability, access controls)
- Consulting or client-facing delivery experience