Fusemachines | Engineering
Machine Learning Engineer / Data Scientist
Remote | Posted today
Fusemachines is a global provider of enterprise AI products and services, on a mission to democratize AI, offering AI solutions and education worldwide.
Location: Remote
Responsibilities
- Translate business questions into ML problem statements (classification, regression, time series forecasting, clustering, anomaly detection, recommendation, etc.).
- Collaborate with stakeholders to define success metrics, evaluation plans, and practical constraints (latency, interpretability, cost, data availability).
- Use SQL and Python to extract, join, and analyze data from relational databases and data warehouses.
- Perform data profiling, missingness analysis, leakage checks, and exploratory analysis to guide modeling choices.
- Build robust feature pipelines (aggregation, encoding, scaling, embeddings where appropriate) and document assumptions.
- Train and tune supervised learning models for tabular data (e.g., logistic/linear models, tree-based methods, gradient boosting such as XGBoost/LightGBM/CatBoost, and neural nets for structured data).
- Apply strong tabular modeling practices: handling missing data, categorical encoding, leakage prevention, class imbalance strategies, calibration, and robust cross-validation.
- Build time series models (statistical and ML/DL approaches) and validate with proper backtesting.
- Apply clustering and segmentation techniques (k-means, hierarchical, DBSCAN, Gaussian mixtures) and evaluate stability and usefulness.
- Apply statistics in practice (hypothesis testing, confidence intervals, sampling, experiment design) to support inference and decision-making.
- Build and train deep learning models using PyTorch or TensorFlow/Keras.
- Use best practices for training (regularization, calibration, class imbalance handling, reproducibility, sound train/val/test design).
- Choose appropriate metrics (AUC, F1, and precision-recall for classification; RMSE, MAE, and MAPE for regression and forecasting; calibration, lift, and business KPIs) and create evaluation reports.
- Perform error analysis and interpretation (feature importance/SHAP, cohort slicing) and iterate based on evidence.
- Package models for deployment (batch scoring pipelines or real-time APIs) and collaborate with engineers on integration.
- Implement practical MLOps: versioning, reproducible training, automated evaluation, monitoring for drift/performance, and retraining plans.
- Communicate tradeoffs and recommendations clearly to technical and non-technical stakeholders.
- Create documentation and lightweight demos that make results actionable.
Requirements
- 3–8 years of experience in data science, machine learning engineering, or applied ML (mid-to-senior).
- Strong Python skills for data analysis and modeling (pandas/numpy/scikit-learn or equivalent).
- Strong SQL skills (joins, window functions, aggregation, performance awareness).
- Solid foundation in statistics (hypothesis testing, uncertainty, bias/variance, sampling) and a practical experimentation mindset.
- Hands-on experience across multiple model types, including classification and regression, time series forecasting, and clustering/segmentation.
- Experience with deep learning in PyTorch or TensorFlow/Keras.
- Strong problem-solving skills: ability to work with ambiguous goals and messy data.
- Clear communication skills and the ability to translate analysis into decisions.