Machine Learning Workflows

Machine learning in production typically moves through five phases. These phases describe a common pattern, not a standard: the industry has few standardized best practices across teams and companies, and most machine-learning systems are ad hoc.

Phases in Machine Learning Workflows

  • Use Case Conception and Formulation
  • Feasibility Study and Exploratory Analysis
  • Model Design, Training, and Offline Evaluation
  • Model Deployment, Online Evaluation, and Monitoring
  • Model Maintenance, Diagnosis, and Retraining

For each phase, we’ll answer two questions:

  • What specific tasks are performed?
  • Who is involved (businessperson, data scientist/engineer, DevOps)?

Relevant Personnel and Roles

  • Decision Maker: holds the purse strings and can secure funding and resources (might be the same as the Stakeholder).
  • Stakeholder: businessperson who cares about the problem and can state or quantify the business value of potential solutions.
  • Domain Expert: person who understands the domain and the problem, and may also know the data (might be the same as the Stakeholder).
  • Data Scientist: ML expert who can turn a business problem into a well-defined ML task and propose one or more possible approaches.
  • Data Engineer: database administrator (DBA) or similar who knows where the data lives and can comment on its size and contents (might be the same as the Data Scientist).
  • Systems Architect / DevOps: expert on big data and production software infrastructure, deployment, etc.


Phase 1: Use Case Conception and Formulation

Goal: Identify a data-intensive business problem and propose a potential machine learning solution.


Tasks:

  • Identify the use case and define its business value (labor/cost savings, fraud prevention and reduction, increased clickthrough rate, etc.)
  • Restate the business problem as a machine learning task, e.g., anomaly detection or classification
  • Define “success”: choose a metric, e.g., AUC, and a minimum acceptable performance; quantify the potential business value
  • Identify relevant and necessary data and the available data sources
  • Quick and dirty literature review
  • Define the necessary system architecture
  • Assess potential sources of risk
  • Commission an exploratory analysis or feasibility study, if appropriate
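
To make the “define success” task concrete, here is a minimal sketch of turning a metric and a minimum acceptable performance into a pass/fail check. It computes AUC with the standard rank-sum formulation in plain Python. The threshold value `MIN_ACCEPTABLE_AUC` and the function names are illustrative assumptions, not part of any particular tool.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at i and give each the average rank.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical threshold; in practice it is agreed with the Stakeholder.
MIN_ACCEPTABLE_AUC = 0.80


def meets_success_criterion(labels, scores, threshold=MIN_ACCEPTABLE_AUC):
    """Return True if the model clears the agreed minimum AUC."""
    return auc(labels, scores) >= threshold
```

Writing the criterion down as a function this early gives the Stakeholder and the Data Scientist one shared, testable definition of success before any modeling starts.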


Personnel:

  • Critical: Decision Maker, Stakeholder, Data Scientist
  • Other: Domain Expert (if the Stakeholder doesn’t know the problem well), Data Engineer (if the Data Scientist doesn’t know the data systems), Systems Architect (if discussing deployment)

Phase 2 (optional): Feasibility Study and Exploratory Analysis

Goal: Rapidly explore and de-risk a use case before significant engineering resources are dedicated to it, and make a “go/no go” recommendation.

NOTES: Akin to feasibility studies and short-term (1-2 month) POCs.

Overlaps with Phase 3 (model training) except that here you don’t expect a fully tuned model, nor do you expect to produce a reusable software artifact.


Tasks:

  • Exploratory data analysis (EDA): descriptive statistics, visualization, detection of garbage data/noise/outlier values, quantification of the signal-to-noise ratio
  • Quantify the suitability of the data for ML: number of records and features, availability and quality of labels
  • Specify the experimental protocol (e.g., the training/test split)
  • Rapid data ETL (extract, transform, and load) and vectorization to build experimental data sets (which might be only a toy subset)
  • Thorough literature review with a short list of proposed machine-learning approaches
  • Train and evaluate ML models to assess the presence (or absence) of predictive signal
  • Make a “go/no go” recommendation
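
As a rough illustration of the experimental protocol and the signal check above, the sketch below fixes a reproducible, stratified training/test split and compares a candidate model’s accuracy against a majority-class baseline. The function names and the `min_lift` margin are hypothetical. A real study would choose the metric and margin agreed in Phase 1.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions roughly equal.

    The fixed seed makes the split reproducible, so every experiment in the
    study is evaluated on the same held-out records.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(1 for y in test_labels if y == majority) / len(test_labels)


def has_predictive_signal(model_accuracy, baseline_accuracy, min_lift=0.05):
    """Hypothetical go/no-go helper: does the model beat the baseline by a margin?"""
    return model_accuracy - baseline_accuracy >= min_lift
```

A model that cannot clearly beat the trivial baseline on the held-out split is evidence for a “no go”, or at least for revisiting the data before committing engineering time.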


Personnel:

  • Data Engineer, Data Scientist: explore data, run experiments, produce reports
  • Stakeholder, Domain Expert: answer questions, as needed
  • Decision Maker, Stakeholder: consume the final report/recommendation

SKIL Support: ETL, simple EDA, model training, and evaluation are supported by Workspaces/Experiments/notebooks


Phase 3: Model Design, Training, and Offline Evaluation


Goals:

  • Train the best-performing model possible given the available data, computational resources, and time.
  • Build a reliable, reusable software pipeline for retraining models in the future.

NOTES: Overlaps with Phase 2 (feasibility study), but here you expect a fully tuned model and a reusable software artifact.


Tasks:

  • Plan full set of experiments
  • Data ETL and vectorization pipeline that is configurable, fully tested, scalable, automatable
  • Model training code that is configurable, fully tested, scalable, automatable
  • “Offline” (on held-out, not live, data) model evaluation code that is configurable, fully tested, scalable, automatable
  • Design, train, and evaluate models
  • Tune and debug model training
  • Thorough empirical comparison of competing models, hyperparameters
  • Document experiments and model performance to date
  • Save deployable artifacts (transforms, models, etc.)
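
One way to picture a “configurable” pipeline that “saves deployable artifacts” is sketched below: a small config object, a standardization transform fitted on training data only, and a serialized artifact that bundles both so the same transform can be replayed at serving time. All names (`PipelineConfig`, `fit_standardizer`, `artifact_to_json`, and so on) are illustrative assumptions. Writing the JSON string to disk or to a model registry is left out.

```python
import json
from dataclasses import dataclass, asdict
from statistics import mean, pstdev


@dataclass(frozen=True)
class PipelineConfig:
    """Everything needed to re-run the pipeline the same way later."""
    feature_names: tuple
    test_fraction: float = 0.2
    seed: int = 0


def fit_standardizer(rows, feature_names):
    """Learn per-feature mean and standard deviation from training rows (dicts)."""
    params = {}
    for name in feature_names:
        values = [row[name] for row in rows]
        std = pstdev(values)
        # Guard against constant features to avoid division by zero.
        params[name] = {"mean": mean(values), "std": std if std > 0 else 1.0}
    return params


def apply_standardizer(row, params):
    """Vectorize one row using previously fitted parameters."""
    return [(row[name] - p["mean"]) / p["std"] for name, p in params.items()]


def artifact_to_json(config, params):
    """Bundle config and fitted transform into one deployable JSON document."""
    return json.dumps({"config": asdict(config), "transform": params}, indent=2)


def artifact_from_json(text):
    """Restore the bundle; the transform parameters can be passed to apply_standardizer."""
    return json.loads(text)
```

Keeping the fitted transform next to the config it was produced with is what lets Phase 4 apply exactly the same vectorization to live data.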


Personnel:

  • Data Engineer: build ETL, assist the Data Scientist with infrastructure as needed
  • Data Scientist: plan and execute model training and evaluation, produce “reports” (automated by tools)
  • Stakeholder, Domain Expert: answer questions, as needed; consume “reports” on progress/performance; provide feedback
  • Decision Maker, Stakeholder: consume “reports” on progress/performance


Phase 4: Model Deployment, Online Evaluation, and Monitoring


Goals:

  • Deploy the trained model (and transform, if needed) as a service, and integrate it with other software and processes
  • Monitor and log the deployed model’s status, performance, and accuracy


Tasks:

  • Deploy models (and transforms) as consumable software services via, e.g., REST API
  • Plan and execute trial deployments and experiments, e.g.:
      - Deploy to controlled staging environment, measure performance and accuracy on live data but don’t expose
      - Set up and manage A/B tests to compare, e.g., new vs. old models
  • Log and detect errors in deployment, e.g.:
      - Transform fails because schema does not match live data
      - Model fails due to invalid vectorized data input size
      - Transform or model servers die or become unreachable
  • Log and track model performance and accuracy on live data, look for:
      - Poor prediction throughput (might need to add more servers)
      - Model drift, i.e., gradual decline in accuracy (might need to retrain model on more recent data)
      - Unexpected poor accuracy (might need to roll back model)
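
Two of the deployment errors listed above, a schema mismatch in the transform and an invalid vector size at the model, can be caught and logged before they surface as opaque failures. The following is a minimal sketch under assumed names: the field list `EXPECTED_FIELDS`, the exception classes, and `handle_request` are hypothetical, and the model is any callable that takes a vector. Detecting dead or unreachable servers needs health checks outside the request path and is not shown.

```python
import logging

logger = logging.getLogger("model_service")

# Hypothetical schema: the fields the transform was fitted on, in order.
EXPECTED_FIELDS = ("age", "income")
EXPECTED_VECTOR_SIZE = len(EXPECTED_FIELDS)


class SchemaMismatch(ValueError):
    """Live record does not match the schema the transform expects."""


class InvalidVectorSize(ValueError):
    """Vectorized input has the wrong number of elements for the model."""


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def transform(record, fields=EXPECTED_FIELDS):
    """Validate a live record against the schema, then vectorize it."""
    missing = [name for name in fields if name not in record]
    if missing:
        raise SchemaMismatch(f"missing fields: {missing}")
    non_numeric = [name for name in fields if not is_number(record[name])]
    if non_numeric:
        raise SchemaMismatch(f"non-numeric fields: {non_numeric}")
    return [float(record[name]) for name in fields]


def predict(vector, model, expected_size=EXPECTED_VECTOR_SIZE):
    """Check the input size before calling the model."""
    if len(vector) != expected_size:
        raise InvalidVectorSize(f"expected {expected_size} values, got {len(vector)}")
    return model(vector)


def handle_request(record, model):
    """Serve one request, logging and reporting known failure modes."""
    try:
        return {"ok": True, "prediction": predict(transform(record), model)}
    except (SchemaMismatch, InvalidVectorSize) as err:
        logger.error("rejected request: %s", err)
        return {"ok": False, "error": type(err).__name__}
```

Counting the logged error types over time gives the Systems Architect a simple signal that upstream data has changed shape.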


Personnel:

  • “Gatekeeper”: some person or group should be responsible for “blessing” models, i.e., deciding that a model should go live
  • Systems Architect: deploy models, manage and monitor model status and performance
  • Data Scientist: plan A/B tests (or other trial deployments), consume reports on model accuracy
  • Stakeholder, Domain Expert: answer questions, as needed; consume reports on model accuracy; provide feedback

SKIL Support: one-click deployment of trained or imported models, simple monitoring of model status

Phase 5: Model Maintenance, Diagnosis, and Retraining


Goals:

  • Monitor and log deployed model accuracy over longer periods of time
  • Gather statistics on deployed models to feed back into training and deployment


Tasks:

  • Gather statistics on deployed models, such as:
      - How long it takes for deployed models to become “stale” (i.e., accuracy on live data drops below acceptable threshold)
      - Patterns in model inaccuracies (might need to re-design model architecture to account for new feature or to correct faulty assumption)
  • Formulate new hypotheses or experiments based on insights from tracking performance
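
The “time to stale” statistic can be approximated with a rolling-window accuracy monitor, sketched below. It assumes ground-truth labels eventually arrive for live predictions, and it measures time in number of labeled predictions, not wall-clock time. The class name, window size, and threshold are illustrative assumptions.

```python
from collections import deque


class StalenessMonitor:
    """Track rolling accuracy on live data and record when it first drops too low."""

    def __init__(self, window=100, min_accuracy=0.9):
        self.window = window
        self.min_accuracy = min_accuracy
        self._outcomes = deque(maxlen=window)  # True where prediction matched label
        self.seen = 0          # labeled predictions observed so far
        self.stale_at = None   # value of `seen` when the model first became stale

    def accuracy(self):
        """Accuracy over the current window, or None before any data arrives."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def record(self, prediction, actual):
        """Add one labeled prediction; return True once the model is stale."""
        self._outcomes.append(prediction == actual)
        self.seen += 1
        window_full = len(self._outcomes) == self.window
        if self.stale_at is None and window_full and self.accuracy() < self.min_accuracy:
            self.stale_at = self.seen
        return self.stale_at is not None
```

Collecting `stale_at` across successive model versions gives a rough estimate of how often retraining needs to be scheduled.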


Personnel:

  • Systems Architect: monitor model status and performance
  • Data Scientist: consume reports on model accuracy
  • Stakeholder, Domain Expert: answer questions, as needed; consume reports on model accuracy; provide feedback

Chris V. Nicholson

Chris V. Nicholson is a venture partner at Page One Ventures. He previously led Pathmind and Skymind. In a prior life, Chris spent a decade reporting on tech and finance for The New York Times, Businessweek and Bloomberg, among others.