Machine Learning Workflows

Machine learning in production happens in five phases. (There are few standardized best practices across teams and companies in the industry; most machine-learning systems are built ad hoc.)

Phases in Machine Learning Workflows

  • Use Case Conception and Formulation
  • Feasibility Study and Exploratory Analysis
  • Model Design, Training, and Offline Evaluation
  • Model Deployment, Online Evaluation, and Monitoring
  • Model Maintenance, Diagnosis, and Retraining

Within each phase, we’ll explain:

  • What specific tasks are performed?
  • Who is involved in each phase (businessperson, data scientist/engineer, DevOps)?

Relevant Personnel and Roles

  • Decision Maker: holds the purse strings and can wrangle funding and resources (might be the same as the Stakeholder).
  • Stakeholder: businessperson who cares about the problem and can state/quantify the business value of potential solutions.
  • Domain Expert: person who understands the domain and the problem, and may also know the data (might be the same as the Stakeholder).
  • Data Scientist: ML expert who can turn a business problem into a well-defined ML task and propose one or more possible approaches.
  • Data Engineer: database admin (DBA) or similar who knows where the data lives and can comment on its size and contents (might be the same as the Data Scientist).
  • Systems Architect / DevOps: systems architect or similar who is an expert on big data and production software infrastructure, deployment, etc.

Phase 1: Use Case Conception and Formulation

Goal: Identify a data-intensive business problem and propose a potential machine learning solution.

Tasks

  • Identify use case, define business value (labor/cost savings, fraud prevention and reduction, increased clickthrough rate, etc.)
  • Re-state business problem as machine learning task, e.g., anomaly detection or classification
  • Define “success”: choose a metric (e.g., AUC) and a minimum acceptable level of performance, and quantify the potential business value (a sketch follows this list)
  • Identify relevant and necessary data and available data sources
  • Quick and dirty literature review
  • Define necessary system architecture
  • Assess potential sources of risk
  • Commission exploratory analysis or feasibility study, if appropriate
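
As a concrete illustration of the “define success” task above, the sketch below turns an agreed-upon metric and minimum acceptable performance into a testable criterion. It assumes a binary classification use case scored with AUC; the 0.80 threshold, the function name, and the toy labels/scores are hypothetical placeholders rather than values from any particular project.

```python
# Minimal sketch: turning "define success" into a concrete, testable criterion.
# Assumes a binary classification task; the 0.80 threshold is a placeholder
# that would come from the Stakeholder's definition of acceptable performance.
from sklearn.metrics import roc_auc_score

MIN_ACCEPTABLE_AUC = 0.80  # hypothetical minimum agreed with the Stakeholder


def meets_success_criterion(y_true, y_scores):
    """Return the measured AUC and whether it clears the agreed threshold."""
    auc = roc_auc_score(y_true, y_scores)
    return auc, auc >= MIN_ACCEPTABLE_AUC


# Toy usage with made-up labels and scores:
auc, ok = meets_success_criterion([0, 1, 1, 0, 1], [0.2, 0.9, 0.7, 0.4, 0.6])
print(f"AUC={auc:.2f}, meets criterion: {ok}")
```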

People

  • Critical: Decision Maker, Stakeholder, Data Scientist
  • Other: Domain Expert (if Stakeholder doesn’t know problem), Data Engineer (if DS doesn’t know data systems), Systems Architect (if discussing deployment)

Phase 2 (optional): Feasibility Study and Exploratory Analysis

Goal: Rapidly explore and de-risk a use case before significant engineering resources are dedicated to it, and make a “go/no go” recommendation

NOTES: Akin to a traditional feasibility study or a short-term (1-2 month) proof of concept (POC).

Overlaps with Phase 3 (model training) except that here you don’t expect a fully tuned model, nor do you expect to produce a reusable software artifact.

Tasks

  • Exploratory data analysis (EDA): descriptive statistics, visualization, detection of garbage data/noise/outlier values, quantify signal-to-noise ratio
  • Quantify suitability of data for ML: number of records and features, availability and quality of labels
  • Specify the experimental protocol (e.g., the training/test split)
  • Rapid data ETL (extract, transform, and load) and vectorization to build experimental data sets (which might be only a toy subset)
  • Thorough literature review with short list of proposed machine-learning approaches
  • Train and evaluate ML models to assess the presence (or absence) of predictive signal (a sketch follows this list)
  • Make “go/no go” recommendation
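
A minimal sketch of the feasibility loop described above: quick EDA, a simple hold-out split, and a default baseline model to check whether any predictive signal is present. It assumes tabular data in a CSV with a binary “label” column; the file name, column name, and choice of baseline are placeholders for whatever the study actually targets.

```python
# Feasibility-study sketch: quick EDA, a train/test split, and a baseline
# model to check for predictive signal. The CSV path and "label" column
# are placeholders for the data the study actually targets.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("use_case_sample.csv")  # hypothetical toy subset of the data

# Quick EDA: shape, descriptive statistics, missing values, label balance.
print(df.shape)
print(df.describe())
print(df.isna().sum())
print(df["label"].value_counts(normalize=True))

# Quick-and-dirty vectorization: keep only numeric columns for the baseline.
X = df.drop(columns=["label"]).select_dtypes(include="number")
y = df["label"]

# Simple experimental protocol: a stratified hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline model: if even a default random forest clears chance level
# (AUC ~0.5), there is likely enough signal to justify a full Phase 3 effort.
baseline = RandomForestClassifier(random_state=42).fit(X_train, y_train)
auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
print(f"Baseline hold-out AUC: {auc:.3f}")
```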

People

  • Data Engineer, Data Scientist: explore data, run experiments, produce reports
  • Stakeholder, Domain Expert: answer questions, as needed
  • Decision Maker, Stakeholder: consume final report/recommendation

SKIL Support: ETL, simple EDA, model training, and evaluation are supported by Workspaces/Experiments/notebooks

Phase 3: Model Design, Training, and Offline Evaluation

Goals

  • Train best performing model possible given available data, computational resources, and time.
  • Build reliable, reusable software pipeline for re-training models in the future.

NOTES: Overlaps with Phase 2 (feasibility study), but here you expect a fully tuned model and a reusable software artifact.

Tasks

  • Plan full set of experiments
  • Data ETL and vectorization pipeline that is configurable, fully tested, scalable, automatable
  • Model training code that is configurable, fully tested, scalable, automatable
  • “Offline” (on held-out, not live, data) model evaluation code that is configurable, fully tested, scalable, automatable
  • Design, train, and evaluate models
  • Tune and debug model training
  • Thorough empirical comparison of competing models, hyperparameters
  • Document experiments and model performance to date
  • Save deployable artifacts (transforms, models, etc.); a sketch follows this list
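
One way to satisfy the “configurable, reusable, save deployable artifacts” requirements above is to wrap the vectorization transform and the model in a single scikit-learn Pipeline and persist it with joblib, so the exact transform used in training travels with the model into deployment. This is a sketch under assumed tabular data; the column names, model choice, hyperparameter grid, and file path are illustrative placeholders.

```python
# Sketch of a reusable training pipeline whose transform + model are saved
# together as one deployable artifact. Column names, the estimator, and the
# hyperparameter grid are illustrative placeholders.
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["age", "amount"]           # hypothetical numeric features
CATEGORICAL = ["country", "channel"]  # hypothetical categorical features


def build_pipeline():
    """Vectorization (transform) + estimator in one configurable object."""
    transform = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("transform", transform),
                     ("model", LogisticRegression(max_iter=1000))])


def train_and_save(X_train, y_train, path="model_artifact.joblib"):
    """Tune hyperparameters, fit, and persist the whole pipeline."""
    search = GridSearchCV(
        build_pipeline(),
        param_grid={"model__C": [0.1, 1.0, 10.0]},  # illustrative grid
        scoring="roc_auc",
        cv=5,
    )
    search.fit(X_train, y_train)
    joblib.dump(search.best_estimator_, path)  # the deployable artifact
    return search.best_score_, path
```

Because the transform and the model are persisted together, the same artifact can be loaded unchanged by the serving layer in Phase 4.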

People

  • Data Engineer: ETL, assist DS with infrastructure as needed
  • Data Scientist: plan and execute model training and evaluation, produce “reports” (automated by tools)
  • Stakeholder, Domain Expert: answer questions, as needed; consume “reports” on progress/performance; provide feedback
  • Decision Maker, Stakeholder: consume “reports” on progress/performance

Phase 4: Model Deployment, Online Evaluation, and Monitoring

Goals

  • Deploy the trained model (and transform, if needed) as a service and integrate it with other software/processes
  • Monitor and log deployed model status, performance, and accuracy

Tasks

  • Deploy models (and transforms) as consumable software services via, e.g., a REST API (a sketch follows this list)
  • Plan and execute trial deployments and experiments, e.g.:
      - Deploy to a controlled staging environment and measure performance and accuracy on live data, but don’t expose it to end users
      - Set up and manage A/B tests to compare, e.g., new vs. old models
  • Log and detect errors in deployment, e.g.:
      - Transform fails because the schema does not match live data
      - Model fails due to an invalid vectorized data input size
      - Transform or model servers die or become unreachable
  • Log and track model performance and accuracy on live data, looking for:
      - Poor prediction throughput (might need to add more servers)
      - Model drift, i.e., a gradual decline in accuracy (might need to retrain the model on more recent data)
      - Unexpectedly poor accuracy (might need to roll back the model)
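
As a hedged illustration of the deployment and error-logging tasks above, the sketch below serves the persisted pipeline from Phase 3 behind a REST endpoint using FastAPI, with basic request logging and error handling standing in for fuller monitoring. The artifact path, feature fields, and endpoint name are assumptions carried over from the earlier sketches, not part of the original workflow.

```python
# Minimal sketch of serving a persisted pipeline behind a REST endpoint,
# with simple logging and error handling as a stand-in for fuller monitoring.
# The artifact path, feature fields, and endpoint name are placeholders.
import logging

import joblib
import pandas as pd
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model-service")

app = FastAPI()
pipeline = joblib.load("model_artifact.joblib")  # transform + model from Phase 3


class PredictionRequest(BaseModel):
    age: float
    amount: float
    country: str
    channel: str


@app.post("/predict")
def predict(req: PredictionRequest):
    try:
        features = pd.DataFrame([{
            "age": req.age, "amount": req.amount,
            "country": req.country, "channel": req.channel,
        }])
        score = float(pipeline.predict_proba(features)[0, 1])
        logger.info("prediction score=%.4f", score)  # log for later analysis
        return {"score": score}
    except Exception as exc:  # e.g., schema mismatch with live data
        logger.error("prediction failed: %s", exc)
        raise HTTPException(status_code=500, detail="prediction failed")
```

Run with, e.g., uvicorn service:app (assuming the file is saved as service.py). An A/B test would route a fraction of live traffic to a second such service running the new model.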

People

  • “Gatekeeper”: someone, or some group of people, responsible for “blessing” models, i.e., deciding whether a model should go live
  • Systems Architect: deploy models; manage and monitor model status and performance
  • Data Scientist: plan A/B tests (or other trial deployments), consume reports on model accuracy
  • Stakeholder, Domain Expert: answer questions, as needed; consume reports on model accuracy, provide feedback

SKIL Support: one-click deployment of trained or imported models, simple monitoring of model status

Phase 5: Model Maintenance, Diagnosis, and Retraining

Goals

  • Monitor and log deployed model accuracy over longer periods of time
  • Gather statistics on deployed models to feed back into training and deployment

Tasks

  • Gather statistics on deployed models, such as:
      - How long it takes for deployed models to become “stale” (i.e., accuracy on live data drops below an acceptable threshold); a sketch follows this list
      - Patterns in model inaccuracies (might need to redesign the model architecture to account for a new feature or to correct a faulty assumption)
  • Formulate new hypotheses or experiments based on insights from tracking performance
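
A minimal sketch of the “staleness” check described above: compute accuracy on logged live predictions in rolling time windows and flag windows that fall below the acceptable threshold agreed in Phase 1. The window size, threshold, and log schema are hypothetical placeholders.

```python
# Staleness-check sketch: track model accuracy on logged live predictions in
# rolling windows and flag windows below the agreed threshold. The window
# size, threshold, and log schema are illustrative placeholders.
import pandas as pd
from sklearn.metrics import roc_auc_score

MIN_ACCEPTABLE_AUC = 0.80  # same hypothetical threshold agreed in Phase 1
WINDOW = "7D"              # e.g., weekly windows of live predictions


def staleness_report(log: pd.DataFrame) -> pd.DataFrame:
    """`log` has columns: timestamp, y_true, score (one row per prediction)."""
    log = log.set_index(pd.to_datetime(log["timestamp"])).sort_index()
    rows = []
    for window_start, chunk in log.resample(WINDOW):
        if chunk["y_true"].nunique() < 2:
            continue  # AUC is undefined without both classes in the window
        auc = roc_auc_score(chunk["y_true"], chunk["score"])
        rows.append({"window": window_start, "auc": auc,
                     "stale": auc < MIN_ACCEPTABLE_AUC})
    return pd.DataFrame(rows)
```

A sustained run of “stale” windows is the signal to retrain on more recent data; a sharp, sudden drop points instead to rolling back the model or revisiting its design.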

People

  • Systems Architect: monitor model status and performance
  • Data Scientist: consume reports on model accuracy
  • Stakeholder, Domain Expert: answer questions, as needed; consume reports on model accuracy, provide feedback

Chris V. Nicholson

Chris V. Nicholson is a venture partner at Page One Ventures. He previously led Pathmind and Skymind. In a prior life, Chris spent a decade reporting on tech and finance for The New York Times, Businessweek and Bloomberg, among others.