AI Infrastructure - Machine Learning Operations (MLOps) in Production

AI infrastructure and machine learning operations (MLOps) are used almost interchangeably. Both terms denote the technology stack needed to get machine learning algorithms into production in a stable, scalable and reliable way.

That stack extends from the data science tools used to select and train machine learning algorithms down to the hardware those algorithms run on, and to the databases and message queues that supply the datasets that are their fuel.

AI infrastructure encompasses almost every stage of the machine learning workflow. It enables data scientists, data engineers, software engineers and DevOps teams to access and manage the computing resources they need to test, train and deploy AI algorithms.

Early in the workflow, that includes exploratory data analysis: running large-scale queries on data you’ve stored. In the middle, AI infrastructure involves training algorithms, probably on a cluster of distributed GPUs. And late in the workflow, it entails deploying those machine-learning models for inference in a reliable and scalable way, much as you would deploy a website on a web server.
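To make those three stages concrete, here is a toy sketch in plain Python. It is only an illustration under stated assumptions: the dataset, the function names (explore, train, predict) and the choice of a one-variable linear model are all hypothetical, and a real stack would replace each function with a query engine, a distributed training job and a model server.

```python
from statistics import mean

# Hypothetical stored data: (input, target) pairs standing in for a database table.
ROWS = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def explore(rows):
    """Early stage: summarize stored data before any modeling."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"count": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}


def train(rows):
    """Middle stage: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in rows)
    variance = sum((x - mx) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": my - slope * mx}


def predict(model, x):
    """Late stage: the inference call a serving layer would expose."""
    return model["slope"] * x + model["intercept"]


summary = explore(ROWS)          # exploratory data analysis
model = train(ROWS)              # training
prediction = predict(model, 5.0) # inference on a new input
```

Keeping the trained model as a plain dictionary of parameters mirrors, in miniature, the handoff between training and serving: the training stage produces an artifact, and the inference stage only needs that artifact and a new input.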

Machine-Learning Tools and Platforms

Chris V. Nicholson

Chris V. Nicholson is a venture partner at Page One Ventures. He previously led Pathmind and Skymind. In a prior life, Chris spent a decade reporting on tech and finance for The New York Times, Businessweek and Bloomberg, among others.