Why do we need MLOps?

Kiran Karkera
2 min read · Jul 27, 2022
Oversimplified ML workflow

MLOps (Machine Learning Operations) platforms are all the rage these days. On encountering them, a newcomer to the ML ecosystem may think aloud:

There’s a lot of stuff here. Do I really need all of this? All I want to do is train models and get the best scores.

That’s a reasonable response to the complexity of MLOps tooling. The rest of this post aims to tease out why we need the additional complexity, and to show when we can get away with a stripped-down toolset.

ML pipelines after a few days

A few weeks or months into development, let’s say we have done multiple iterations of modeling and have 100 models as the outcome of the process. We would find that:

  • There could be multiple data pipelines, each with a unique set of features that are developed to improve model performance.
  • Multiple models exist, possibly with different architectures, and each one with unique scores on the desired metric.

Therefore we would have multiple versions of

  • Raw datasets
  • Feature engineering code
  • Model architectures

and certainly other artifacts, but I’ll stop there to keep the discussion simple.
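To make the versioning problem concrete, here is a minimal sketch of how one training run could be pinned to the three versioned inputs listed above. The names `RunManifest`, `fingerprint` and `make_manifest`, and the example values, are hypothetical and only for illustration; a real experiment tracker would record much more.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record tying one model to the inputs that produced it."""
    dataset_version: str       # content hash of the raw dataset
    feature_code_version: str  # e.g. a Git commit hash (assumed convention)
    architecture: str          # label for the model architecture
    metric: float              # score on the desired metric

    def run_id(self) -> str:
        # Identity depends only on the three versioned inputs, not on the score.
        key = json.dumps(
            [self.dataset_version, self.feature_code_version, self.architecture]
        )
        return fingerprint(key.encode("utf-8"))


def make_manifest(raw_data: bytes, commit: str, architecture: str,
                  metric: float) -> RunManifest:
    """Build a manifest, hashing the raw data to obtain its version."""
    return RunManifest(fingerprint(raw_data), commit, architecture, metric)


if __name__ == "__main__":
    run = make_manifest(b"id,label\n1,0\n2,1\n", "abc1234", "mlp-2-layer", 0.81)
    print(run.run_id(), run)
```

With 100 models, a record like this is what lets us answer "which data and which feature code produced this score?" without relying on memory or file names.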

git branches

The tools we need at this point should be driven by the ‘knobs’ that we would like to tune.

Tools in the MLOps ecosystem. All of the tools have AWS/GCP/Azure equivalents; only a few are explicitly mentioned.

For example, an ML workflow that

  • operates on a fixed (static) dataset
  • does not change the model architecture, only the model hyper-parameters
  • does not save trained models as artifacts (which is likely when the training cost is low)

needs just Git, Docker and an experiment tracker to operationalize a production ML workflow.

On the other hand, an ML workflow that

  • operates on a stream of data (and is therefore dynamic)
  • changes the model architecture
  • saves model artifacts because the models are expensive to (re)train

would need most of the tools in the diagram above.
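The two cases can be read as a small decision rule: start from a baseline and add tooling for each knob that moves. The sketch below is only an illustration of that idea. The baseline comes from the simple case described earlier; the extra tool categories ("data versioning", "model architecture versioning", "model artifact store") are my assumptions, not labels taken from the diagram, and `WorkflowKnobs` and `required_tools` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Baseline toolset for the simple workflow described in the post.
BASELINE = ["Git", "Docker", "experiment tracker"]


@dataclass(frozen=True)
class WorkflowKnobs:
    """The three knobs discussed above (hypothetical container)."""
    data_is_dynamic: bool
    architecture_changes: bool
    saves_model_artifacts: bool


def required_tools(knobs: WorkflowKnobs) -> List[str]:
    """Return the baseline plus one assumed tool category per active knob."""
    tools = list(BASELINE)
    if knobs.data_is_dynamic:
        tools.append("data versioning")  # assumed category
    if knobs.architecture_changes:
        tools.append("model architecture versioning")  # assumed category
    if knobs.saves_model_artifacts:
        tools.append("model artifact store")  # assumed category
    return tools


if __name__ == "__main__":
    simple = WorkflowKnobs(False, False, False)
    complex_case = WorkflowKnobs(True, True, True)
    print(required_tools(simple))
    print(required_tools(complex_case))
```

The point of the sketch is the shape of the decision, not the specific tool names: each knob that stays fixed is a category of tooling we can skip.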

In a later post, I’ll walk through a use case that has only a few requirements and use the minimum set of tools to operationalize it.
