What is MLOps and Why It Matters

January 8, 2026

In the previous posts, we learned what machine learning is and how to build a model. But building a model is only the beginning.

The hard part is everything that comes after - how do you put the model into an app? How do you retrain it when data changes? How do you know if it stops working? How do you manage ten models at once?

MLOps is the set of practices that answers these questions.

The Problem MLOps Solves

Imagine a data scientist builds a model in a Jupyter notebook. It predicts house prices with 95% accuracy. Great.

Now what?

  • The model lives on the data scientist's laptop
  • Nobody else can use it
  • It was trained on last year's data
  • There's no way to know if predictions are still accurate
  • If the data scientist leaves, the model is gone

This is the gap between "model works in a notebook" and "model works in production." MLOps fills that gap.

Building a model is like writing code. MLOps is like DevOps - it makes sure the code actually runs reliably in production.

MLOps = DevOps + Machine Learning

If you know DevOps, MLOps will feel familiar. It borrows the same ideas and applies them to ML.

DevOps
Code → Build → Test → Deploy → Monitor → Repeat

MLOps
Data → Train → Evaluate → Deploy → Monitor → Retrain → Repeat

The key difference - in DevOps, code doesn't degrade on its own. Once deployed, it does the same thing until you change it. In ML, models degrade over time because the real world changes. Customer behavior shifts, prices change, new patterns appear. A model trained on last year's data might give wrong predictions today.

This is why MLOps has a retraining loop that DevOps doesn't need.

The MLOps Lifecycle

Here's what a complete ML workflow looks like in production:

1. Data Management
   Collect → Clean → Version → Store
       ↓
2. Model Development
   Experiment → Train → Evaluate → Compare
       ↓
3. Model Deployment
   Package → Test → Deploy → Serve
       ↓
4. Monitoring
   Track predictions → Detect drift → Alert
       ↓
5. Retraining
   New data → Retrain → Evaluate → Redeploy
       ↓
   (Back to step 1)

Each step has its own challenges. Let's look at them.

1. Data Management

Models are only as good as their data. In production, data management means:

Data versioning
Just like you version code with Git, you need to version your data. If you retrain a model and it gets worse, you need to know what data it was trained on. Tools like DVC (Data Version Control) handle this.
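
To see what data versioning means in practice, one simple idea is to record a content hash of the training data alongside every model, so you can always tell which exact dataset produced which model. Here's a minimal sketch using only the standard library (the file paths are hypothetical); tools like DVC do the same thing at scale, with remote storage and Git integration on top.

    import hashlib
    import json
    from pathlib import Path

    def dataset_fingerprint(path: str) -> str:
        """Hash a data file so its exact contents can be tied to a model version."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    # Store the fingerprint next to the trained model's metadata.
    metadata = {
        "model_version": "2026-01-08",
        "training_data": "data/train.csv",          # hypothetical path
        "data_fingerprint": dataset_fingerprint("data/train.csv"),
    }
    Path("model_metadata.json").write_text(json.dumps(metadata, indent=2))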

Data quality
Bad data in production is worse than bad code. Missing values, wrong labels, or stale data silently corrupt your model. You need automated checks that validate data before training.
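
The checks don't have to be elaborate to be useful. Here's a minimal sketch with pandas - the column names and thresholds are hypothetical, and in practice you'd tailor them to your own schema:

    import pandas as pd

    def validate_training_data(df: pd.DataFrame) -> None:
        """Fail fast if the data looks wrong, before any training happens."""
        required_columns = {"price", "square_feet", "bedrooms"}   # hypothetical schema
        missing = required_columns - set(df.columns)
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        if df["price"].isna().mean() > 0.01:                      # more than 1% missing labels
            raise ValueError("Too many missing labels")
        if (df["square_feet"] <= 0).any():
            raise ValueError("Non-positive square footage found")

    df = pd.read_csv("data/train.csv")
    validate_training_data(df)   # raises before training if the data is bad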

Data pipelines
Data doesn't just appear. It flows from databases, APIs, event streams, and files. A data pipeline automates the collection, cleaning, and transformation. If the pipeline breaks, training stops.
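
A pipeline is just the steps you'd otherwise do by hand, written as code so they run the same way every time. A minimal sketch - the source files and cleaning rules are placeholders:

    import pandas as pd

    def extract() -> pd.DataFrame:
        # In reality: query a database, call an API, or read from an event stream.
        return pd.read_csv("raw/listings.csv")        # hypothetical source

    def clean(df: pd.DataFrame) -> pd.DataFrame:
        df = df.dropna(subset=["price"])              # drop rows without a label
        df = df[df["square_feet"] > 0]                # remove obviously bad rows
        return df

    def load(df: pd.DataFrame) -> None:
        df.to_parquet("data/train.parquet")           # the training set models are built from

    load(clean(extract()))   # in production, an orchestrator (Airflow, Prefect) runs this on a schedule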

Version your data the same way you version your code. You'll need it when debugging why a model went wrong.

2. Model Development

This is where data scientists spend most of their time, but MLOps adds structure to the process.

Experiment tracking
Every training run produces results - accuracy, loss, hyperparameters, the dataset used. Without tracking, you can't compare runs or reproduce results. Tools like MLflow or Weights & Biases log every experiment automatically.
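
With MLflow, for example, tracking an experiment is a few extra lines around your training code. A minimal sketch - the model, metrics, and the X_train/X_test variables are placeholders for your own training setup:

    import mlflow
    from sklearn.ensemble import RandomForestRegressor

    with mlflow.start_run():
        params = {"n_estimators": 200, "max_depth": 8}
        mlflow.log_params(params)

        model = RandomForestRegressor(**params).fit(X_train, y_train)   # X_train, y_train assumed to exist
        score = model.score(X_test, y_test)

        mlflow.log_metric("r2_score", score)
        mlflow.sklearn.log_model(model, "model")   # saves the trained model as an artifact of this run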

Reproducibility
If you can't reproduce a result, you can't trust it. This means pinning library versions, saving random seeds, and logging the exact dataset and parameters used for each run.
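
Most of this is discipline rather than tooling. Seeds are the easiest thing to forget, so fix them in one place - a minimal sketch:

    import random
    import numpy as np

    SEED = 42   # log this value alongside every run

    random.seed(SEED)
    np.random.seed(SEED)
    # If you use a deep learning framework, set its seed too, e.g. torch.manual_seed(SEED).

    # Pin library versions in requirements.txt (e.g. scikit-learn==1.4.2) so the same
    # code and the same data actually produce the same model.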

Model registry
Once a model is good enough, it goes into a registry - a central place where all production-ready models are stored with their version, metadata, and lineage. Think of it like a container registry (GHCR, ACR) but for models.
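
With MLflow's registry, for example, promoting a good run into the registry is one call. A minimal sketch - the run ID and model name are placeholders:

    import mlflow

    # Take the model logged in a specific training run and register it under a name.
    result = mlflow.register_model(
        model_uri="runs:/<run_id>/model",      # run ID from experiment tracking
        name="house-price-model",              # hypothetical registry name
    )
    print(result.version)   # the registry assigns an incrementing version number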

3. Model Deployment

This is where the model goes from a file on disk to a service that handles real requests.

Serving patterns
There are two main ways to serve a model:

  • Real-time (online) - The model runs as an API. You send a request, get a prediction back immediately. Used for recommendations, fraud detection, search ranking (a minimal serving sketch follows this list)
  • Batch - The model processes a large dataset at once, usually on a schedule. Used for daily reports, email campaigns, bulk scoring.
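
The real-time pattern usually looks like a small web service wrapped around the model. A minimal sketch with FastAPI - the feature names and model file are placeholders:

    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    class HouseFeatures(BaseModel):
        square_feet: float
        bedrooms: int

    app = FastAPI()
    model = joblib.load("model.pkl")   # loaded once at startup, not per request

    @app.post("/predict")
    def predict(features: HouseFeatures) -> dict:
        price = model.predict([[features.square_feet, features.bedrooms]])[0]
        return {"predicted_price": float(price)}

Run it with an ASGI server such as uvicorn and every request gets a prediction back in milliseconds.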

Packaging
A model isn't just a .pkl file. It needs its dependencies, preprocessing code, and configuration. Containerizing the model (Docker) makes deployment consistent across environments.

Deployment strategies
You don't just overwrite the old model and hope for the best. You use the same strategies as regular deployments:

  • Canary - Route 5% of traffic to the new model, watch metrics, then gradually increase (sketched after this list)
  • Shadow - Run the new model alongside the old one, compare predictions without affecting users
  • Blue-green - Switch all traffic at once, with easy rollback
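
Canary routing, for example, can be as simple as a weighted coin flip in the serving layer. A minimal sketch - real setups usually do this at the load balancer or service mesh instead:

    import random

    CANARY_FRACTION = 0.05   # 5% of traffic goes to the new model

    def predict(features, old_model, new_model):
        model = new_model if random.random() < CANARY_FRACTION else old_model
        prediction = model.predict([features])[0]
        # Log which model served the request so the two can be compared later.
        return {
            "prediction": float(prediction),
            "model": "canary" if model is new_model else "stable",
        }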

4. Monitoring

This is where MLOps differs from DevOps the most.

In regular software, you monitor uptime, latency, and error rates. In ML, you also monitor the model's predictions.

Data drift
The input data starts looking different from what the model was trained on. Customer demographics shift, product catalogs change, seasonal patterns appear. The model hasn't changed, but its accuracy drops because the world changed.
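
Drift can be detected by comparing the distribution of incoming features against the training data. A minimal sketch using a two-sample Kolmogorov-Smirnov test from SciPy - the threshold is a placeholder you'd tune, and train_df and recent_requests are assumed to be dataframes you already have:

    from scipy.stats import ks_2samp

    def check_drift(training_values, live_values, threshold: float = 0.05) -> bool:
        """Return True if the live feature distribution differs significantly from training."""
        statistic, p_value = ks_2samp(training_values, live_values)
        return p_value < threshold   # a low p-value means the distributions likely differ

    # Example: compare the 'square_feet' feature seen in training vs. last week's requests.
    if check_drift(train_df["square_feet"], recent_requests["square_feet"]):
        print("Data drift detected - consider retraining")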

Model drift (concept drift)
The relationship between inputs and outputs changes. What used to predict a successful sale no longer does. The patterns the model learned are now wrong.

Prediction monitoring
Track what the model is predicting. If a fraud detection model suddenly flags 40% of transactions as fraud (up from 2%), something is wrong - either the data changed or the model broke.
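
This kind of check is just aggregation plus a threshold. A minimal sketch of the fraud example above - replace the print with whatever alerting your team already uses:

    def check_prediction_rate(recent_predictions: list[int], max_rate: float = 0.10) -> None:
        """Alert if the share of transactions flagged as fraud jumps far above normal."""
        flagged_rate = sum(recent_predictions) / len(recent_predictions)
        if flagged_rate > max_rate:
            # Replace print with your team's alerting (Slack, PagerDuty, email).
            print(f"ALERT: fraud flag rate is {flagged_rate:.0%}, threshold is {max_rate:.0%}")

    check_prediction_rate([1, 0, 0, 1, 1, 0, 1, 0, 1, 1])   # 60% flagged - clearly abnormal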

A model that worked last month might not work today. Monitoring catches this before users notice.

5. Retraining

When monitoring detects drift or degraded performance, the model needs to be retrained.

Scheduled retraining
Retrain on a fixed schedule (daily, weekly, monthly) regardless of whether performance dropped. Simple to implement but wasteful if the model is still fine.

Triggered retraining
Retrain only when monitoring detects a problem - accuracy drops below a threshold, data drift exceeds a limit. More efficient, but requires good monitoring.
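
The trigger itself is usually a small check that runs on a schedule and kicks off the training pipeline only when needed. A minimal sketch - the accuracy threshold and the retraining call are placeholders for your own pipeline:

    ACCURACY_THRESHOLD = 0.90

    def maybe_retrain(current_accuracy: float, drift_detected: bool) -> bool:
        """Decide whether the retraining pipeline should run."""
        if current_accuracy < ACCURACY_THRESHOLD or drift_detected:
            # In practice this would trigger the pipeline (an Airflow DAG, a CI job, etc.).
            print("Triggering retraining pipeline")
            return True
        return False

    maybe_retrain(current_accuracy=0.87, drift_detected=False)   # accuracy dropped - retrain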

Automated pipelines
The entire flow - data preparation, training, evaluation, deployment - should be automated. A retraining pipeline that requires manual steps will be skipped when the team is busy.

How MLOps Maturity Looks

Not every team starts with full automation. MLOps maturity typically grows in stages:

Level 0 - Manual
Everything is manual. Data scientist trains in a notebook, exports a model file, hands it to engineering to deploy. No automation, no monitoring.

Level 1 - Pipeline automation
Training is automated. A pipeline collects data, trains the model, evaluates it, and produces a deployable artifact. Deployment might still be manual.

Level 2 - CI/CD for ML
The full loop is automated. Code changes, data changes, or drift alerts trigger retraining. New models are tested and deployed automatically. Monitoring feeds back into retraining.

Most teams are at Level 0 or Level 1. Level 2 is the goal, but it requires investment in tooling and process.

MLOps vs DevOps - Key Differences

What you version
DevOps versions code. MLOps versions code, data, models, and experiments.

What degrades
In DevOps, software does the same thing until changed. In MLOps, models degrade silently as the world changes.

What you test
DevOps tests code behavior (unit tests, integration tests). MLOps also tests data quality, model accuracy, and prediction distribution.

What you monitor
DevOps monitors infrastructure metrics. MLOps also monitors data drift, model performance, and prediction quality.

The MLOps Tooling Landscape

You don't need all of these, but knowing the categories helps:

  • Experiment tracking - MLflow, Weights & Biases, Neptune
  • Data versioning - DVC, LakeFS
  • Feature stores - Feast, Tecton
  • Model registry - MLflow Model Registry, Azure ML
  • Model serving - TensorFlow Serving, Triton, BentoML, KServe
  • Pipeline orchestration - Kubeflow, Airflow, Prefect
  • Monitoring - Evidently, WhyLabs, Arize

Start small. MLflow for experiment tracking and a simple CI/CD pipeline for deployment covers a lot of ground.

Key Takeaways

  • MLOps is DevOps for ML - It covers the entire lifecycle from data to deployment to monitoring
  • Models degrade over time - Unlike code, models need retraining because the real world changes
  • Version everything - Code, data, models, and experiments
  • Monitoring is critical - Data drift and model drift can silently break predictions
  • Start simple - Manual first, then automate the most painful parts
  • The model is the easy part - The infrastructure, pipelines, and monitoring around it are the real work

Building a model takes days. Keeping it running reliably in production takes MLOps.