Machine Learning Operations (MLOps) is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. It combines the principles of DevOps with machine learning to streamline the process of taking ML models from development to production. This article will provide a comprehensive guide to building an end-to-end MLOps pipeline.
Introduction to MLOps
MLOps bridges the gap between machine learning model development and its operationalization. It ensures that models are scalable, maintainable, and deliver value consistently. The primary goal of MLOps is to automate the machine learning lifecycle, integrating with existing CI/CD frameworks to enable continuous delivery of ML-driven applications.
It's a set of practices and tools that streamline the journey from model development to deployment, addressing key challenges such as:
- Ensuring reproducibility in data preprocessing and model training.
- Managing model versions effectively.
- Deploying models efficiently and safely.
- Monitoring model performance in production environments.
Building an End-to-End MLOps Pipeline: A Practical Guide
In this project, we're going to build an end-to-end MLOps pipeline, demonstrating how these practices work in real-world scenarios.
1. Our Objectives
- Identify a problem statement and gather relevant data
- Preprocess the data and develop a robust machine-learning model through hyperparameter tuning
- Implement version control for both data and model using Git and DVC
- Utilize MLflow for model registration
- Develop CI/CD workflows for model reports
- Create an interface to access the trained model using FastAPI
- Finally, Dockerize the project
This is the flow of the project, to give an overview:
(Figure: End-to-End MLOps Pipeline)
2. Problem Statement
The goal of this project is to predict the academic risk of students in higher education. This problem statement is derived from an active competition on Kaggle, providing a real-world context for our MLOps implementation.
3. Description of the Dataset
Let's start with a description of our data, as it forms the foundation of any machine learning project.
The dataset originated from a higher education institution and was compiled from several disjoint databases. It contains information about students enrolled in various undergraduate programs, including agronomy, design, education, nursing, journalism, management, social service, and technologies. The data encompasses:
- Information known at the time of student enrollment (academic path, demographics, and socio-economic factors)
- Students' academic performance at the end of the first and second semesters
The dataset is structured and labeled, with most columns already label-encoded. The target variable is formulated as a three-category classification task: Dropout, Enrolled, or Graduate. This classification is determined at the end of the normal duration of the course.
For a more detailed description of the dataset attributes, please refer to the dataset page: Predict Students' Dropout and Academic Success.
Initial Data Exploration and Insights: The dataset comprises 76,518 rows and 38 columns. All attributes are of integer or float data types, except for the target variable, which is an object type.
Key observations:
The target variable is imbalanced:
- Graduate: 36,282 rows
- Enrolled: 14,940 rows
- Dropout: the remaining 25,296 rows
Other fields also show imbalances, as revealed by univariate analysis.
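To confirm these counts directly before making any resampling decision, a quick pandas check is enough. The snippet below is a minimal sketch; the file path and the TARGET column name are assumptions for illustration.
Python
import pandas as pd

TARGET = 'Target'  # assumed target column name, as referenced throughout the code snippets

df = pd.read_csv('data/raw/train.csv')  # hypothetical path to the training data
print(df.shape)                   # -> (76518, 38)
print(df[TARGET].value_counts())  # class counts for Graduate / Enrolled / Dropout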
We will initially work with this imbalanced dataset and address the balance issue in later stages of our pipeline. In the next section, we'll dive into our data preprocessing steps and begin building our MLOps pipeline.
4. Starting With Preprocessing the Data
After our initial exploration, we moved on to preparing our data for modeling. Here's a detailed look at our preprocessing steps:
- Handling Missing Values: Fortunately, our dataset didn't contain any missing values, which simplified our preprocessing pipeline.
- Feature Selection: We removed the 'id' column as it doesn't contribute to our predictive model:
Python
X = df.drop(columns=[TARGET, 'id'])
y = df[TARGET]
- Feature Encoding: We applied different encoding techniques based on the nature of our features:
1. One-Hot Encoding: We used one-hot encoding for the 'Course' column to convert the categorical course labels into numerical features the model can consume:
Python
course_column = ['Course']
2. Label Encoding: For our target variable, we applied label encoding:
Python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_encoded = le.fit_transform(y)
- Feature Scaling: We standardized all numerical columns using StandardScaler; this happens inside the preprocessing pipeline below (see the sketch after this item for how the list of numerical columns can be built).
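The numerical_columns list referenced in the pipeline isn't defined in the snippets above. A minimal sketch, assuming every feature column other than 'Course' is treated as numeric, would be:
Python
# Every feature column except the one-hot-encoded 'Course' column gets scaled
numerical_columns = [col for col in X.columns if col not in course_column]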
- Preprocessing Pipeline: We created a preprocessing pipeline using sklearn's ColumnTransformer to ensure consistent application of our preprocessing steps:
Python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_columns),
        ('course', OneHotEncoder(handle_unknown='ignore'), course_column)
    ],
    remainder='passthrough'
)
X_preprocessed = preprocessor.fit_transform(X)
This pipeline standardizes numerical features, one-hot encodes the 'Course' column, and passes through the remaining categorical columns.
By creating this preprocessing pipeline, we ensure that all our transformations are applied consistently across training and test sets, and can be easily reproduced in production environments. This is a crucial aspect of MLOps, as it maintains consistency between model development and deployment stages.
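Because the evaluation and serving steps later load this transformer from models/transformers/preprocessor.joblib, the fitted preprocessor also needs to be persisted. A minimal sketch of that step, assuming root_path is a pathlib Path pointing at the project root as in later snippets:
Python
import joblib

# Persist the fitted ColumnTransformer so later stages reuse the exact same transformations
transformer_dir = root_path / 'models' / 'transformers'
transformer_dir.mkdir(parents=True, exist_ok=True)
joblib.dump(preprocessor, transformer_dir / 'preprocessor.joblib')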
5. Model Selection and Training
After preprocessing our data, we moved on to the crucial steps of model selection and training. Our approach involves training multiple models to compare their performance and select the best one for our task.
Data Splitting: We begin by splitting our preprocessed data into training and testing sets. To ensure reproducibility, we use parameters defined in our params.yaml file:
Python
X_train, y_train = make_X_y(dataframe=train_data, target_column=TARGET)
This function reads the random state and split ratio from our configuration file, allowing us to easily adjust these parameters without changing our code.
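The helper implementations aren't reproduced in the article. One way the configuration-driven split and the make_X_y helper might look, as a sketch with hypothetical params.yaml key names:
Python
import yaml
from sklearn.model_selection import train_test_split

with open('params.yaml') as f:
    params = yaml.safe_load(f)

# Hypothetical key names; the real ones live in params.yaml
split_cfg = params.get('make_dataset', {})
train_data, val_data = train_test_split(
    df,
    test_size=split_cfg.get('test_size', 0.2),
    random_state=split_cfg.get('random_state', 42)
)

def make_X_y(dataframe, target_column):
    # Assumed behaviour: separate the feature matrix from the target column
    X = dataframe.drop(columns=[target_column])
    y = dataframe[target_column]
    return X, y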
1. Model Selection
We created a comprehensive list of models to evaluate for our classification task. These models are defined in our models_list.py file for easy access and modification.
Each model is initialized with parameters specified in our params.yaml file, allowing for easy hyperparameter tuning:
Python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# 'params' holds the dictionary loaded from params.yaml
models = {
    'RandomForest': RandomForestClassifier(**params.get('train_model', {}).get('random_forest', {})),
    'LogisticRegression': LogisticRegression(**params.get('train_model', {}).get('logistic_regression', {})),
    'SVC': SVC(**params.get('train_model', {}).get('svc', {})),
    'DecisionTree': DecisionTreeClassifier(**params.get('train_model', {}).get('decision_tree', {})),
    'GradientBoosting': GradientBoostingClassifier(**params.get('train_model', {}).get('gradient_boosting', {})),
    'AdaBoost': AdaBoostClassifier(**params.get('train_model', {}).get('adaboost', {})),
    'KNN': KNeighborsClassifier(**params.get('train_model', {}).get('knn', {})),
    'GaussianNB': GaussianNB(**params.get('train_model', {}).get('gaussian_nb', {}))
}
2. Model Training and Evaluation
We then iterate through our list of models, training each one on our preprocessed data:
Python
for model_name, model in models.items():
    logging.info(f'{model_name} is training...')
    trained_model = train_model(model=model, X_train=X_train, y_train=y_train)
After training, we immediately evaluate each model's performance:
Python
# Metrics here are computed on the training split; validation-set evaluation happens later in the pipeline
accuracy, f1, precision, recall = evaluate_model(model=trained_model, X_test=X_train, y_test=y_train)
We calculate key metrics including accuracy, F1 score, precision, and recall. These metrics give us a comprehensive view of each model's performance, allowing us to make an informed decision about which model to select for deployment.
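The evaluate_model helper isn't reproduced in the article. A minimal sketch of what it computes, using sklearn's metric functions with weighted averaging (an assumption, consistent with the f1_weighted scoring used later during tuning):
Python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate_model(model, X_test, y_test):
    # Predict and compute the four metrics reported in the logs (assumed implementation)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    return accuracy, f1, precision, recall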
3. Model Saving
Finally, we save each trained model for future use:
Python
save_model(model=trained_model, save_path=model_output_path_)
This step is crucial in our MLOps pipeline, as it allows us to version our models and easily deploy or rollback as needed.
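The save_model helper itself isn't shown; a minimal sketch, assuming it serializes the estimator with joblib (the same library used to load the tuned models later), would be:
Python
import joblib

def save_model(model, save_path):
    # Serialize the trained estimator so it can be versioned and reloaded later (assumed implementation)
    joblib.dump(model, save_path)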
By systematically training and evaluating multiple models, we can identify the best performing model for our specific task of predicting academic risk. In the next section, we'll dive deeper into our model evaluation results and discuss how we select the final model for deployment.
6. Hyperparameter Tuning
After our initial model training, we move on to one of the most crucial steps in machine learning: hyperparameter tuning. This process helps us optimize our models' performance by finding the best combination of hyperparameters.
1. Setting Up MLflow for Experiment Tracking
Let's begin by setting up MLflow, a powerful tool for tracking our experiments:
Python
def setup_mlflow():
    # Start the local tracking server and point the MLflow client at it
    start_mlflow_server()
    mlflow.set_tracking_uri("http://localhost:5000")
    experiment_name = "Hyperparameter Tuning"
    mlflow.set_experiment(experiment_name)
MLflow allows us to log our hyperparameters, metrics, and models, making it easy to compare different runs and reproduce our results.
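As a quick illustration of what gets tracked, a run can log parameters and metrics like this. This is a generic sketch, not the project's exact logging code; the values shown are placeholders.
Python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

with mlflow.start_run(run_name="RandomForest_tuning"):
    params_to_log = {"n_estimators": 500, "max_depth": 10}  # example hyperparameters
    f1_weighted = 0.85                                      # placeholder metric value
    mlflow.log_params(params_to_log)
    mlflow.log_metric("f1_weighted", f1_weighted)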
2. Models and Hyperparameters
We focus on tuning the two most accurate models from the training stage above:
- Random Forest Classifier
- Gradient Boosting Classifier
For each model, we define a set of hyperparameters to tune:
Python
models_to_tune = {
    'RandomForest': (RandomForestClassifier(), {
        'n_estimators': [500],
        'max_depth': [10, None],
        'min_samples_split': [10],
        'min_samples_leaf': [1],
        'max_features': ['sqrt']
    }),
    'GradientBoosting': (GradientBoostingClassifier(), {
        'n_estimators': [400],
        'learning_rate': [0.1],
        'max_depth': [4],
        'min_samples_split': [2],
        'min_samples_leaf': [1],
        'max_features': ['sqrt']
    })
}
3. Hyperparameter Tuning Process
We use RandomizedSearchCV for our hyperparameter tuning, which randomly samples from the parameter space:
Python
from sklearn.model_selection import RandomizedSearchCV

def hyperparameter_tuning(model, param_dist, X_train, y_train, X_val, y_val, n_iter=100, cv=5):
    random_search = RandomizedSearchCV(model, param_distributions=param_dist, n_iter=n_iter, cv=cv,
                                       scoring='f1_weighted', n_jobs=-1, verbose=2, random_state=9)
    random_search.fit(X_train, y_train)
    # The refitted best estimator is what gets logged to MLflow and saved below
    best_model = random_search.best_estimator_
We save the best model for each type:
Python
mlflow.sklearn.log_model(best_model, f"{model_name}_best", signature=signature)
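The signature passed to log_model isn't constructed in the excerpt above; it is typically produced with mlflow.models.infer_signature, roughly as follows:
Python
from mlflow.models import infer_signature

# Infer the tuned model's input/output schema from the training features and its predictions
signature = infer_signature(X_train, best_model.predict(X_train))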
4. Selecting the Best Models
After tuning, we select the two best-performing models based on their F1 scores:
Python
tuned_model_results.sort(key=lambda x: x[1], reverse=True)
best_tuned_model1 = tuned_model_results[0][0] if len(tuned_model_results) > 0 else None
best_tuned_model2 = tuned_model_results[1][0] if len(tuned_model_results) > 1 else None
These top two models are then saved for further use in our pipeline.
By implementing this rigorous hyperparameter tuning process, we ensure that our models are optimized for our specific task of predicting academic risk. The use of MLflow for experiment tracking allows us to easily compare different runs and select the best-performing models.
7. Model Evaluation
After hyperparameter tuning, we move on to the crucial step of evaluating our best model and generating predictions for the test set. This process ensures that our model performs well on unseen data and prepares us for submission.
1. Loading the Best Model
We start by loading our best-tuned model, which was selected based on its performance during hyperparameter tuning:
Python
model_name = best_tuned_model1 + "_tuned.joblib"
model_path = root_path / 'models' / 'tuned_models' / model_name
model = joblib.load(model_path)
We also load the preprocessor.joblib saved during preprocessing to ensure consistent column transformations:
Python
preprocessor_path = root_path / 'models' / 'transformers' / 'preprocessor.joblib'
preprocessor = joblib.load(preprocessor_path)
2. Evaluation on Validation Set
We evaluate our model on the validation set to get a final assessment of its performance:
Python
def evaluate_and_log(model, X, y, dataset_name):
    y_pred = get_predictions(model, X)
    accuracy, f1, precision, recall = calculate_metrics(y, y_pred)
    logging.info(f'\nMetrics for {dataset_name} dataset:')
    logging.info(f'Accuracy: {accuracy:.4f}')
    logging.info(f'F1 Score: {f1:.4f}')
    logging.info(f'Precision: {precision:.4f}')
    logging.info(f'Recall: {recall:.4f}')

# Evaluate on validation set
val_data = load_dataframe(val_path)
X_val, y_val = make_X_y(val_data, TARGET)
evaluate_and_log(model, X_val, y_val, "Validation")
This step provides us with key performance metrics (accuracy, F1 score, precision, and recall) on our validation set, giving us confidence in our model's generalization ability.
By following this structured approach to model evaluation and prediction, we ensure that our MLOps pipeline not only produces a well-tuned model but also generates reliable predictions for real-world use. Logging the performance metrics and evaluating on the validation set are key steps in maintaining transparency and reproducibility in our machine learning workflow.
8. Visualization and Results Analysis
After model evaluation and prediction, it's crucial to visualize our results to gain deeper insights into our model's performance and the dataset characteristics. We've created several informative visualizations to help us understand our model better.
Setting Up: We start by loading our validation data, test predictions, and the best-tuned model.
Confusion Matrix
We visualize the confusion matrix to understand our model's performance across different classes:
Python
def plot_confusion_matrix(y_true, y_pred, model_name, plot_dir):
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(12, 10))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f'Confusion Matrix - {model_name}')
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.tight_layout()
    plt.savefig(plot_dir / f'{model_name}_confusion_matrix.png')
    plt.close()
Output:
(Figure: confusion matrix of the best model)
This visualization helps us identify which of the three classes our model predicts well and where it tends to make mistakes.
Feature Importance
For models that support it, we plot feature importance to understand which features are most influential in our predictions:
Python
def plot_feature_importance(model, X, model_name, plot_dir):
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
        feature_importance = pd.DataFrame({'feature': X.columns, 'importance': importances})
        feature_importance = feature_importance.sort_values('importance', ascending=False)
        plt.figure(figsize=(12, 10))
        sns.barplot(x='importance', y='feature', data=feature_importance.head(20))
        plt.title(f'Top 20 Feature Importance - {model_name}')
        plt.tight_layout()
        plt.savefig(plot_dir / f'{model_name}_feature_importance.png')
        plt.close()
Output:
(Figure: top features of the model)
This plot shows that features such as 'Curricular units 2nd sem (approved)' are driving our model's decisions, which can be valuable for feature selection and model interpretation.
9. Continuous Integration with CML
In our MLOps pipeline, Continuous Integration (CI) plays a crucial role in automating the process of model training, evaluation, and reporting. We use GitHub Actions along with CML (Continuous Machine Learning) to achieve this. Here's how our CI pipeline works:
YAML
name: CI using CML
on:
  push
jobs:
  build:
    name: build
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - uses: iterative/setup-cml@v2
      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Create CML Report/Graphs
        env:
          REPO_TOKEN: ${{ secrets.CML_TOKEN }}
        run: |
          echo "# Model Evaluation Results" >> report.md
          echo "## Bar Graph for Cross Val Scores" >> report.md
          # a markdown image line embedding the results plot is appended to report.md here
          echo "" >> report.md
          cml comment create report.md
This sets up CML, which we'll use for creating a markdown report with our model evaluation results. It includes:
- A title for the report
- A subtitle for the cross-validation scores graph
- An embedded image of our results plot
- The CML command to create a comment with this report
The REPO_TOKEN environment variable is set using a secret token, which allows CML to post comments to our repository.
This CI pipeline ensures that every time we push changes to our repository:
- Our code is automatically checked out
- The necessary environment is set up
- Our model is re-trained and evaluated
- A report with the latest results is generated and posted as a comment
This automation is crucial in MLOps as it allows us to continuously monitor our model's performance as we make changes to our code or data. It provides immediate feedback on how our changes affect model performance, enabling faster iteration and more robust model development.
10. Model Deployment with FastAPI
After training, tuning, and evaluating our model, the next crucial step in our MLOps pipeline is deploying the model to make it accessible for real-time predictions. For this project, we've chosen to use FastAPI, a modern, fast (high-performance) web framework for building APIs with Python.
- We start by importing the necessary libraries and setting up our FastAPI application. FastAPI is built on Starlette and Pydantic rather than Flask, though its decorator-based routing will feel familiar to Flask users.
- We then initialize our FastAPI app and mount a static folder for serving HTML content:
Python
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
app = FastAPI()
app.mount("/static", StaticFiles(directory="static"), name="static")
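The model_pipe object used by the predictions endpoint below isn't shown in this excerpt. A minimal sketch of how it could be assembled from the saved artifacts, with paths matching the directories copied into the Docker image and a hypothetical tuned-model filename:
Python
import joblib
from sklearn.pipeline import Pipeline

# Load the fitted preprocessor and the best tuned model saved earlier in the pipeline
preprocessor = joblib.load("models/transformers/preprocessor.joblib")
model = joblib.load("models/tuned_models/RandomForest_tuned.joblib")  # hypothetical filename

# Chain them so raw request data is transformed exactly as during training
model_pipe = Pipeline(steps=[("preprocessor", preprocessor), ("model", model)])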
Defining API Endpoints
We define two main endpoints:
1. A home route that serves an HTML page:
Python
@app.get('/')
def home():
    return HTMLResponse(content=open('static/index.html').read(), status_code=200)
2. A predictions route that accepts POST requests with input data and returns predictions:
Python
@app.post('/predictions')
def do_predictions(test_data: PredictionDataset):
    try:
        data_dict = test_data.dict(by_alias=True)
        X_test = pd.DataFrame([data_dict])
        predictions = model_pipe.predict(X_test)
        predictions = predictions.tolist()
        return {"predicted_academic_success_score": predictions[0]}
    except Exception as e:
        logging.error(f"Prediction error: {e}")
        raise HTTPException(status_code=500, detail="Internal Server Error")
This endpoint uses our PredictionDataset Pydantic model to validate incoming data, processes it through our pipeline, and returns the prediction.
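The PredictionDataset model itself lives in data_models.py (copied into the Docker image). A small illustrative sketch of its shape, showing only a few of the dataset's columns with hypothetical field names:
Python
from pydantic import BaseModel, Field

class PredictionDataset(BaseModel):
    # Only a handful of illustrative fields are shown; the real model declares one field
    # per dataset column, with aliases matching the original column names.
    marital_status: int = Field(..., alias="Marital status")
    admission_grade: float = Field(..., alias="Admission grade")
    curricular_units_2nd_sem_approved: int = Field(..., alias="Curricular units 2nd sem (approved)")
With by_alias=True in the endpoint above, the resulting dataframe columns carry the original dataset's column names, so the saved preprocessor can be applied unchanged.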
Running the Application
Finally, we set up the application to run using Uvicorn:
Python
if __name__ == "__main__":
    uvicorn.run(app="app:app", host="0.0.0.0", port=8000)
The static/index.html page contains a form that collects the input attributes from the user and a JavaScript function that POSTs them to '/predictions'; on the server side, the input is passed through the saved transformer, the best model generates the prediction, and the result is displayed on the same page.
Benefits of This Approach
- Fast and Efficient: FastAPI is designed for high performance, making it suitable for production deployments.
- Easy to Use: The framework provides intuitive decorators and type hints, making the code clean and easy to understand.
- Automatic Documentation: FastAPI automatically generates OpenAPI (Swagger) documentation for our API.
- Data Validation: By using Pydantic models, we ensure that incoming data is validated before processing.
- Error Handling: We've implemented proper error handling to catch and log any issues during prediction.
This deployment setup allows us to serve our model predictions via a RESTful API, making it easy to integrate with various applications or services.
11. Dockerization
In the final stages of our end-to-end MLOps project, we successfully integrated FastAPI into our machine learning pipeline to create a robust, scalable web application. This section delves into the Docker setup we used to containerize our FastAPI application, ensuring that it is both portable and easy to deploy.
1. Dockerfile Configuration
A key component of our deployment strategy was the creation of a Dockerfile, which defines the environment for our FastAPI application.
Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy specific files and directories into the container
COPY app.py /app/
COPY params.yaml /app/
COPY models/tuned_models /app/models/tuned_models
COPY models/transformers /app/models/transformers
COPY static/ /app/static/
COPY data_models.py /app/
COPY src/logger.py /app/src/
COPY src/models/models_list.py /app/src/models/
# Copy the requirements file
COPY requirements.txt /app/
# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
# Make port 8000 available to the world outside this container
EXPOSE 8000
# Set the entry point to run the app
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
2. Building and Running the Docker Container
With the Dockerfile set up, we used the following commands to build and run our Docker image; they are defined as DVC stages so they run as part of the pipeline:
YAML
build_docker_image:
  cmd: |
    docker build -t academic-success-predictor .
  deps:
    - Dockerfile
    - requirements.txt
run_docker_container:
  cmd: |
    docker run --rm academic-success-predictor
We run the Docker container from the built image. The --rm flag ensures that the container is removed after it stops, keeping our environment clean.
(Figure: output after running the Docker image)
Key Benefits of Docker:
- Consistent Development Environments
- Streamlined Deployment Process
- Improved Development Workflow
- Portability Across Different Platforms
- Efficient Continuous Integration and Continuous Deployment (CI/CD)
- Better Collaboration and Sharing
Conclusion
This project illustrated the end-to-end MLOps process, from problem identification to model deployment and monitoring. Each stage of the pipeline, including data preprocessing, model training, version control, and deployment, was executed to create a robust and maintainable machine learning solution.