Applied ML
Business Analytics
Lecture 5: Model Deployment
1. From Notebooks to Python Scripts
Virtual environment
● A virtual environment is required to isolate the packages our application needs from other projects that may have conflicting dependencies
● requirements.txt
○ Sets up the development environment (typical commands below)
○ pip freeze dumps every installed package, including transitive dependencies, into the file
○ Try pipreqs or pip-tools
● setup.py
○ Used to redistribute the whole package
○ Contains metadata, requirements and entry points
https://stackoverflow.com/questions/43658870/requirements-txt-vs-setup-py
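A minimal sketch of the typical commands, assuming a Unix-like shell:
python -m venv venv              # create an isolated environment
source venv/bin/activate         # activate it (on Windows: venv\Scripts\activate)
pip install -r requirements.txt  # set up the development environment
pip freeze > requirements.txt    # dump all installed packages into the file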
Organized code
● Code should be readable, reproducible, scalable and efficient
● Notebooks are only suitable for proofs of concept (POC)
● Code can be organized by utility, i.e., into working pipeline components
Cookiecutter DS template
● One of the templates we can use is:
○ https://drivendata.github.io/cookiecutter-data-science/
Cookiecutter DS template
pip install cookiecutter
cookiecutter https://github.com/drivendata/cookiecutter-data-science
cd cuisine_tag
Cookiecutter prompts for project metadata before generating the structure
Cookiecutter template
● The project structure will be generated following the template
● Easier for us to understand and modify the code base
Config
A config directory or file should be created for the following:
● Hyper-parameters for training
● Specifications for model locations, logging and other hand-coded information
● Settings for running a small training test
Config template
https://circleci.com/blog/what-is-yaml-a-beginner-s-guide/
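A hedged sketch of such a YAML config, covering the items above (all parameter names are illustrative assumptions):
train:
  learning_rate: 0.01   # hyper-parameters for training
  num_epochs: 10
model:
  dir: models/          # model location
logging:
  level: INFO           # logging specification
test_run:
  sample_size: 1000     # small test for training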
Logging is important for ML systems
● Life is short. You need logs
● Do not rely too much on print statements
○ For example, print('aaaaaa')
● Logging is the process of tracking and recording key events that occur in an application
○ Inspect processes
○ Fix issues
○ More powerful than print statements
Logging 101
● Logger:
○ The main object that emits log messages from the whole project
○ Can be specified per module
● Handler:
○ Sends log records to a specific location, with specifications for that location (name, size, etc.)
○ Different handlers have different rules for saving logs to local files
● Formatter:
○ Controls the style and layout of the log records
● Levels (in order of priority):
○ CRITICAL > ERROR > WARNING > INFO > DEBUG
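A minimal sketch wiring these pieces together (the logger name, file path and format are illustrative):
import logging

logger = logging.getLogger("cuisine_tag")  # the Logger emits the messages
logger.setLevel(logging.DEBUG)             # let every level through

handler = logging.FileHandler("app.log")   # the Handler writes records to a location
formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
handler.setFormatter(formatter)            # the Formatter controls the layout
logger.addHandler(handler)

logger.debug("Details useful only when debugging")
logger.info("Training started")
logger.warning("Validation set is small")
logger.error("Failed to load features")
logger.critical("Model file is corrupted")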
Levels in logs
Best practices in logging
● Logger in each module
○ Example (a minimal sketch below):
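A minimal per-module sketch (the module path is an illustrative assumption):
# at the top of each module, e.g. src/models/train.py
import logging

logger = logging.getLogger(__name__)  # the logger name records which module emitted the message
logger.info("Training started")       # assumes logging was configured at application start-up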
Best practices in logging
● Logger in each module
○ Easy to identify the error source
○ Just as importantly, it makes clear which module is responsible when something fails
Best practices in logging
● Log all the details you might want to inspect from inside the code
○ Useful during development and when checking model runs
● Log messages outside of small functions and inside larger workflows
○ Loggers can be placed in main.py and train.py, since the smaller functions defined in other scripts are called there
Logging configuration
● Coding directly in scripts
● Using a config file
○ logging.config.fileConfig()
○ Suitable for complex projects
● Using the dictionary type
○ logging.config.dictConfig()
○ Can be put in config/config.py (a minimal sketch below)
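A minimal sketch of the dictionary approach, e.g. inside config/config.py (the formatter, handler and levels are illustrative):
import logging.config

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "level": "DEBUG",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(LOGGING_CONFIG)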
Documenting your code
● Documenting our code is another way of organizing it
● What is more, it makes it easy for others, and for our future selves, to use the code base
● Most common documentation types:
○ Comments
○ Typing
○ Docstrings
○ Documentation
Comments
● Good code should not need comments, because it is readable
● When do you need comments?
Typing
● Make our code as explicit as possible
○ Names of variables and functions should be self-explanatory
● Typing: define the types of our functions' inputs and outputs
○ Starting from Python 3.9+, common generic types are built in (no typing import needed)
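A minimal sketch (the function and its names are illustrative):
# annotations make inputs and outputs explicit; list/dict generics need no import on 3.9+
def count_tokens(texts: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for text in texts:
        for token in text.split():
            counts[token] = counts.get(token, 0) + 1
    return counts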
Docstrings
● Docstrings can be placed in functions and classes
● Use the Python Docstring Generator extension in VS Code
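A minimal sketch of a Google-style docstring (the function is illustrative):
def predict_label(text: str, threshold: float = 0.5) -> str:
    """Classify a message as spam or ham.

    Args:
        text: The raw message to classify.
        threshold: Probability above which the message counts as spam.

    Returns:
        Either "spam" or "ham".
    """
    ...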
Documentation
● The above all live inside scripts; the documentation is a separate document
● Some open-source packages can be used to automatically generate the documentation:
○ mkdocs (generates project documentation)
○ mkdocs-material (styling to beautifully render the documentation)
○ mkdocstrings (fetches documentation automatically from docstrings)
Styling
● Code is read more often than it is written
● Following consistent style and formatting conventions makes code easy to read
● Most conventions are based on PEP 8
● Plenty of tools can automatically and effortlessly enforce that consistency
Styling tools
● These tools can be used with configurable options:
○ Black: an in-place reformatter that (mostly) adheres to PEP 8
○ isort: sorts and formats import statements inside Python scripts
○ flake8: a code linter with stylistic conventions that adhere to PEP 8
Formatting done by Black
Makefile
● A Makefile is an automation tool that organizes our commands
● Syntax (a minimal sketch below):
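A minimal sketch (the rule names are illustrative; recipe lines must start with a tab):
# target: dependencies
# <TAB> command(s)
style:
	black src
	isort src
	flake8 src

venv:
	python -m venv venv
	venv/bin/pip install -r requirements.txt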
Makefile
● Different rules can be configured in a Makefile
○ Example here
2. Interfaces of ML Systems
How to deploy ML models
● Batch deployment
○ Generate predictions at defined frequencies
● Real-time deployment
○ Generate predictions as requests arrive
○ Also called online prediction
● Streaming deployment
○ Generate predictions when specific events trigger
● Edge deployment
○ Generate predictions on the user's side
Batch deployment
● Frequency: periodic
● Processes accumulated data when you do not need immediate results
○ Predictions can be pre-computed and stored in a database, then easily retrieved when needed
○ However, predictions can quickly become outdated if we cannot use recent data
● Applications:
○ TripAdvisor hotel ranking
○ Netflix recommendations
Real-time deployment
● Frequency: as soon as requests arrive
○ A synchronous process: a user/customer requests a prediction and waits for the result
● The process starts with the user's request
○ The request is pushed to a backend service (usually through HTTP API calls)
○ It is then forwarded to an ML service
○ The ML service either takes features from the request or collects recent contextual information to return predictions
● Multi-threaded processes and scaling out with additional servers can handle latency and concurrency issues
○ Multiple users raise additional parallel requests
● Applications:
○ Google Translate
Streaming deployment
● Frequency: based on events
○ A more asynchronous process compared to real-time deployment
● Events trigger the start of the prediction process
○ For example, while you are on a TikTok page, the recommendation process is triggered; by the time you scroll, the recommendation results are ready to be refreshed
○ Message brokers like Kafka are commonly used for the queueing process
● Applications:
○ Facebook Ads
○ TikTok recommendations
Source: https://www.tekhnoal.com/streaming-ml-model-deployment.html
Edge deployment
● The model is deployed directly on the client side
○ Web browser, mobile phone, car, IoT hardware
○ Can deliver the fastest predictions and work offline (without internet)
○ Model complexity is limited by the smaller hardware
Source: https://www.kdnuggets.com/2018/09/deep-learning-edge.html
Batch vs Online deployment
Batch deployment
● Pro:
○ The simplest deployment approach
● Cons:
○ Not efficient, since most predictions might never be used in the end
○ Cannot react to data changes
Real-time deployment
● Pro:
○ The model takes near real-time data into account and makes fresh predictions
● Cons:
○ Has a somewhat steep learning curve
Hybrid: batch & real-time prediction
● Real-time prediction is the default, but common queries are precomputed and stored
● Food delivery services
○ Restaurant recommendations use batch predictions
○ Within each restaurant, item recommendations use online predictions
● Streaming services
○ Title recommendations use batch predictions
○ Row orders use online predictions
Batch deployment
● A batch deployment usually works on a fixed schedule (e.g., every day at 9:30 am): raw data are processed, and then model predictions are generated
● A three-pipeline architecture is typically used:
○ Feature pipeline
○ Training pipeline
○ Batch prediction pipeline
Batch deployment
[Diagram: the pipelines exchange Features and Targets through a Feature Store, and Models through a Model Registry]
Batch deployment: feature engineering
● Reads raw data and generates features and labels
● Two engineering changes are applied:
○ Automation: the feature pipeline is executed at a fixed interval
■ Cron job
■ Airflow
■ GitHub Actions
○ Persistence: a place to store the features generated by the script (instead of csv files on disk)
■ Feast
■ Other feature store tools
Source: https://towardsdatascience.com/do-you-really-need-a-feature-store-e59e3cc666d3
Batch deployment: model training
● Reads the features and labels and trains the model
● Turn trained models into binary formats:
○ scikit-learn, XGBoost -> joblib, pickle
○ TensorFlow -> .save()
○ PyTorch -> .save()
○ We can save the trained model in the model registry (such as MLflow)
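A minimal sketch of serializing a scikit-learn model with joblib (the toy model is illustrative):
import joblib
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])  # toy training data
joblib.dump(model, "model.joblib")   # save the model from memory to disk
model = joblib.load("model.joblib")  # load the model from disk back to memory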
Batch deployment: batch inference
● Create a new script that does the following:
○ Loads the production model from the model registry
○ Loads the most recent feature batch
○ Makes model predictions and saves them to a database
● This script should also be scheduled (a hedged sketch below)
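A hedged sketch of such a script (the model name, feature path and output table are assumptions):
import sqlite3

import mlflow.sklearn
import pandas as pd

# load the production model from the MLflow model registry
model = mlflow.sklearn.load_model("models:/spam_detector/Production")

# load the most recent feature batch
features = pd.read_parquet("data/features/latest.parquet")

# make predictions and save them in a database
features["prediction"] = model.predict(features)
with sqlite3.connect("predictions.db") as conn:
    features.to_sql("predictions", conn, if_exists="append", index=False)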
Deploy an ML model behind a REST API
● Wrap ML models in a REST API
● Deploy them as a microservice
● Model preparation:
1. Data collection & cleaning
2. Feature & model selection
3. Model training
● Serialization & de-serialization:
1. Save the model from memory to disk
2. Load the model from disk back to memory
● The serialized model is then served by a web service
Model as a web endpoint
● A model as an endpoint:
○ Returns a prediction in response to a set of inputs
○ Inputs can be feature vectors, images, or other model inputs
○ Other systems can easily use the predictive model, which provides results in real time
Python web frameworks
● Flask
○ Suitable for quick prototyping
● Django
○ First choice for building robust full-stack websites
● FastAPI
○ Good speed and scalability, but quite new
A proper deployment also needs a WSGI server that provides scaling, routing and load balancing.
Build a web app using Flask
● Flask: a lightweight web framework for Python
○ Create an API call which can be used from the front end
○ Or build a full-on web application
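A minimal Flask sketch of such an API call (the route and the toy rule standing in for a model are illustrative):
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json(silent=True) or {}
    text = data.get("text", "")
    # toy rule standing in for a real model call
    label = "spam" if "free" in text.lower() else "ham"
    return jsonify({"label": label})

if __name__ == "__main__":
    app.run(debug=True)  # development server only; use a WSGI server in production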
Build a spam detection web app
● The spam detection model from the notebook needs to be deployed in order to be used by our end users
● Project workflow:
[Workflow diagram: the Jupyter Notebook saves the trained model to disk; the web page sends a query to the Flask server; the Flask server sends the predicted labels back to the web page]
Frontend design
● Create index.html for the web page design
○ Collects text from users
○ Displays the prediction: spam or ham
[Screenshot: Web UI]
Create app.py
● Create app.py under the main folder
○ Connects the backend to the frontend
○ Sends responses to the UI after predicting the label
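A hedged sketch of app.py (the file names, form field and template variable are assumptions):
import pickle

from flask import Flask, render_template, request

app = Flask(__name__)

# load the model that the notebook saved to disk
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    text = request.form["message"]
    label = model.predict([text])[0]  # assumes the pickle is a full text pipeline (vectorizer + classifier)
    return render_template("index.html", prediction=label)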
Serverless deployments
● Reduce the DevOps overhead of deploying models as web services
○ No need to take care of provisioning and server maintenance
○ No need to worry about scale (would one server be enough?)
○ Reduce the effort and deployment time when the team is small
● Examples: GCP Cloud Functions, AWS Lambda
● With serverless function environments:
○ Write a function that the runtime supports (a minimal sketch below)
○ Specify a list of dependencies
○ Deploy the function to production
○ The rest is fully managed by the cloud platform: provisioning servers, scaling up more machines to match demand, managing load balancers, and handling versioning
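A minimal sketch of such a function for AWS Lambda (the toy rule stands in for a real model call):
import json

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    # a real function would load the model once, outside the handler, and call predict here
    label = "spam" if "free" in text.lower() else "ham"
    return {"statusCode": 200, "body": json.dumps({"label": label})}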
MLOps = ML + DevOps
● MLOps:
○ A sequence of steps implemented to deploy an ML model to the production environment
○ It is easy to create ML models that can predict based on the data you feed them
○ It is challenging to create models that are reliable, fast, accurate, and usable by a large number of users
MLOps concepts: I
● Development Platform
○ Enable smooth handover from ML Training to deployment
○ A collaboration platform for performing ML experiments
○ Enable secure access to data sources
● Versioning
○ Track the version of data and code
● Model Registry
○ An overview of deployed & legacy ML Models and their version history, and the deployment
stage of each version
● Model Governance
○ Access control to the training process of any given model
○ Access control over who can request/reject/approve transitions between deployment stages (dev to staging to prod) in the model registry
MLOps concepts: II
● Monitoring
○ Track performance metrics
■ ML metrics: F1 score, MSE, …
■ Ops metrics: uptime, throughput, response time
○ Drift detection
■ Concept drift: the relation between input and output has changed
■ Label drift: the distribution of the ground-truth labels has changed
■ Feature drift: the distribution of the model's input data has changed compared to training data
■ Prediction drift: the distribution of the model's predictions has changed
○ Outlier detection
■ If a new input is totally different from every training sample, we can flag it as a potential outlier and treat the model's prediction as less trustworthy
MLOps concepts: III
● Model unit testing: when we create, change or retrain a model, we should automatically validate the integrity of the model
○ It should meet minimum ML performance metrics on a test set
○ It should perform well on synthetic, use-case-specific datasets
● DevOps concepts:
○ CI/CD
○ Unit Test
○ Code Structure
○ Documentation
Manual MLOps
All the work is done manually
https://towardsdatascience.com/a-gentle-introduction-to-mlops-7d64a3e890ff
MLOps: Automated Pipeline
https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
4. Building ML Pipelines with better tools
ML pipeline for spam detection
● Tools used for the ML pipeline:
○ Flask: create an API as the interface to the model
○ MLflow: model registry
○ GitHub: code version control
○ Data Version Control (DVC): version control of the datasets and pipeline definition
○ Cookiecutter: project templates
Create a virtual environment
Create the project structure using cookiecutter
Create a GitHub repo
Prepare source code inside the src folder
● Add data-loading scripts to the data folder
● Add modeling scripts to the models folder
Data folder
Pipeline creation with DVC
● With all scripts in the src folder, create dvc.yaml to define the pipeline
● Each stage in the yaml file contains:
○ cmd: the bash command that executes the script
○ deps: the dependencies required to execute the step
○ outs: outputs of the cmd (model or data)
○ params: parameters used in the script
● From deps, DVC can build a DAG
○ Run "dvc dag" to display it
Pipeline creation with DVC
Code Snippet: dvc.yaml
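A hedged sketch of what the snippet contains (the stage, script and file names are assumptions):
stages:
  featurize:
    cmd: python src/data/make_features.py
    deps:
      - src/data/make_features.py
      - data/raw
    params:
      - featurize.max_features
    outs:
      - data/processed
  train:
    cmd: python src/models/train_model.py
    deps:
      - src/models/train_model.py
      - data/processed
    params:
      - train.alpha
    outs:
      - models/model.pkl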
Execute the pipeline
● Use two terminals to execute:
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts --host 0.0.0.0 -p 1234
dvc repro
Why DVC
● DVC only re-runs a stage if its dependencies or parameters have changed
● For example, running dvc repro again re-executes nothing if nothing changed
Build an ML pipeline using DVC and MLflow
● Check the full implementation of the ML pipeline on our GitHub page
What we are missing
● Unit/Load tests
● Deploy the application in a real environment (not local env.)
● CI/CD
○ Push the change to the git repo
○ It can be immediately deployed in production after passing the tests
○ The answers from industries at this moment are:
■ Containers
■ Kubernetes
● Model Monitoring
Next Class: Explainable Machine Learning