
Advanced Programming
Lecture 3
Lifecycle of a Machine Learning product
• First, define the problem or state the situation.
• Then: data collection,
• data preparation,
• model development and evaluation,
• and finally, model deployment.
• In reality, the model lifecycle is iterative, which means that we tend to go back and forth between these processes.
ETL process
Together, data collection and preparation are known as the Extract, Transform, and
Load, or ETL, process.
The ETL process involves collecting data from various sources,
then cleaning, transforming, and storing it in a single new place.
The data is then accessible to the machine learning engineer, allowing
them to perform tasks like building a machine learning model.
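A minimal sketch of what such an ETL flow could look like in Python with pandas; the file names, database name, table name, and column names below are made up for illustration and are not from the lecture.

```python
import pandas as pd
import sqlite3

# Extract: collect data from several (hypothetical) sources.
users = pd.read_csv("users.csv")            # demographics, sign-up dates
purchases = pd.read_json("purchases.json")  # purchase history per user

# Transform: clean and reshape into one table.
purchases["purchase_date"] = pd.to_datetime(purchases["purchase_date"])
merged = purchases.merge(users, on="user_id", how="left").drop_duplicates()

# Load: store the result in a single place the ML engineer can query.
with sqlite3.connect("beauty_products.db") as conn:
    merged.to_sql("purchase_history", conn, if_exists="replace", index=False)
```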
Data collection
• User data (demographics, purchase history, etc.)
• Product data (inventory of products and what they do, their ingredients, how popular they are, their customer ratings)
• Other data (user's saved products, liked products, search history, most visited products, and so on)
• To help increase business revenue, we need to create and deploy a model that recommends products similar to what the customer has already bought.
• End-user's pain point: "As a beauty product customer, I would like to receive recommendations for other products based on my purchase history so that I will be able to address my skincare needs and improve the overall health of my skin."
Data preparation
• Cleaning and structuring the data:
• Remove irrelevant/extreme values.
• Handle missing values appropriately.
• Ensure correct data formats (for example, dates should be in date formats and strings should be properly identified).
• Feature Engineering: creating new variables such as average transaction time (see the sketch below).
• Calculate the average duration between transactions for each user and find which products they buy the most.
• We also need a feature that identifies what kind of skin issues each product targets and assigns them to each user.
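A small pandas sketch of the feature-engineering step above (average time between transactions and each user's most-bought product); the purchase history and its columns are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical purchase history (stand-in for the real beauty-product data).
purchases = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "product_id": ["cleanser", "moisturizer", "cleanser", "serum", "serum"],
    "purchase_date": pd.to_datetime(
        ["2024-01-01", "2024-01-15", "2024-02-01", "2024-03-01", "2024-03-20"]
    ),
})
purchases = purchases.sort_values(["user_id", "purchase_date"])

# Average number of days between consecutive transactions for each user.
gaps = purchases.groupby("user_id")["purchase_date"].diff()
avg_gap_days = (
    gaps.dt.days.groupby(purchases["user_id"]).mean()
    .rename("avg_days_between_purchases")
)

# Most frequently purchased product for each user.
favorite_product = (
    purchases.groupby(["user_id", "product_id"]).size()
    .groupby(level="user_id").idxmax()
    .apply(lambda idx: idx[1])  # keep only product_id from the (user_id, product_id) label
    .rename("favorite_product")
)

user_features = pd.concat([avg_gap_days, favorite_product], axis=1)
print(user_features)
```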
Data preparation
• Exploratory Data Analysis (EDA): identify patterns and correlations.
• Create plots to visually identify patterns, validate the data based on information that the beauty product subject matter expert has given me, and do some correlation analysis to identify which variables or features are most important to the users' buying habits and needs.
• Decide on a data splitting strategy (e.g., holding out each user's last transaction as the test set), as sketched below.
• This is a time-consuming process due to data inconsistencies.
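Continuing with the hypothetical purchases DataFrame from the sketch above, one way the "last transaction as test set" split could be implemented:

```python
# Hold out each user's most recent purchase as the test set and train on the rest.
last_purchase_idx = purchases.groupby("user_id")["purchase_date"].idxmax()

test_set = purchases.loc[last_purchase_idx]
train_set = purchases.drop(index=last_purchase_idx)
```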
Model Development
• Choose the right model and framework.
• Use Content-Based Filtering to recommend similar products (a minimal similarity-score sketch follows this list).
• For example, if someone is using a cleanser with lots of water, it is likely that the user has dry skin and will want a moisturizer that is highly moisturizing as well. One step I might take here is to create a similarity score between the products a user has purchased and other products, and rank them. I might recommend the most similar product while bearing in mind that other factors could come into play. For example, I might notice that the user has searched for products without particular ingredients, so I want to make sure that we are not recommending a product that they absolutely won't use.
• Use Collaborative Filtering to make recommendations based on user similarities.
• This means creating similarities between two users based on how they view a product, for example, based on how two users rate the same product. First, I group users into a bucket based on their characteristics, such as age, region, skin type, and the products the users rated and/or purchased. Then, I can take the average ratings of existing members, assume that the new user will be somewhere around that average, and recommend a product based on what others have rated highly.
• Combine both techniques for improved accuracy.
• This stage is time-consuming due to model experimentation and optimization.
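A minimal sketch of the content-based similarity-score idea above, using TF-IDF vectors over ingredient text and cosine similarity; the product catalogue and ingredient strings are illustrative assumptions, not real data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical product catalogue with free-text ingredient lists.
products = pd.DataFrame({
    "product_id": ["gentle_cleanser", "hydra_moisturizer", "clay_mask"],
    "ingredients": [
        "water glycerin aloe chamomile",
        "water hyaluronic acid glycerin shea butter",
        "kaolin clay charcoal tea tree oil",
    ],
})

# Turn ingredient text into vectors and score every product against every other.
ingredient_vectors = TfidfVectorizer().fit_transform(products["ingredients"])
similarity = cosine_similarity(ingredient_vectors)

def recommend_similar(product_idx, top_n=2):
    """Rank other products by similarity to the purchased one and return the closest."""
    ranked = similarity[product_idx].argsort()[::-1]
    return products["product_id"].iloc[[i for i in ranked if i != product_idx][:top_n]]

print(recommend_similar(0))  # products most similar to the gentle cleanser
```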
Model Evaluation
• Tune the model using test data.
• Conduct A/B testing with real users.
• Gather feedback (e.g., user ratings, click-through rates, purchase
rates).
• This involves tuning the model and doing some testing on the data set I had kept aside earlier for testing. Once satisfied with the results, I will further evaluate the model by experimenting with the recommendations on a group of users and asking for their feedback. The feedback will include asking the group of users to rate the recommendations and collecting data on the number of people who clicked on and bought the recommended products, along with any other necessary metrics (a small metric sketch follows this list).
• Modify the model based on evaluation results.
• This is an iterative process that can take significant time.
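A small sketch of how the click-through and purchase-rate feedback might be computed; the `ab_results` log and its columns are assumptions for illustration.

```python
import pandas as pd

# Hypothetical A/B-test log: one row per recommendation shown to a user.
ab_results = pd.DataFrame({
    "clicked":   [True, False, True, True, False],
    "purchased": [False, False, True, False, False],
    "rating":    [4, 3, 5, 4, 2],
})

click_through_rate = ab_results["clicked"].mean()
purchase_rate = ab_results["purchased"].mean()
average_rating = ab_results["rating"].mean()

print(f"CTR: {click_through_rate:.0%}, purchase rate: {purchase_rate:.0%}, "
      f"avg rating: {average_rating:.1f}/5")
```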
Model Deployment
• Deploy the model in a real-world environment (e.g.,
app, website).
• Ensure seamless integration with existing systems.
• Monitor model performance to detect issues early.
• Now that I am done with building and testing, the model is ready to go to
production. For this project, it will be a part of the beauty product app and
website.
• While this is the last step, I still need to track the deployed
model’s performance to make sure it continues to do the job that the
business requires. Future iterations may include retraining the model based
on new information in order to expand its capabilities.
Continuous Monitoring &
Improvement
• Track performance metrics post-deployment.
• Retrain the model periodically with new data.
• Adapt to changing user behavior and business needs.

• Each step in the ML lifecycle is essential for success.
• Data collection, preparation, and model evaluation are the most time-consuming stages.
• Continuous monitoring is crucial for long-term
effectiveness.
Data Scientist and AI engineer
• Traditionally, data scientists have always used AI models to do their analysis.
• Generative AI breakthroughs have been so groundbreaking that generative AI has split off into its own distinct field, which we call AI engineering.
Data scientist: a data storyteller
• They take massive amounts of messy real-world data, and they use mathematical models to translate this data into insights.
• Use a lot of descriptive analytics to describe the past, as well as predictive analytics (EDA, clustering, regression, classification).
• Work mostly with structured (tabular) data.
• Clean & preprocess datasets (e.g., remove outliers, feature engineering).
• Use a variety of models, each trained for specific datasets.
• Models are small, requiring less computation and time (seconds to hours).

AI engineer: an AI system builder
• They use foundation models to build generative AI systems that help to transform business processes.
• Focus on prescriptive (decision optimization, recommendation systems) and generative (LLMs, chatbots) use cases.
• Work mainly with unstructured data (text, images, audio, video).
• Require massive-scale datasets (billions to trillions of tokens for LLMs, large language models).
• Use Foundation Models that generalize across tasks. Models are huge, requiring massive computational resources (weeks to months to train).
A typical data science process
• Start off with a use case, and then from that use case, you pick the right data.
• After that data is prepared, you use it to train and validate a model using techniques such as feature engineering, cross-validation, or hyperparameter tuning, as an example. This model is then deployed at some endpoint, for example in the cloud, to do real-time prediction and inference.

The AI engineering process
• Starts off with a use case, but then we can skip directly to working with a pre-trained model. What makes this possible is a phenomenon called AI democratization, which is a big fancy word that simply means making AI more widely accessible to everyday users.
• AI engineers interact with these foundation models via natural language instructions to prompt them to do various tasks, and this process is known as prompt engineering.
Data science steps:
• Define use case
• Collect & prepare structured data
• Train & validate a specific ML model
• Deploy model for real-time inference

AI engineering steps:
• Define use case
• Use a pre-trained Foundation Model
• Apply Prompt Engineering, Fine-Tuning (PEFT), or RAG
• Build end-to-end AI applications
Key Techniques in AI Engineering
• These are three major techniques used to adapt large pre-trained AI models
(such as GPT-4, LLaMA, or Claude) for specific tasks without training from
scratch.
Prompt Engineering
• The process of carefully crafting inputs (prompts) to guide the AI model’s
response.
• Bad Prompt: "Summarize this article."
• Good Prompt: "Summarize this article in 3 bullet points, highlighting the key takeaways."
Fine-Tuning (PEFT - Parameter-Efficient Fine-Tuning)
• Instead of training a whole model from scratch, PEFT adapts only a small
number of model parameters.
• A hospital wants an AI assistant fine-tuned on medical research papers to provide better
healthcare insights.
Retrieval-Augmented Generation (RAG)
• Enhances LLMs by retrieving relevant external documents before generating
a response.
• If an AI chatbot is helping lawyers, it can search legal databases before answering questions,
so its answers are always up to date and accurate.
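A toy sketch of the RAG flow described above: retrieve relevant documents first, then fold them into the prompt sent to the model. The documents, the TF-IDF retriever, and the prompt template are all illustrative assumptions; production systems typically use embedding models, vector databases, and an actual LLM call in place of the printed prompt.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical document store; in the lawyer example this would be a legal database.
documents = [
    "Placeholder text about contract termination clauses.",
    "Placeholder text about patent filing deadlines.",
    "Placeholder text about employment dispute procedures.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=2):
    """Return the k documents most relevant to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(query):
    """Retrieve context first, then hand the augmented prompt to whatever LLM you use."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is the deadline for filing a patent?"))
```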
What is data?
• Data is a collection of raw facts, figures, or information.
• Used to draw insights, inform decisions, and fuel AI and
machine learning.
• Essential for all machine learning models as it provides the
necessary information for pattern discovery and prediction.

• Machine learning tools help in:
• Data preprocessing
• Building, evaluating, and optimizing models
• Implementing machine learning solutions
• These tools simplify complex tasks like handling big data,
statistical analysis, and making predictions.
Popular Machine Learning
Libraries
• Pandas: Data manipulation and analysis
• Scikit-learn: Supervised and unsupervised learning
algorithms
• NumPy: Numerical computations for large datasets
• SciPy: Scientific computing, including optimization and
regression
Machine Learning Programming
Languages
• Python: Most widely used, extensive libraries for ML
and AI.
• R: Popular for statistical analysis and data exploration.
• Julia: High-performance, used in research and
numerical computing.
• Scala: Ideal for big data processing and ML pipelines.
• Java: Scalable ML applications for production.
• JavaScript: Runs ML models in web browsers for client-
side applications
Categories of Machine Learning
Tools
• Data Processing & Analytics
• Data Visualization
• Machine Learning Model Development
• Deep Learning Frameworks
• Computer Vision Tools
• Natural Language Processing (NLP) Tools
• Generative AI Tools
Data Processing & Analytics
Tools
• PostgreSQL: SQL-based database system.
• Hadoop: Scalable disk-based big data storage and
processing.
• Spark: In-memory data processing, faster than Hadoop.
• Apache Kafka: Real-time data streaming and analytics.
• Pandas: Data wrangling and transformation.
• NumPy: Mathematical functions and linear algebra
operations.
Data Visualization Tools
• Matplotlib: Customizable plots and visualizations.
• Seaborn: Statistical graphics built on Matplotlib.
• ggplot2: R-based visualization package for layered
graphics.
• Tableau: Business intelligence tool for interactive
dashboards
Machine Learning Model
Development Tools
• Scikit-learn: Classic ML algorithms (classification,
regression, clustering).
• Pandas: Prepares data for ML models.
• SciPy: Supports linear regression and optimization
Deep Learning Frameworks
• TensorFlow: Large-scale ML and deep learning.
• Keras: User-friendly deep learning library.
• Theano: Efficient mathematical computations for
neural networks.
• PyTorch: Deep learning with support for NLP and
computer vision
Computer Vision Tools
• OpenCV: Real-time image processing and object detection.
• Scikit-Image: Image segmentation and feature extraction.
• TorchVision: Pre-trained models and image transformation functions.
Natural Language Processing
(NLP) Tools
• NLTK: Text processing, tokenization, and stemming.
• TextBlob: Sentiment analysis and part-of-speech
tagging.
• Stanza: Pre-trained NLP models for multiple languages
Generative AI Tools
• Hugging Face Transformers: Transformer models for
NLP.
• ChatGPT: AI chatbot and text generation.
• DALL-E: AI-generated images from text descriptions.
• GANs (Generative Adversarial Networks): Deep
learning models for generating images and videos
Scikit-learn
• Scikit-learn is a free machine learning library for
Python.
• Provides algorithms for classification, regression,
clustering, and dimensionality reduction.
• Works seamlessly with NumPy, SciPy, and
Matplotlib.
• Offers extensive documentation and a large support
community
• Scikit-learn can scale your data by standardizing it.
• Scikit-learn can also split arrays and matrices into random train and test subsets for you in one line of code. Here, 33% of the data is reserved for testing (see the sketch below).
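A sketch of these two steps with StandardScaler and train_test_split; the digits dataset is just an assumed stand-in for whatever feature matrix X and labels y you are working with.

```python
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Any feature matrix X and label vector y will do; digits is only a stand-in dataset.
X, y = load_digits(return_X_y=True)

# Scale the data by standardizing it (zero mean, unit variance per feature).
X = StandardScaler().fit_transform(X)

# Split arrays into random train and test subsets; 33% is reserved for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
```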

• Next, instantiate a classifier model using a support vector classification algorithm. This line of code generates a classification model object, called clf, and initializes its parameters, gamma and C.
• The clf model learns to predict the classes for unknown cases by passing the training set to the fit method.
• Then you can use the test data to generate predictions. The result tells you the predicted class for each observation in the test set (sketched below).
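A sketch of those classifier steps, continuing from the split in the previous snippet; the specific gamma and C values are arbitrary assumptions.

```python
from sklearn import svm

# Instantiate a support vector classification model, initializing gamma and C.
clf = svm.SVC(gamma=0.001, C=100.0)

# The clf model learns from the training set via the fit method.
clf.fit(X_train, y_train)

# Use the test data to generate predictions: one predicted class per observation.
y_pred = clf.predict(X_test)
```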

• You can also use different metrics to evaluate your model accuracy, such as a confusion matrix to compare the predicted and actual labels for the test set. And finally, you can save your model as a pickle file and retrieve it later (sketched below).
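A sketch of the evaluation and persistence steps, continuing from the snippets above; the pickle file name is arbitrary.

```python
import pickle
from sklearn.metrics import confusion_matrix, accuracy_score

# Compare predicted and actual labels for the test set.
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))

# Save the trained model as a pickle file, then retrieve it.
with open("clf_model.pkl", "wb") as f:
    pickle.dump(clf, f)

with open("clf_model.pkl", "rb") as f:
    clf_restored = pickle.load(f)
```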
Confusion Matrix, Accuracy,
predict
• This is a table that shows how many predictions were
correct and incorrect.
• It compares the model's predictions (y_pred) with the
actual values (y_true).
• Accuracy measures how many predictions were
correct overall.
• The model predicts values for new data based on what
it learned from training
Which one of the following best describes
machine learning?
Which one of the following tasks is a machine learning
engineer more likely to perform than a data scientist?
Which library is at the core of an open-source Python machine learning ecosystem that enables you to develop machine learning models?

• Pandas
• Scikit-learn
Which library is a tool for data analysis, visualization, cleaning, and preparing data for machine learning?

• NumPy
• SciPy
Next slides
• Vanishing gradients → errors fade, network learns
poorly.
• Exploding gradients → errors become too large, network
behaves chaotically.
• Backpropagation → a method that helps the network
learn and correct errors.
• Batch normalization → stabilizes the network so that
learning is faster and more reliable.
• Data is the foundation of machine learning.
• Machine Learning Tools help simplify tasks and
enhance efficiency.
• Programming Languages like Python and R are
widely used.
• Specialized tools exist for data processing,
visualization, ML, deep learning, computer vision, NLP,
and generative AI.
