
Model Development and Cloud Deployment

Assignment-1

Submitted to:

Mrs. Nidhi Grover Raheja

Submitted by:

Nitin

40822018

B.Sc.(Data Analytics)

Q1. Define Model Deployment. How is it different from other ML steps?

Ans.

Model Deployment: Model deployment means integrating a trained machine learning model into an existing production environment, where it can take in an input and return an output.

Key differences from other ML steps:

1. Model Training vs. Deployment:

- Model Training involves selecting an appropriate algorithm, tuning hyperparameters, and fitting the model to the training data. The goal is to optimize the model's performance based on historical data.

- Model Deployment, on the other hand, focuses on integrating the trained model into a
production system where it can process live data and generate predictions. Deployment
considers factors like scalability, latency, and reliability.

2. Model Evaluation vs. Deployment:


- Model Evaluation involves assessing the model's performance using metrics like accuracy,
precision, recall, etc., on a validation or test dataset. This step ensures that the model
generalizes well to new data.

- Model Deployment is about putting the model into a production environment where it will
handle real-world data. Unlike evaluation, which is done in a controlled environment,
deployment deals with unpredictable and dynamic data.

3. Feature Engineering vs. Deployment:

- Feature Engineering involves creating and selecting the most relevant features from raw data
to improve model performance during training.

- Model Deployment doesn’t usually involve creating or selecting features but rather ensuring
that the deployed system can handle incoming data in the same format and structure as during
training.

4. Hyperparameter Tuning vs. Deployment:

- Hyperparameter Tuning is the process of optimizing the parameters that are not learned from
the data but are set before training (like learning rate, number of layers, etc.).

- Model Deployment is the final step after tuning and training, where the model is put into
operation. It requires considerations like model versioning, monitoring, and possibly retraining
as new data becomes available.

5. Data Preprocessing vs. Deployment:

- Data Preprocessing is the step where raw data is cleaned, transformed, and made ready for
model training.

- Model Deployment ensures that any preprocessing steps are consistently applied to new data
before the model makes predictions.
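
For instance, this consistency is often achieved by persisting the preprocessing steps together with the model. A minimal sketch, assuming scikit-learn and joblib (the training data and the file name are hypothetical):

```python
# Sketch: save the fitted preprocessing + model as one pipeline so the exact
# same transformations are applied to live data at deployment time.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: two numeric features, binary target.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0], [4.0, 210.0]])
y_train = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("scale", StandardScaler()),   # preprocessing fitted on training data only
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# Training side: persist the whole pipeline, not just the bare model.
joblib.dump(pipe, "model_v1.joblib")

# Deployment side: load and predict; scaling is re-applied identically.
served = joblib.load("model_v1.joblib")
print(served.predict(np.array([[2.5, 220.0]])))
```
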
Q2. Briefly discuss Machine Learning System Architecture with a simple data case
study.

Ans.

Machine Learning System Architecture: A machine learning system architecture typically consists of several key components that work together to collect data, preprocess it, train a model, and deploy that model for making predictions. Here's a brief overview of each component, followed by a simple case study:

1. Data Collection: Data is gathered from various sources such as databases, sensors, APIs, or
user inputs.


2. Data Preprocessing: The collected data is cleaned, transformed, and normalized to prepare
it for model training.

3. Model Training: A machine learning model is trained using the preprocessed data to learn
patterns and make predictions.
4. Model Evaluation: The trained model is evaluated using metrics such as accuracy,
precision, recall, or RMSE to ensure it performs well on unseen data.

5. Model Deployment: The evaluated model is deployed to a production environment where it can make predictions in real-time or batch mode.
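
Simple data case study: the sketch below exercises all five components end to end, using scikit-learn's built-in Iris flower dataset as a stand-in data source (an assumption for illustration):

```python
# Toy case study: the five architecture components on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Data Collection: load the dataset (stand-in for a database/API source).
X, y = load_iris(return_X_y=True)

# 2. Data Preprocessing: split into train/test, then standardize features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Model Training: fit a classifier on the training split.
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 4. Model Evaluation: measure accuracy on unseen test data.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 5. Model Deployment (simulated): score one new incoming measurement.
new_flower = scaler.transform([[5.1, 3.5, 1.4, 0.2]])
print("predicted class:", model.predict(new_flower)[0])
```
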
Q3. Explain different techniques of Feature Engineering. Describe each technique using Python example code. Take a sample dataset and state necessary assumptions.

Ans.

Feature Engineering: The process of transforming raw data into features that better represent the underlying patterns, in order to improve the performance of machine learning models.

Some common techniques of feature engineering are:

1. Handling Missing Values: Missing data can be problematic for machine learning models. Common techniques include imputation, where missing values are filled in, often using the mean, median, or mode.
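
A minimal sketch with pandas and scikit-learn, on a small hypothetical dataset containing gaps:

```python
# Mean imputation of missing values (sample data is hypothetical).
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age":    [25, None, 38, 41, None],
    "salary": [50000, 62000, None, 71000, 58000],
})

# strategy could also be "median" or "most_frequent" (mode).
imputer = SimpleImputer(strategy="mean")
df[["age", "salary"]] = imputer.fit_transform(df[["age", "salary"]])
print(df)
```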

2. Encoding Categorical Variables: Many machine learning algorithms require numerical input, so categorical variables need to be converted to numeric form. Techniques include Label Encoding and One-Hot Encoding.
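
A minimal sketch of both encodings on a hypothetical "city" column:

```python
# Label Encoding vs. One-Hot Encoding (sample data is hypothetical).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Delhi", "Chennai"]})

# Label Encoding: each category becomes one integer (implies an ordering).
df["city_label"] = LabelEncoder().fit_transform(df["city"])

# One-Hot Encoding: one binary column per category (no implied ordering).
df = pd.get_dummies(df, columns=["city"], prefix="city")
print(df)
```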

3. Feature Scaling: Feature scaling ensures that all features contribute equally to the distance
calculations used in many algorithms. Common methods include Min-Max Scaling and
Standardization (Z-score normalization).
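
A minimal sketch of both methods on a hypothetical "salary" column:

```python
# Min-Max Scaling vs. Standardization (sample data is hypothetical).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({"salary": [45000, 50000, 58000, 62000, 71000]})

# Min-Max Scaling: rescale values into the [0, 1] range.
df["salary_minmax"] = MinMaxScaler().fit_transform(df[["salary"]]).ravel()

# Standardization (Z-score): zero mean, unit variance.
df["salary_zscore"] = StandardScaler().fit_transform(df[["salary"]]).ravel()
print(df)
```
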
4. Feature Selection: Feature selection involves choosing the most relevant features to include
in the model. This can be done using techniques like correlation, univariate selection, and
recursive feature elimination.
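
A minimal sketch of univariate selection with an ANOVA F-test, using scikit-learn's built-in Iris dataset as the sample data:

```python
# Keep the k features most statistically related to the target.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print("F-scores per feature:", selector.scores_)
print("kept feature indices:", selector.get_support(indices=True))
```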

Q4. Discuss the steps of Training and Prediction in the ML pipeline. Give an example.

Ans.

Training and prediction are two critical steps in the machine learning (ML) pipeline. These
steps convert raw data into a model that can make predictions on new, unseen data.

1. Training

Training involves feeding the model with data and adjusting its parameters to minimize the
error between the model's predictions and the actual target values. This is where the model
learns the patterns and relationships in the data.

2. Prediction

Prediction involves using the trained model to make predictions on new, unseen data. The
model applies the learned parameters to the new input data to generate output predictions.

Objective: To use the trained model to make accurate predictions on new data.

Process:

o Input Data: Provide the new data (features) to the trained model.
o Apply Model: The model processes the input data using the learned parameters.
o Output Prediction: The model outputs the predicted values or classes for the input
data.
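
Example: a minimal sketch that trains a linear regression on hypothetical house-size/price data, then predicts prices for new, unseen sizes:

```python
# Training: fit a linear model to hypothetical size-vs-price data.
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[600], [800], [1000], [1200], [1500]])    # size (sq. ft.)
y_train = np.array([150000, 200000, 240000, 280000, 350000])  # price

model = LinearRegression()
model.fit(X_train, y_train)  # parameters adjusted to minimize squared error

# Prediction: apply the learned parameters to new input data.
X_new = np.array([[900], [1100]])
print(model.predict(X_new))
```
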
Q5. Define the Machine Learning Pipeline & name its different components.

Ans.

Machine Learning Pipeline

A Machine Learning Pipeline is a structured sequence of steps that automate the workflow
of machine learning tasks, from raw data input to the final output (predictions). The pipeline
ensures that data is processed and models are trained, evaluated, and deployed efficiently,
enabling consistent and repeatable workflows.

Components of a Machine Learning Pipeline:

1. Data Collection:

Definition: The process of gathering raw data from various sources such as databases, APIs,
sensors, or user inputs.

Role in Pipeline: It serves as the initial step where the raw data required for training and
testing the model is acquired.

2. Data Preprocessing:

Definition: The process of cleaning, transforming, and preparing the raw data for analysis.

Role in Pipeline: This step involves handling missing values, encoding categorical variables,
scaling features, and creating new features, ensuring that the data is in a suitable format for
model training.
3. Feature Engineering:

Definition: The process of selecting, modifying, or creating new features from the raw data
that can improve model performance.

Role in Pipeline: This step enhances the predictive power of the machine learning model by
emphasizing important data patterns.

4. Model Selection:

Definition: The process of choosing the most appropriate machine learning algorithm for the
task at hand.

Role in Pipeline: In this step, different algorithms are tested and compared to identify the
one that delivers the best performance on the given data.
5. Model Training:

Definition: The process of feeding the prepared data into the selected model and allowing it
to learn the patterns and relationships within the data.

Role in Pipeline: This is the core step where the model’s parameters are optimized based on
the input data to minimize prediction errors.

6. Model Evaluation:

Definition: The process of assessing the performance of the trained model using various
metrics.

Role in Pipeline: This step involves testing the model on unseen data to evaluate its
accuracy, precision, recall, F1-score, etc., ensuring it generalizes well to new data.

7. Model Tuning (Hyperparameter Tuning):

Definition: The process of fine-tuning the model's hyperparameters to improve its performance.

Role in Pipeline: In this step, various combinations of hyperparameters are tested to find the optimal settings that lead to the best model performance.

8. Model Deployment:

Definition: The process of integrating the trained model into a production environment
where it can make predictions on new data.

Role in Pipeline: This step ensures that the model is available for real-time or batch
predictions, making it accessible for practical use.

9. Monitoring and Maintenance:

Definition: The ongoing process of tracking the model’s performance and updating it as
needed to ensure it continues to perform well over time.

Role in Pipeline: This step involves monitoring the model in production, addressing issues
like data drift, and retraining the model as needed to maintain accuracy.
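
Several of these components can be chained programmatically. A minimal sketch using scikit-learn's Pipeline with grid-search tuning (the Iris dataset and the parameter grid are assumptions for illustration):

```python
# Preprocessing, model selection, tuning, and evaluation in one pipeline.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),   # data preprocessing step
    ("model", SVC()),              # selected algorithm
])

# Hyperparameter tuning over the whole pipeline via cross-validation.
grid = GridSearchCV(pipe, {"model__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Evaluation on held-out data.
print("best C:", grid.best_params_["model__C"])
print("test accuracy:", grid.score(X_test, y_test))
```
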
Q6. Write short notes on the following:

a) Data Ingestion

b) Data Preparation

c) Data Segregation

d) Training vs. Testing

e) Model Development

f) Model Evaluation and Deployment

Ans.

a) Data Ingestion: Data ingestion is the process of importing and transferring data from
various sources into a storage system or a data processing environment where it can be used
for analysis or modeling.

o Involves gathering data from multiple sources such as databases, files, APIs, or real-
time streams.
o Ensures that data is available in a central repository for further processing.
o Can be batch or real-time (streaming) ingestion.
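
A minimal sketch of batch ingestion with pandas and requests (the file path and API endpoint are hypothetical):

```python
# Batch ingestion: pull raw data from a file and an API into one store.
import pandas as pd
import requests

# From a CSV file (hypothetical path).
orders = pd.read_csv("data/orders.csv")

# From a REST API returning JSON records (hypothetical endpoint).
resp = requests.get("https://api.example.com/v1/customers", timeout=10)
customers = pd.DataFrame(resp.json())

# Land both sources in a central repository for further processing.
orders.to_csv("warehouse/orders.csv", index=False)
customers.to_csv("warehouse/customers.csv", index=False)
```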

b) Data Preparation: Data preparation is the process of cleaning, transforming, and organizing raw data into a suitable format for analysis and modeling.

o Involves handling missing values, correcting errors, and standardizing formats.
o May include feature selection, feature engineering, and data normalization.
o A critical step that ensures the quality and accuracy of the data used in model training.

c) Data Segregation: Data segregation is the process of dividing the dataset into separate
subsets, typically for the purposes of training, validating, and testing machine learning
models.

o Common splits include training set, validation set, and test set.
o Ensures that the model is trained on one portion of the data and tested on another to
assess its generalization ability.
o Helps in preventing overfitting by ensuring the model performs well on unseen data.
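
A minimal sketch of a 60/20/20 train/validation/test split with scikit-learn (Iris used here as sample data):

```python
# Segregate one dataset into train / validation / test subsets.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Carve off the test set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```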

d) Training vs. Testing

Training: The process where a machine learning model learns from the data by adjusting its
parameters based on the input features and target labels.

Testing: The evaluation of the trained model’s performance on a separate dataset (test set)
that was not used during training.

o Training involves fitting the model to the data, aiming to minimize errors on the
training set.
o Testing assesses how well the model generalizes to new, unseen data, providing an
estimate of real-world performance.
o A clear distinction between training and testing helps in avoiding overfitting.

e) Model Development: Model development is the process of creating a machine learning model that can accurately learn from data and make predictions or decisions based on that data.

o Involves selecting a suitable algorithm, training the model, and fine-tuning hyperparameters.
o Includes iterative testing and validation to ensure the model performs well.
o The goal is to develop a model that is both accurate and generalizes well to new data.

f) Model Evaluation and Deployment

Model Evaluation: The process of assessing the performance of a machine learning model
using various metrics to determine its accuracy, robustness, and generalization ability.

Model Deployment: The process of integrating a trained machine learning model into a
production environment where it can make predictions on new data.
o Model evaluation uses metrics such as accuracy, precision, recall, F1-score, and ROC-
AUC to measure performance.
o Model deployment involves making the model accessible for real-time predictions or
batch processing, often requiring considerations for scalability, latency, and
monitoring.
o Continuous monitoring post-deployment is essential to maintain the model’s
effectiveness over time.
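
A minimal sketch computing these metrics on hypothetical test labels:

```python
# Evaluation metrics for a binary classifier (labels are hypothetical).
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual test labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1-score: ", f1_score(y_true, y_pred))
```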
