Mdcm Sagar Assignment
Assignment-1
Submitted by:
Nitin
40822018
B.Sc.(Data Analytics)
Q1. What is Model Deployment? How does it differ from the other steps of the ML pipeline?
Ans.
Model Deployment: Deploying a machine learning model, also known as model deployment, simply means integrating a machine learning model into an existing production environment where it can take in an input and return an output. Deployment differs from the other steps of the ML pipeline as follows:
- Model Evaluation is carried out in a controlled environment to measure how well the trained model performs. Model Deployment, on the other hand, focuses on integrating the trained model into a production system where it processes live data and generates predictions; unlike evaluation, deployment deals with unpredictable and dynamic real-world data and must consider factors like scalability, latency, and reliability.
- Feature Engineering involves creating and selecting the most relevant features from raw data
to improve model performance during training.
- Model Deployment doesn’t usually involve creating or selecting features but rather ensuring
that the deployed system can handle incoming data in the same format and structure as during
training.
- Hyperparameter Tuning is the process of optimizing the parameters that are not learned from
the data but are set before training (like learning rate, number of layers, etc.).
- Model Deployment is the final step after tuning and training, where the model is put into
operation. It requires considerations like model versioning, monitoring, and possibly retraining
as new data becomes available.
- Data Preprocessing is the step where raw data is cleaned, transformed, and made ready for
model training.
- Model Deployment ensures that any preprocessing steps are consistently applied to new data
before the model makes predictions.
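A minimal sketch of what this can look like in code, assuming a scikit-learn model saved with joblib; the file name, feature names, and toy training data below are illustrative placeholders, not part of any particular system:

# Deployment sketch: save a trained model, then load and serve it as in production.
import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# -- Training side (normally a separate script): fit and save the model --
X_train = pd.DataFrame({"age": [25, 35, 45], "income": [30000, 50000, 70000]})
y_train = [0.2, 0.5, 0.8]
joblib.dump(LinearRegression().fit(X_train, y_train), "model.joblib")

# -- Serving side: load the saved model and predict on incoming records --
model = joblib.load("model.joblib")

def predict(records):
    """Apply the trained model to incoming production records (list of dicts)."""
    frame = pd.DataFrame(records)   # same columns and format as during training
    return model.predict(frame).tolist()

print(predict([{"age": 30, "income": 40000}]))

Saving the preprocessing and model together in one object is one common way to guarantee that new data is transformed exactly as it was during training.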
Q2. Briefly discuss Machine Learning System Architecture with a simple data case
study.
Ans.
Machine Learning System Architecture: A machine learning system architecture typically consists of several key components that work together to collect data, preprocess it, train a model, and deploy that model for making predictions. Here's a brief overview of each component, followed by a simple case study:
1. Data Collection: Data is gathered from various sources such as databases, sensors, APIs, or user inputs (for example, importing a dataset from a CSV file).
2. Data Preprocessing: The collected data is cleaned, transformed, and normalized to prepare
it for model training.
3. Model Training: A machine learning model is trained using the preprocessed data to learn
patterns and make predictions.
4. Model Evaluation: The trained model is evaluated using metrics such as accuracy, precision, recall, or RMSE to ensure it performs well on unseen data.
5. Model Deployment: The evaluated model is integrated into a production environment where it can take in new data and return predictions.
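A simple case study tying these components together, using a house-price prediction task; the records and column names below are made up for illustration:

# Case study sketch: predicting house prices (toy, made-up data).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Data Collection: in practice read from a database/CSV/API; here, toy records.
data = pd.DataFrame({
    "area_sqft": [800, 1200, 1500, 2000, 2500, None, 1800, 1000],
    "bedrooms":  [2, 3, 3, 4, 4, 3, 3, 2],
    "price":     [100, 150, 180, 240, 300, 210, 220, 120],
})

# 2. Data Preprocessing: drop rows with missing values (a simple cleaning step).
data = data.dropna()
X, y = data[["area_sqft", "bedrooms"]], data["price"]

# 3. Model Training: fit a regression model on a training split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# 4. Model Evaluation: measure RMSE on the held-out data.
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print("RMSE:", rmse)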
Q3. What is Feature Engineering? Discuss some common feature engineering techniques.
Ans.
Feature Engineering: It is the process of transforming raw data into features that better represent the underlying patterns to improve the performance of machine learning models.
1. Handling Missing Values: Missing data can be problematic for machine learning models. Common techniques include imputation, where missing values are filled in, often using the mean, median, or mode.
2. Feature Scaling: Feature scaling ensures that all features contribute equally to the distance calculations used in many algorithms. Common methods include Min-Max Scaling and Standardization (Z-score normalization).
3. Feature Selection: Feature selection involves choosing the most relevant features to include in the model. This can be done using techniques like correlation, univariate selection, and recursive feature elimination.
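A short sketch of these three techniques with scikit-learn, on a small made-up dataset:

# Feature engineering sketch: imputation, scaling, and selection (toy data).
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, f_regression

X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 600.0], [4.0, 800.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Handling missing values: fill the NaN with the column mean.
X = SimpleImputer(strategy="mean").fit_transform(X)

# Feature scaling: Min-Max scaling to [0, 1] (StandardScaler would give z-scores).
X_scaled = MinMaxScaler().fit_transform(X)

# Feature selection: keep the single feature most correlated with the target.
X_best = SelectKBest(score_func=f_regression, k=1).fit_transform(X_scaled, y)
print(X_best.shape)   # (4, 1)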
Q4. Discuss the steps of Training and Prediction in the ML pipeline. Give an example.
Ans.
Training and prediction are two critical steps in the machine learning (ML) pipeline. These
steps convert raw data into a model that can make predictions on new, unseen data.
1. Training
Training involves feeding the model with data and adjusting its parameters to minimize the
error between the model's predictions and the actual target values. This is where the model
learns the patterns and relationships in the data.
2. Prediction
Prediction involves using the trained model to make predictions on new, unseen data. The
model applies the learned parameters to the new input data to generate output predictions.
Objective: To use the trained model to make accurate predictions on new data.
Process:
o Input Data: Provide the new data (features) to the trained model.
o Apply Model: The model processes the input data using the learned parameters.
o Output Prediction: The model outputs the predicted values or classes for the input
data.
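For example, a minimal training-and-prediction sketch with scikit-learn on toy data:

# Training and prediction example: learn y = 2x from toy data, then predict.
from sklearn.linear_model import LinearRegression

# Training: the model adjusts its parameters to fit the input-output pairs.
X_train = [[1], [2], [3], [4]]
y_train = [2, 4, 6, 8]
model = LinearRegression().fit(X_train, y_train)

# Prediction: apply the learned parameters to new, unseen input data.
print(model.predict([[5]]))   # roughly [10.]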
Q5. Define the Machine Learning Pipeline & name its different components.
Ans.
A Machine Learning Pipeline is a structured sequence of steps that automate the workflow
of machine learning tasks, from raw data input to the final output (predictions). The pipeline
ensures that data is processed and models are trained, evaluated, and deployed efficiently,
enabling consistent and repeatable workflows.
1. Data Collection:
Definition: The process of gathering raw data from various sources such as databases, APIs,
sensors, or user inputs.
Role in Pipeline: It serves as the initial step where the raw data required for training and
testing the model is acquired.
2. Data Preprocessing:
Definition: The process of cleaning, transforming, and preparing the raw data for analysis.
Role in Pipeline: This step involves handling missing values, encoding categorical variables,
scaling features, and creating new features, ensuring that the data is in a suitable format for
model training.
3. Feature Engineering:
Definition: The process of selecting, modifying, or creating new features from the raw data
that can improve model performance.
Role in Pipeline: This step enhances the predictive power of the machine learning model by
emphasizing important data patterns.
4. Model Selection:
Definition: The process of choosing the most appropriate machine learning algorithm for the
task at hand.
Role in Pipeline: In this step, different algorithms are tested and compared to identify the
one that delivers the best performance on the given data.
5. Model Training:
Definition: The process of feeding the prepared data into the selected model and allowing it
to learn the patterns and relationships within the data.
Role in Pipeline: This is the core step where the model’s parameters are optimized based on
the input data to minimize prediction errors.
6. Model Evaluation:
Definition: The process of assessing the performance of the trained model using various
metrics.
Role in Pipeline: This step involves testing the model on unseen data to evaluate its
accuracy, precision, recall, F1-score, etc., ensuring it generalizes well to new data.
7. Model Deployment:
Definition: The process of integrating the trained model into a production environment
where it can make predictions on new data.
Role in Pipeline: This step ensures that the model is available for real-time or batch
predictions, making it accessible for practical use.
8. Model Monitoring and Maintenance:
Definition: The ongoing process of tracking the model’s performance and updating it as
needed to ensure it continues to perform well over time.
Role in Pipeline: This step involves monitoring the model in production, addressing issues
like data drift, and retraining the model as needed to maintain accuracy.
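Several of these components can be chained in code. Below is a minimal sketch using scikit-learn's Pipeline on toy data, bundling preprocessing, scaling, and training so the identical steps are reapplied automatically at prediction time:

# A minimal ML pipeline: preprocessing + model bundled as one object (toy data).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [4.0, 8.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # data preprocessing
    ("scale", StandardScaler()),                  # feature scaling
    ("model", LogisticRegression()),              # model training
])
pipe.fit(X, y)                     # fits every step in order on the training data
print(pipe.predict([[2.5, 5.0]]))  # prediction reuses the exact same steps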
Q6. Write short notes on the following:
a) Data Ingestion
b) Data Preparation
c) Data Segregation
d) Training vs. Testing
e) Model Evaluation vs. Model Deployment
Ans.
a) Data Ingestion: Data ingestion is the process of importing and transferring data from
various sources into a storage system or a data processing environment where it can be used
for analysis or modeling.
o Involves gathering data from multiple sources such as databases, files, APIs, or real-
time streams.
o Ensures that data is available in a central repository for further processing.
o Can be batch or real-time (streaming) ingestion.
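A small sketch of batch ingestion with pandas; the file name and contents are illustrative, and the file is created on the spot so the example is self-contained:

# Data ingestion sketch: import data from a source into a central DataFrame.
import pandas as pd

# Create a small CSV here for illustration; in practice the source would be
# a database, API, file share, or real-time stream.
pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]}).to_csv("sales.csv", index=False)

data = pd.read_csv("sales.csv")   # batch ingestion: load the file in one go
print(data.head())                # data is now available for further processing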
b) Data Preparation: Data preparation is the process of cleaning, transforming, and organizing the ingested raw data so that it is in a suitable format for analysis and model training.
o Involves handling missing values, encoding categorical variables, and scaling features.
o Produces a consistent, well-structured dataset for the later stages of the pipeline.
c) Data Segregation: Data segregation is the process of dividing the dataset into separate subsets, typically for the purposes of training, validating, and testing machine learning models.
o Common splits include training set, validation set, and test set.
o Ensures that the model is trained on one portion of the data and tested on another to
assess its generalization ability.
o Helps in preventing overfitting by ensuring the model performs well on unseen data.
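A minimal sketch of segregating data into train/validation/test sets with scikit-learn; the 60/20/20 split below is just one common choice:

# Data segregation sketch: split toy data into train (60%), validation (20%), test (20%).
from sklearn.model_selection import train_test_split

X, y = list(range(100)), list(range(100))   # toy data
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
print(len(X_train), len(X_val), len(X_test))   # 60 20 20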
d) Training vs. Testing:
Training: The process where a machine learning model learns from the data by adjusting its
parameters based on the input features and target labels.
Testing: The evaluation of the trained model’s performance on a separate dataset (test set)
that was not used during training.
o Training involves fitting the model to the data, aiming to minimize errors on the
training set.
o Testing assesses how well the model generalizes to new, unseen data, providing an
estimate of real-world performance.
o A clear distinction between training and testing helps in avoiding overfitting.
e) Model Evaluation vs. Model Deployment:
Model Evaluation: The process of assessing the performance of a machine learning model
using various metrics to determine its accuracy, robustness, and generalization ability.
Model Deployment: The process of integrating a trained machine learning model into a
production environment where it can make predictions on new data.
o Model evaluation uses metrics such as accuracy, precision, recall, F1-score, and ROC-
AUC to measure performance.
o Model deployment involves making the model accessible for real-time predictions or
batch processing, often requiring considerations for scalability, latency, and
monitoring.
o Continuous monitoring post-deployment is essential to maintain the model’s
effectiveness over time.
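A short sketch of computing these evaluation metrics with scikit-learn; the labels below are made up for illustration:

# Model evaluation sketch: common classification metrics on made-up labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # actual labels from the test set
y_pred = [1, 0, 0, 1, 0, 1]   # labels predicted by the model

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1-score :", f1_score(y_true, y_pred))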