Updated Syllabus ML Ver 1

Unit 1 covers the fundamentals of machine learning applications, focusing on regression techniques, error metrics, and model evaluation. Key concepts include function derivation from data, the bias-variance tradeoff, and the use of scikit-learn for implementing machine learning models. The document also discusses handling missing data through imputation and the importance of training and validation in ensuring model generalizability.


Unit 1:

Introduction: Machine Learning Applications.


Regression: Function derivation from Data, Error Metrics – MSE, RMSE, MAE, Bias-Variance tradeoff, Training and Validation, Introduction to scikit-learn, Imputation, One-Hot encoding, Correlation.

Regression involves finding a function that best represents the relationship between input
variables and a continuous target variable, using data as the foundation for the derivation. Error
metrics like MSE, RMSE, and MAE are used to evaluate the quality of the function by
measuring the difference between predictions and actual values. The bias-variance tradeoff
guides model selection, aiming for a balance that prevents both underfitting and
overfitting. Training and validation splits ensure model generalizability on unseen data, and
scikit-learn provides tools for model building and evaluation. Missing data can be handled
through imputation, while categorical features can be encoded using one-hot
encoding. Correlation analysis helps understand the relationships between variables before
model training.
1. Function Derivation from Data:
 Regression models: predict a numerical value based on input features.
 Data is used to "fit" the model, meaning the model learns a function that best explains
the relationship between inputs and outputs.
 Linear regression: is a common example where the relationship is assumed to be linear.
 Non-linear models: (e.g., polynomial regression) can be used when the relationship is
more complex.
 Machine learning algorithms: Techniques like gradient descent are often used to find the "best fit" function by minimizing a loss function (a minimal sketch follows below).
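A minimal sketch of this on a made-up toy dataset (the numbers and variable names are illustrative; note that scikit-learn's LinearRegression solves least squares directly rather than by gradient descent):
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: the target is roughly 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=50)

model = LinearRegression()
model.fit(X, y)                           # "fitting" = learning the function from the data
print(model.coef_, model.intercept_)      # recovered slope and intercept, near 3 and 2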
2. Error Metrics:
 Mean Squared Error (MSE):
Calculates the average of the squared differences between predicted and actual values. It
penalizes larger errors more heavily.
 Root Mean Squared Error (RMSE):
The square root of the MSE, expressed in the same units as the target variable. It's a more
interpretable metric than MSE.
 Mean Absolute Error (MAE):
The average of the absolute differences between predicted and actual values. It is less sensitive to outliers compared to MSE and RMSE (a short example follows below).
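As a quick illustrative example (the numbers below are made up; RMSE is computed as the square root of MSE):
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])    # actual values (made up)
y_pred = np.array([2.5, 5.0, 8.0, 12.0])    # predictions, with one large error of 2.0

mse = mean_squared_error(y_true, y_pred)    # 1.125 -- the large error dominates
rmse = np.sqrt(mse)                         # ~1.06, back in the units of the target
mae = mean_absolute_error(y_true, y_pred)   # 0.75 -- the large error counts only once
print(mse, rmse, mae)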
3. Bias-Variance Tradeoff:
 Bias:
The error introduced by approximating a real-world relationship with a simplified model. High
bias can lead to underfitting (model is too simple and doesn't capture the underlying patterns).
 Variance:
The sensitivity of the model to the specific training data. High variance can lead to overfitting
(model learns the training data too well, including noise, and doesn't generalize well to new
data).
 The tradeoff:
There's a balance to find between bias and variance to achieve optimal performance.
 Techniques to manage the tradeoff:
Regularization (penalizing complex models), cross-validation (evaluating on different subsets of the data), and ensemble methods (combining multiple models); a short sketch follows below.
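A brief sketch of two of these techniques, assuming a synthetic dataset (the alpha values are arbitrary illustrations): Ridge regression adds an L2 penalty that shrinks coefficients toward zero, and cross_val_score evaluates each setting on five different train/validation splits:
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used here only for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Larger alpha = stronger regularization = simpler model (more bias, less variance)
for alpha in (0.01, 1.0, 100.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    print(alpha, scores.mean())   # average R² across the 5 validation folds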
4. Training and Validation:
 Training set:
Used to train the model and learn the relationship between inputs and outputs.
 Validation/Test set:
Used to evaluate the model's performance on unseen data and assess its generalizability.
 Splitting data:
The dataset is typically split into training and validation/test sets before model training.
 Scikit-learn:
Scikit-learn provides tools to perform this split (e.g., train_test_split).
5. Introduction to Scikit-learn:
 Scikit-learn (sklearn): A powerful machine learning library in Python.
 Provides: Algorithms for regression, classification, clustering, and other tasks.
 Easy-to-use: Offers a consistent API and makes it easier to implement and evaluate
models.
 Integration with other libraries: Works well with NumPy and Pandas, popular Python
libraries for numerical computation and data analysis.
6. Imputation:
 Missing data: Occurs when some values in the dataset are missing.
 Imputation: The process of replacing missing values with reasonable estimates.
 Methods: Mean imputation, median imputation, k-nearest neighbor imputation, etc. (see the sketch below).
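A minimal sketch of mean imputation with scikit-learn's SimpleImputer (the small array is made up; KNNImputer from the same sklearn.impute module covers the k-nearest-neighbor variant):
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan],
              [4.0, 6.0]])                  # made-up data with two missing values

imputer = SimpleImputer(strategy="mean")    # "median" and "most_frequent" also work
X_filled = imputer.fit_transform(X)
print(X_filled)                             # NaNs replaced by column means (4.0 and ~3.67)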
Links:
https://proclusacademy.com/blog/explainer/regression-metrics-you-must-know/
Error Metrics

Regression metrics serve as quantitative measures to assess the performance of regression models by evaluating the disparity between predicted and actual values.
Let’s explore some of the most commonly used regression metrics:
1. Mean Squared Error (MSE)
MSE calculates the average squared difference between predicted and actual values:
MSE = (1/n) * Σ (y_i - ŷ_i)^2
where y_i represents the actual value, ŷ_i represents the predicted value, and n is the number of observations.
MSE measures the average squared error, with higher values indicating more significant
discrepancies between predicted and actual values.
MSE penalizes more significant errors due to squaring, making it sensitive to outliers. It is
commonly used due to its mathematical properties but may be less interpretable than other
metrics.
2. Root Mean Squared Error (RMSE)
RMSE is the square root of the MSE and measures the average magnitude of errors.
RMSE = sqrt(MSE)
RMSE shares a similar interpretation to MSE but is in the same units as the dependent variable,
making it more interpretable.

Like MSE, RMSE is sensitive to outliers, since squaring amplifies large errors; it is preferred when large errors should be penalized more heavily.
3. Mean Absolute Error (MAE)
MAE computes the average absolute difference between predicted and actual values:
MAE = (1/n) * Σ |y_i - ŷ_i|
It measures the average magnitude of errors, with higher values indicating larger discrepancies
between predicted and actual values.
MAE is less sensitive to outliers than MSE but may not adequately penalize large errors.
4. R-squared (R²)
R² measures the proportion of variance in the dependent variable explained by the independent variables:
R² = 1 - (SSR / SST)
where SSR is the sum of squared residuals, and SST is the total sum of squares.
R² typically ranges from 0 to 1, with higher values indicating a better fit of the model to the data (it can be negative when the model fits worse than simply predicting the mean). However, it does not provide information about the goodness of individual predictions.
R² may artificially increase with more independent variables, and a high R² does not necessarily
imply a good model fit.
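To make the formulas above concrete, here is a small numpy sketch that evaluates all four metrics on made-up numbers:
import numpy as np

y = np.array([3.0, 5.0, 7.5, 10.0])        # actual values (made up)
y_hat = np.array([2.5, 5.0, 8.0, 12.0])    # predicted values (made up)

mse = np.mean((y - y_hat) ** 2)            # MSE = (1/n) * Σ (y_i - ŷ_i)^2
rmse = np.sqrt(mse)                        # RMSE = sqrt(MSE)
mae = np.mean(np.abs(y - y_hat))           # MAE = (1/n) * Σ |y_i - ŷ_i|
ssr = np.sum((y - y_hat) ** 2)             # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
r2 = 1 - ssr / sst                         # R² = 1 - SSR/SST
print(mse, rmse, mae, r2)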
Function Derivation from Data
Deriving a function from data involves approximating a function that best represents a set of data
points. This process can be done using various methods, including interpolation, numerical
differentiation, and curve fitting.
1. Understanding the Concept:
 Data points:
You have a set of (x, y) pairs where x represents the input and y represents the corresponding
output.
 Function derivation:
The goal is to find a function f(x) that best approximates or fits these data points.
 Derivatives:
A derivative, f'(x), represents the instantaneous rate of change of the function at a specific point.
2. Methods for Deriving a Function:
 Interpolation:
Find a function that passes exactly through all the data points. Common interpolation methods
include:
 Linear interpolation: Connecting data points with straight lines.
 Spline interpolation: Using smooth curves (splines) to connect data points.
 Numerical Differentiation:
 Finite difference methods: Approximate the derivative by calculating the slope
between points.
 Example: Using the formula f'(x) ≈ (f(x + h) - f(x)) / h, where h is a small value.
 Curve Fitting:
Find a function that best fits the data points based on a chosen function type (e.g., linear, polynomial, exponential); a sketch of all three methods follows below.
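A minimal numpy sketch of the three methods on a made-up set of points (the data roughly follow e^x; the polynomial degree and step size are arbitrary choices for illustration):
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.7, 7.4, 20.1, 54.6])   # made-up data, roughly e^x

# Interpolation: estimate y between known points (linear interpolation at x = 2.5)
print(np.interp(2.5, x, y))

# Curve fitting: least-squares fit of a chosen function type (a cubic, arbitrarily)
p = np.poly1d(np.polyfit(x, y, deg=3))
print(p(2.5))

# Numerical differentiation: f'(x) ≈ (f(x + h) - f(x)) / h, forward difference at x = 2
print((y[3] - y[2]) / (x[3] - x[2]))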
3. Steps Involved in Function Derivation:
1. Data Preparation:
o Clean and organize the data.
o Identify any outliers or errors.
2. Choose a Method:
o Select an appropriate method based on the nature of the data and the desired outcome.
3. Apply the Method:
o Use the chosen method to derive a function or estimate the derivative.
4. Evaluate the Results:
o Assess the accuracy of the derived function or derivative.
o Consider how well the derived function fits the data points.
5. Refinement (if needed):
o Adjust parameters or try different methods if the initial results are not satisfactory.
4. Example:
Imagine you have data points (x, y) for a moving object's position (y) at different times (x). You
can use numerical differentiation to approximate the object's velocity (which is the derivative of
position with respect to time) at each time point.
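A minimal sketch of this example, assuming made-up free-fall data (position ≈ 4.9·t², so the true velocity is 9.8·t):
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # time in seconds (made up)
pos = np.array([0.0, 4.9, 19.6, 44.1, 78.4])   # position in metres, ~4.9 * t^2

velocity = np.gradient(pos, t)   # central differences inside, one-sided at the endpoints
print(velocity)                  # interior values match the true derivative 9.8 * t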
5. Applications:
 Data Science: Modeling trends and patterns in data.
 Physics: Calculating acceleration from position and time data.
 Engineering: Analyzing sensor data and creating models.
So, deriving a function from data involves finding a mathematical representation that best
describes the relationship between input and output values, often using techniques like
interpolation, numerical differentiation, or curve fitting. This process allows us to analyze data,
make predictions, and understand underlying trends and relationships.
Reference Link: https://imswapnilb.medium.com/derivatives-in-data-science-c5d7bd916f17
Evaluating your Machine Learning Model
The primary aim of Machine Learning models is to learn from the given data and generate predictions based on the patterns observed during the learning process. However, our task doesn't end there. We need to continuously improve the models based on the kind of results they generate. We also quantify a model's performance using metrics like Accuracy, Mean Squared Error (MSE), F1-Score, etc., and try to improve these metrics. This can often get tricky when we have to maintain the flexibility of the model without compromising on its correctness.

A supervised Machine Learning model aims to train itself on the input variables (X) in such a way that the predicted values (Y) are as close to the actual values as possible. This difference between the actual values and predicted values is the error, and it is used to evaluate the model. The error for any supervised Machine Learning algorithm comprises three parts:
1. Bias error
2. Variance error
3. The noise
While the noise is the irreducible error that we cannot eliminate, the other two i.e. Bias and
Variance are reducible errors that we can attempt to minimize as much as possible.
In the following sections, we will cover the Bias error, Variance error, and the Bias-Variance
tradeoff which will aid us in the best model selection. And what’s exciting is that we will cover
some techniques to deal with these errors by using an example dataset.
Problem Statement and Primary Steps
As explained earlier, we have taken up the Pima Indians Diabetes dataset and formed a classification problem on it. Let's start by gauging the dataset and observing the kind of data we are dealing with. We will do this by importing the necessary libraries:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import confusion_matrix
from sklearn import metrics
import matplotlib.pyplot as plt
%matplotlib inline
Now, we will load the data into a data frame and observe some rows to get insights into the data.
data_file_path = 'diabetes.csv'
data_df = pd.read_csv(data_file_path)
print(data_df.head())

We need to predict the ‘Outcome’ column. Let us separate it and assign it to a target variable ‘y’.
The rest of the data frame will be the set of input variables X.
y = data_df["Outcome"].values
x = data_df.drop(["Outcome"], axis=1)
Now let's scale the predictor variables and then separate the training and the testing data. (Note that we scale the input features x rather than the whole data frame, so the Outcome column is left untouched.)
ss = StandardScaler()
x = ss.fit_transform(x)
# Divide into training and test data
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)  # 70% training and 30% test
Since the outcomes are classified in a binary form, we will use the simplest k-nearest neighbor (kNN) classifier to classify whether the patient has diabetes or not.
However, how do we decide the value of ‘k’?
 Maybe we should use k = 1 so that we get very good results on our training data? That might work, but we cannot guarantee that the model will perform just as well on our testing data, since it can get too specific.
 How about using a high value of k, say k = 100, so that we can consider a large number of nearest points to account for the distant points as well? However, this kind of model will be too generic, and we cannot be sure it has considered all the possible contributing features correctly.
Let us take a few possible values of k and fit the model on the training data for all those
values. We will also compute the training score and testing score for all those values.
train_score = []
test_score = []
k_vals = []
for k in range(1, 21):
    k_vals.append(k)
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    tr_score = knn.score(X_train, y_train)
    train_score.append(tr_score)
    te_score = knn.score(X_test, y_test)
    test_score.append(te_score)
To derive more insights from this, let us plot the training scores (in red) and the testing scores (in blue).
plt.figure(figsize=(10, 5))
plt.xlabel('Different Values of K')
plt.ylabel('Model score')
plt.plot(k_vals, train_score, color='r', label='training score')
plt.plot(k_vals, test_score, color='b', label='test score')
plt.legend(bbox_to_anchor=(1, 1), bbox_transform=plt.gcf().transFigure)
plt.show()
To calculate the score for a particular value of k:
knn = KNeighborsClassifier(n_neighbors=14)
# Fit the model
knn.fit(X_train, y_train)
# Get the score
knn.score(X_test, y_test)
We can make the following conclusions from the above plot:
 For low values of k, the training score is high, while the testing score is low

 As the value of k increases, the testing score starts to increase and the training score starts
to decrease.
 However, at some value of k, both the training score and the testing score are close to
each other.
This is where Bias and Variance come into the picture.
What is Bias?
In the simplest terms, Bias is the difference between the Predicted Value and the Expected Value.
To explain further, the model makes certain assumptions when it trains on the data provided.
When it is introduced to the testing/validation data, these assumptions may not always be correct.
In our model, if we use a large number of nearest neighbors, the model can totally decide that some parameters are not important at all. For example, it can just consider that the Glucose level and the Blood Pressure decide if the patient has diabetes. This model would make very strong assumptions about the other parameters not affecting the outcome. You can also think of it as a model predicting a simple relationship when the datapoints clearly indicate a more complex relationship.
Mathematically, let the input variables be X and a target variable Y. We map the relationship
between the two using a function f.
Therefore,
Y = f(X) + e
Here ‘e’ is the error that is normally distributed. The aim of our model f'(x) is to predict values as
close to f(x) as possible. Here, the Bias of the model is:
Bias[f'(X)] = E[f'(X) – f(X)]
As explained above, when the model makes such generalizations, i.e. when there is a high bias error, it results in a very simplistic model that does not consider the variations very well. Since it does not learn the training data very well, this is called Underfitting.
What is Variance?
Contrary to bias, the Variance is when the model takes into account the fluctuations in the data i.e.
the noise as well. So, what happens when our model has a high variance?
The model will still consider the variance as something to learn from. That is, the model learns
too much from the training data, so much so, that when confronted with new (testing) data, it is
unable to predict accurately based on it.
Mathematically, the variance error in the model is:
Variance[f'(X)] = E[f'(X)^2] − E[f'(X)]^2
Since in the case of high variance, the model learns too much from the training data, it is
called overfitting.
In the context of our data, if we use very few nearest neighbors, it is like saying that if the number of pregnancies is more than 3, the glucose level is more than 78, Diastolic BP is less than 98, skin thickness is less than 23 mm, and so on for every feature, the model decides that the patient has diabetes. All the other patients who don't meet the above criteria are not diabetic. While this may be true for one particular patient in the training set, what if these parameters are the outliers or were even recorded incorrectly? Clearly, such a model could prove to be very costly!
Additionally, this model would have a high variance error because the predictions of the patient
being diabetic or not vary greatly with the kind of training data we are providing it. So even
changing the Glucose Level to 75 would result in the model predicting that the patient does not
have diabetes.
To make it simpler, the model predicts very complex relationships between the outcome and the input features when a quadratic equation would have sufficed. This is what a classification model looks like when there is a high variance error, i.e. when there is overfitting.
To summarise,
 A model with a high bias error underfits data and makes very simplistic assumptions on it
 A model with a high variance error overfits the data and learns too much from it
 A good model is where both Bias and Variance errors are balanced
Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning and statistics. It refers to
the delicate balance between two sources of error in a predictive model: bias and variance.
Bias represents the error due to overly simplistic assumptions in the learning algorithm. High
bias can cause the model to underfit the data, leading to poor performance on both training and
unseen data.
Variance, on the other hand, reflects the model’s sensitivity to small fluctuations in the training
data. High variance can lead to overfitting, where the model captures noise in the training data
and performs poorly on new, unseen data.
The goal is to find the right level of complexity in a model to minimize both bias and variance,
achieving good generalization to new data. Balancing these factors is essential for building
models that perform well on a variety of datasets.
Understand Bias-Variance Tradeoff with the help of an example
How do we relate the above concepts to our Knn model from earlier?
In our model, say for k = 1, the point closest to the datapoint in question will be considered.
Here, the prediction might be accurate for that particular data point so the bias error will be less.
However, the variance error will be high since only the one nearest point is considered and this
doesn’t take into account the other possible points. What scenario do you think this corresponds
to? Yes, you are thinking right, this means that our model is overfitting.
On the other hand, for higher values of k, many more points close to the datapoint in question will be considered. This would result in a higher bias error and underfitting, since averaging over many points smooths away the specifics of the training set. However, we can expect a lower variance error on the testing set, which has unknown values.
To achieve a balance between the Bias error and the Variance error, we need a value of k such
that the model neither learns from the noise (overfit on data) nor makes sweeping assumptions
on the data(underfit on data). Though some points are classified incorrectly, the model generally
fits most of the datapoints accurately. The balance between the Bias error and the Variance error
is the Bias-Variance Tradeoff.
One-Hot Encoding (OHE) links:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
https://scikit-learn.org/stable/modules/preprocessing.html
https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
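Since one-hot encoding is covered here only by links, here is a minimal sketch using scikit-learn's OneHotEncoder (the color values are made up; the sparse_output parameter assumes scikit-learn >= 1.2, where it replaced the older sparse argument):
import numpy as np
from sklearn.preprocessing import OneHotEncoder

colors = np.array([["red"], ["green"], ["blue"], ["green"]])   # one categorical feature

enc = OneHotEncoder(sparse_output=False)    # dense output for easy printing
onehot = enc.fit_transform(colors)
print(enc.categories_)    # learned category order: ['blue', 'green', 'red']
print(onehot)             # one binary column per category, e.g. 'red' -> [0, 0, 1]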

UNIT 2: STUDY FROM NOTES AND QUESTION BANK


Unit 3 Questions:
ANN: 3 basic components
1. Connections
2. Learning (types of learning)
3. Activation Functions
REFER TO THE NOTES SHARED ALSO
Question Bank for Unit 3
1. Discuss the differences between regression and classification in practical scenarios. Use
examples from real-world datasets to illustrate how these two types of problems differ.
2. What is an artificial neural network, and how is it represented?
3. Discuss the types of problems suitable for neural network learning.
4. Explain the structure and function of a perceptron.
5. What are the common activation functions used in neural networks?
6. How does the learning rate affect the training of neural networks?
7. Discuss the concept of overfitting in the context of neural networks.
8. Explain the role of hidden layers in neural networks.
9. How can neural networks be used for classification tasks?
10. Describe the process of gradient descent in training neural networks.
11. What is the significance of weight initialization in neural networks?
