ml unit 2
Works: Computational Learning Theory, Occam's Razor Principle and Overfitting Avoidance
Heuristic Search in Inductive Learning, Estimating Generalization Errors, Metrics for assessing
regression, Metrics for assessing classification.
Supervised Learning:
Supervised learning is a type of machine learning where the algorithm learns from a labeled dataset,
meaning it is provided with input-output pairs to learn a mapping function between the input and
the corresponding output. The goal of supervised learning is to make predictions on new, unseen
data based on the patterns learned from the training dataset.
Key Terminologies:
1. Input features (X): These are the variables or attributes that are used to describe the input data. In
a supervised learning problem, each data point is represented by a set of input features.
2. Target labels (Y): These are the output variables that we want the algorithm to learn to predict.
The goal of the algorithm is to map the input features to the target labels.
3. Training Data: The labeled dataset used to train the supervised learning algorithm. It consists of
input features and their corresponding target labels.
4. Model: The algorithm used to learn the mapping function from the training data. The model tries
to generalize patterns in the training data to make predictions on new, unseen data.
5. Prediction: Once the model is trained, it can be used to predict the target label for new input
features.
1. Regression:
- Regression algorithms are used when the target variable is continuous or numerical.
- The goal is to predict a value within a range, such as predicting the price of a house based on its
features.
2. Classification:
- Classification algorithms are used when the target variable is categorical or belongs to a specific
class or category.
- The goal is to classify data points into predefined classes, such as determining whether an email is
spam or not.
1. Linear Regression:
- A regression algorithm that finds the best-fit straight line to model the relationship between the
input features and the target variable.
- It aims to minimize the error between the predicted values and the actual target values.
2. Logistic Regression:
- A classification algorithm used to model the probability of a data point belonging to a specific
class.
- It uses a logistic function to map the input features to a binary outcome (0 or 1).
3. Decision Trees:
- It creates a tree-like model where each internal node represents a decision based on a feature,
and each leaf node represents the target label.
4. Random Forest:
- An ensemble learning technique that builds multiple decision trees and combines their
predictions to improve accuracy and reduce overfitting.
5. Support Vector Machines (SVM):
- A powerful classification algorithm that finds the optimal hyperplane to separate data points
belonging to different classes with the largest margin.
6. Neural Networks:
- They consist of interconnected layers of neurons and are used for complex tasks like image
recognition and natural language processing.
- The training dataset is typically split into two parts: the training set (used to train the model) and
the test set (used to evaluate the model's performance).
2. Training Process:
- The algorithm uses the training set to learn the mapping function by adjusting its internal
parameters based on the input features and their corresponding target labels.
3. Evaluation Metrics:
- For regression tasks, metrics like Mean Squared Error (MSE) or Root Mean Squared Error (RMSE)
are used to measure the error between predicted and actual values.
- For classification tasks, metrics like accuracy, precision, recall, and F1 score are used to evaluate
the model's performance.
- Overfitting occurs when the model performs well on the training data but poorly on unseen data
due to capturing noise or random fluctuations.
- Underfitting occurs when the model is too simple to capture the underlying patterns in the data.
- Cross-validation: Using multiple train-test splits to better estimate the model's performance.
- Regularization: Introducing penalties to limit the complexity of the model and prevent overfitting.
- Feature selection: Removing irrelevant or redundant features from the input data.
- Sentiment Analysis: Determining the sentiment expressed in a piece of text (positive, negative, or
neutral).
- Medical Diagnosis: Predicting the presence or absence of a disease based on patient data.
In summary, supervised learning is a fundamental concept in machine learning that involves training
algorithms on labeled data to make predictions on new, unseen data. It encompasses various
algorithms and techniques that have a wide range of applications across different domains. Proper
evaluation and mitigation of overfitting are crucial for building accurate and reliable models.
Machine Learning (ML) is a subset of artificial intelligence that focuses on the development of
algorithms and statistical models that enable computers to learn and improve their performance on
a specific task through experience. The fundamental idea behind machine learning is to enable
computers to learn from data and make decisions or predictions without being explicitly
programmed for each scenario.
The rationale for adopting machine learning in various applications is based on several key factors:
b) **Adaptability:** Machine learning algorithms can adapt and improve their performance as they
encounter more data, making them suitable for dynamic and evolving environments.
c) **Big Data:** With the exponential growth of data, traditional manual analysis becomes
impractical. Machine learning enables efficient processing and extraction of valuable insights from
vast datasets.
e) **Unstructured Data:** Machine learning can handle unstructured data, such as text, audio, and
images, which is prevalent in today's digital world.
At the core of machine learning is the ability to learn from observations (data). The process involves
three main components:
a) **Data Collection:** Gathering relevant data is the first step in the machine learning process. The
quality, size, and diversity of the data significantly influence the effectiveness of the model.
b) **Feature Extraction:** Once the data is collected, relevant features or attributes need to be
extracted to represent the data in a format suitable for learning. Feature extraction is crucial for
effective pattern recognition and decision-making.
c) **Model Building:** After collecting and preprocessing the data, a machine learning model is
constructed using algorithms. The model's architecture depends on the type of learning, such as
supervised, unsupervised, or reinforcement learning.
Supervised learning is a type of machine learning where the model is trained on labeled data,
meaning each input example is associated with the correct output or label. The learning algorithm
tries to learn the mapping between inputs and outputs by minimizing the prediction errors.
a) **Training Data:** In supervised learning, the training dataset consists of input-output pairs,
where the input is the feature vector, and the output is the corresponding target label.
b) **Regression vs. Classification:** Supervised learning can be further divided into regression tasks
(predicting continuous values) and classification tasks (predicting discrete labels).
c) **Popular Algorithms:** Some popular supervised learning algorithms include Linear Regression,
Decision Trees, Random Forests, Support Vector Machines (SVM), and Neural Networks.
Unsupervised learning, in contrast to supervised learning, involves training the model on unlabeled
data. The algorithm aims to find hidden patterns or structure within the data without explicit
guidance.
a) **Clustering:** Clustering is a common task in unsupervised learning, where the algorithm groups
similar data points into clusters based on their feature similarities.
Reinforcement learning is a paradigm in which an agent learns to make decisions by interacting with
an environment. The agent receives feedback in the form of rewards or penalties based on its actions
and aims to learn a strategy that maximizes the cumulative reward over time.
a) **Markov Decision Process (MDP):** Reinforcement learning problems are often formulated as
MDPs, which describe the environment, actions, rewards, and the transition probabilities.
b) **Exploration vs. Exploitation:** One of the key challenges in reinforcement learning is balancing
exploration (trying new actions) and exploitation (leveraging known actions) to optimize long-term
rewards.
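The exploration-exploitation balance can be illustrated with the epsilon-greedy rule, one common and simple strategy. The sketch below is illustrative only: the action-value list `q` and the function name `epsilon_greedy` are assumptions made for this example, not part of the notes above.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical action-value estimates for a 3-action problem.
q = [0.2, 0.8, 0.5]
rng = random.Random(0)
choices = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]

# Fraction of times the greedy action (index 1) was chosen.
print(choices.count(1) / len(choices))
```

With a small epsilon the agent mostly exploits the best-known action but still samples the others occasionally, so a wrong early estimate can be corrected later.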
**Conclusion:**
Machine learning's rationale and basics revolve around its ability to learn from observations, making
it a powerful tool in various domains. Whether supervised, unsupervised, or reinforcement learning,
these algorithms enable computers to learn patterns, make predictions, and automate decision-
making processes, making them a cornerstone of modern AI applications. As technology advances
and data availability increases, the potential for machine learning to drive innovation and problem-
solving continues to grow.
Machine Learning (ML) is a subset of artificial intelligence that empowers computers to learn and
improve their performance on a specific task without being explicitly programmed. The fundamental
goal of ML is to develop algorithms that can generalize from the data and make predictions or
decisions based on new, unseen inputs.
**a) Data Bias:** Occurs when the training data used to build the ML model is unrepresentative of
the real-world population, leading to skewed predictions.
**b) Algorithmic Bias:** Arises from the design and choice of algorithms, which may favor certain
groups or attributes over others due to inherent assumptions.
**c) Human Bias:** Can be introduced when human annotators label the training data or when
subjective decisions affect the model's training process.
**a) Discrimination:** Biased models may discriminate against certain demographic groups, leading
to unfair treatment or opportunities.
**b) Unreliable Decisions:** Bias can reduce the accuracy and reliability of the model's predictions,
affecting the overall performance.
**c) Lack of Generalization:** A biased model may perform well on the training data but fail to
generalize to unseen data, leading to poor performance in real-world scenarios.
**d) Negative Social Impact:** Biased AI systems can perpetuate existing societal inequalities and
exacerbate systemic issues.
**a) Diverse and Representative Data Collection:** Ensuring that the training dataset is diverse and
representative of the real-world population can help reduce data bias.
**b) Bias Detection and Evaluation:** Developing metrics and methods to detect and quantify bias in
ML models is crucial to understanding its impact.
**c) Fairness-aware Algorithms:** Researchers are working on developing algorithms that explicitly
consider fairness constraints during model training.
**d) Transparent and Explainable Models:** Building interpretable models allows stakeholders to
understand the factors contributing to predictions and identify potential biases.
**e) Continuous Monitoring and Updating:** Regularly monitoring the model's performance in real-
world applications and updating it as needed can help address new biases that may emerge.
**a) Representation Power:** ML models, such as deep neural networks, have a high capacity to
learn complex patterns and representations from data.
**b) Feature Learning:** ML algorithms can automatically extract relevant features from raw data,
reducing the need for manual feature engineering.
**c) Adaptability:** ML models can adapt to changing data distributions and learn from new
examples, making them versatile in dynamic environments.
**d) Generalization:** Learning from data enables ML models to generalize well to unseen instances,
improving their applicability.
**e) Scalability:** Modern ML algorithms are scalable, enabling them to process large datasets and
handle complex tasks.
In conclusion, understanding and addressing bias are critical to building ethical and effective machine
learning systems. The success of machine learning lies in its ability to learn patterns from data,
generalize to new situations, and adapt to changes, making it a powerful tool in various domains
when used responsibly and with awareness of potential biases.
**Title: Computational Learning Theory**
**Introduction:**
Computational Learning Theory is a subfield of machine learning that focuses on studying the
theoretical foundations of learning algorithms and their computational capabilities. It aims to
understand the fundamental properties of learning algorithms, including their efficiency, sample
complexity, and generalization performance. The main goal is to derive mathematical bounds on the
performance of learning algorithms and gain insights into their capabilities and limitations. In this
overview, we'll cover the key concepts and components of Computational Learning Theory.
- **Input Space (X):** The set of all possible input instances, typically represented as feature vectors
in a high-dimensional space.
- **Output Space (Y):** The set of all possible output labels or classes associated with the input
instances.
- **Hypothesis Space (H):** The set of all possible functions that the learning algorithm can learn.
Each function in H represents a potential hypothesis or model.
- **Target Concept (c):** The true, unknown function that the learning algorithm is trying to
approximate. It maps input instances to their correct output labels.
- **Training Data (S):** A labeled dataset containing examples of input-output pairs (x, y) drawn
from the true but unknown distribution D over X x Y.
In Computational Learning Theory, learning often revolves around the concept of Empirical Risk
Minimization (ERM). ERM is a principle that suggests selecting the hypothesis that minimizes the
empirical risk or the training error. The empirical risk of a hypothesis h is the fraction of training
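examples that h misclassifies. Formally, it is defined as:
```
R_emp(h) = (1/m) * Σ_{i=1}^{m} 1[h(x_i) ≠ y_i]
```
Here m is the number of training examples (x_i, y_i), and 1[·] equals 1 when its argument is true
and 0 otherwise.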
The ERM principle assumes that the training data is representative of the underlying distribution D,
allowing the learning algorithm to approximate the target concept effectively.
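A minimal sketch of ERM over a small finite hypothesis space may make the principle concrete. The toy data and the threshold classifiers below are assumptions chosen for illustration; a real learner would typically search a much larger space.

```python
def empirical_risk(h, data):
    """Fraction of labeled examples (x, y) that hypothesis h misclassifies."""
    return sum(1 for x, y in data if h(x) != y) / len(data)

def erm(hypotheses, data):
    """Return the hypothesis with the smallest empirical risk (first one on ties)."""
    return min(hypotheses, key=lambda h: empirical_risk(h, data))

# Toy training set (assumed for illustration): the label is 1 when x >= 3.
train = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

def make_threshold(t):
    """Hypothetical threshold classifier h_t(x) = 1 if x >= t else 0."""
    return lambda x: 1 if x >= t else 0

H = [make_threshold(t) for t in range(6)]

best = erm(H, train)
print([round(empirical_risk(h, train), 3) for h in H])  # risk of each threshold
print(empirical_risk(best, train))                      # risk of the ERM choice
```

The selected hypothesis is only as good as the sample: a low training error says nothing by itself about error on the underlying distribution.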
**3. Bias and Variance Trade-off:**
The concept of Bias and Variance is crucial in understanding the generalization performance of
learning algorithms. Bias refers to the error introduced by approximating a complex target concept
with a simplified hypothesis space. Variance, on the other hand, refers to the sensitivity of the
learning algorithm to small changes in the training data.
- High Bias: If the hypothesis space is too simple (low model complexity), the algorithm may have
high bias, leading to underfitting and poor performance on both training and test data.
- High Variance: If the hypothesis space is too complex (high model complexity), the algorithm may
have high variance, leading to overfitting on the training data but poor performance on unseen test
data.
Finding the right balance between bias and variance is essential for achieving good generalization
performance.
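**4. PAC Learning:**
The Probably Approximately Correct (PAC) framework asks for a hypothesis whose error is at most ε
with probability at least 1 - δ.
- Epsilon (ε): The accuracy parameter, i.e., the maximum allowed error of the learned hypothesis
with respect to the target concept.
- Delta (δ): The confidence parameter, i.e., the maximum allowed probability that the learned
hypothesis fails to be approximately correct. The guarantee holds with probability at least 1 - δ.
A hypothesis space H is PAC-learnable if this guarantee can be achieved for every ε and δ with a
number of training examples that is polynomial in 1/ε and 1/δ.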
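As a worked illustration, consider the standard bound for a finite hypothesis space and a learner that always outputs a hypothesis consistent with the training data (the realizable case, which is an assumption here): m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice. The function name below is hypothetical.

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis space of size h_size to reach error <= epsilon with
    probability >= 1 - delta (realizable case)."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

print(pac_sample_bound(h_size=1000, epsilon=0.1, delta=0.05))   # 100
print(pac_sample_bound(h_size=1000, epsilon=0.05, delta=0.05))  # halving epsilon roughly doubles m
```

The bound grows only logarithmically in |H| and 1/δ but linearly in 1/ε, which is polynomial in both 1/ε and 1/δ.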
**5. Sample Complexity and VC Dimension:**
Sample complexity refers to the number of training examples required for a learning algorithm to
achieve a certain level of accuracy. The Vapnik-Chervonenkis (VC) dimension is a measure of the
capacity or complexity of a hypothesis space: it is the size of the largest set of points that the
space can shatter, i.e., label in every possible way. A hypothesis space with a high VC dimension
is capable of fitting more complex functions, while a low VC dimension indicates a more restricted
space.
The VC dimension provides a theoretical basis for understanding the trade-off between the capacity
of a hypothesis space and the number of training examples needed to achieve good generalization.
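The VC dimension is defined through shattering: a set of points is shattered when every possible 0/1 labeling of it is produced by some hypothesis. The brute-force check below is a sketch for a small, assumed hypothesis space (closed intervals on an integer grid); the names `shatters` and `intervals` are illustrative.

```python
from itertools import product

def shatters(hypotheses, points):
    """True if every 0/1 labeling of `points` is produced by some hypothesis."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return all(labels in realized
               for labels in product((0, 1), repeat=len(points)))

# Hypothetical hypothesis space: h(x) = 1 if a <= x <= b else 0.
# Pairs with a > b give the empty interval, which labels everything 0.
grid = range(0, 6)
intervals = [lambda x, a=a, b=b: 1 if a <= x <= b else 0
             for a in grid for b in grid]

print(shatters(intervals, [1, 3]))     # True: two points can be shattered
print(shatters(intervals, [1, 2, 3]))  # False: the labeling (1, 0, 1) is impossible
```

Because two points can be shattered but no three points can, intervals on the line have VC dimension 2.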
**Conclusion:**
Computational Learning Theory is a crucial branch of machine learning that provides a rigorous
mathematical foundation for understanding the capabilities and limitations of learning algorithms. By
studying the sample complexity, generalization bounds, and the trade-off between bias and variance,
researchers can gain insights into the behavior of learning algorithms and develop more robust and
efficient models for real-world applications.
**Machine Learning Topic: Occam's Razor Principle and Overfitting Avoidance Heuristic Search in
Inductive Learning**
**Introduction:**
Occam's Razor, also known as the principle of parsimony, is a fundamental concept in machine
learning and scientific reasoning. Named after the 14th-century philosopher William of Ockham, the
principle suggests that among competing hypotheses, the simplest one should be preferred until
evidence indicates otherwise. In the context of machine learning, Occam's Razor advocates selecting
the simplest model that adequately explains the data.
**Explanation:**
When faced with multiple models that fit the data equally well, Occam's Razor advises choosing the
model with the fewest assumptions or parameters. The rationale behind this principle lies in the idea
that complex models might fit the training data well but could struggle to generalize to unseen data.
In contrast, simpler models are less likely to overfit and are more generalizable.
Occam's Razor is often employed in model selection and feature engineering. In model selection, it
guides the choice of algorithms and architectures with an emphasis on simplicity and interpretability.
For example, linear regression is preferred over a complex ensemble model if both yield comparable
results. In feature engineering, Occam's Razor encourages using only the most relevant features,
avoiding unnecessary complexities in the dataset.
**Benefits:**
1. Improved Generalization: Simple models are less prone to overfitting, leading to better
performance on unseen data.
2. Enhanced Interpretability: Simpler models are easier to understand and interpret, making them
more useful for decision-making.
3. Lower Computational Costs: Simple models typically require fewer resources, making them faster
to train and deploy.
**Introduction:**
Overfitting is a common problem in machine learning, where a model learns to memorize the
training data rather than capturing the underlying patterns. It occurs when a model becomes
excessively complex, fitting not only the signal but also the noise in the data. Overfitting leads to
poor generalization, meaning the model performs poorly on new, unseen data.
**Explanation:**
To avoid overfitting, various heuristic search techniques are employed during inductive learning.
These techniques aim to strike a balance between model complexity and performance on the
training data. The goal is to find a model that can generalize well to new data.
**1. Cross-Validation:**
Cross-validation involves dividing the training data into multiple subsets (folds). The model is trained
on different combinations of these subsets and validated on the remaining fold. This process is
repeated several times, and the average performance is used to evaluate the model. Cross-validation
helps in estimating how well the model will generalize to unseen data.
**2. Regularization:**
Regularization is a technique that introduces a penalty term to the model's objective function. This
penalty discourages the model from learning overly complex patterns. L1 and L2 regularization are
commonly used, and they add a penalty based on the absolute and squared values of the model
parameters, respectively.
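The two penalties can be written out directly. This is a minimal sketch: the strength parameter `lam` and the example weights are assumptions, and real libraries often scale the penalty slightly differently.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute parameter values."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared parameter values."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Objective = data-fit loss + complexity penalty."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

w = [0.5, -2.0, 0.0]  # hypothetical model parameters
print(l1_penalty(w, lam=0.1))
print(l2_penalty(w, lam=0.1))
print(regularized_loss(1.0, w, lam=0.1, kind="l1"))
```

Note how the L2 penalty is dominated by the single large weight, while the L1 penalty grows only linearly with it.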
**3. Early Stopping:**
Early stopping involves monitoring the model's performance on a validation set during training. If the
performance stops improving or starts degrading, training is halted to prevent overfitting. This
technique ensures that the model is not trained for too many epochs, which could lead to overfitting.
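The stopping rule can be sketched on a recorded list of validation losses. The `patience` parameter (how many non-improving epochs to tolerate) and the loss values are assumptions for illustration; in practice the check runs inside the training loop.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose model would be kept.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical per-epoch validation losses: improve, then degrade.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
print(early_stopping_epoch(losses, patience=2))  # 3
```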
**4. Feature Selection:**
Feature selection involves choosing the most relevant features and discarding irrelevant or
redundant ones. Reducing the number of features can help avoid overfitting, especially when dealing
with high-dimensional datasets.
**Benefits:**
1. Improved Generalization: By avoiding overfitting, the model performs better on new, unseen data.
2. Robustness: Models trained using overfitting avoidance techniques are more robust and reliable.
3. Resource Efficiency: Avoiding overfitting leads to models that require fewer resources, making
them more efficient for deployment.
In conclusion, Occam's Razor Principle and Overfitting Avoidance Heuristic Search are essential
concepts in machine learning. Occam's Razor encourages simplicity and generalizability in model
selection, while overfitting avoidance techniques ensure that models are robust and capable of
performing well on new data. Understanding and applying these principles are crucial for developing
effective and reliable machine learning models.
In machine learning, the ultimate goal is to create models that can make accurate predictions on
new, unseen data. Generalization refers to the ability of a machine learning model to perform well on
such unseen data, i.e., data it has not been trained on. Estimating generalization errors is a critical
aspect of model evaluation as it helps us understand how well a model is likely to perform in real-
world scenarios.
**1. Training, Validation, and Test Sets:**
When building a machine learning model, it's essential to divide the available data into three sets:
training set, validation set, and test set.
- **Training Set:** This is the largest portion of the data and is used to train the model. The model
learns from the patterns and relationships in this data.
- **Validation Set:** After training the model, it is essential to assess its performance on data it has
not seen before. The validation set is used during the training phase to fine-tune hyperparameters
and make decisions about the model architecture. It helps prevent overfitting, where the model
becomes too specialized on the training data and fails to generalize to new data.
- **Test Set:** Once the model has been fully trained and tuned using the validation set, the final
evaluation is performed on the test set. This set should not be used during model development, as it
is solely used to estimate the model's generalization error.
**2. Cross-Validation:**
- **k-Fold Cross-Validation:** In k-fold cross-validation, the data is divided into k subsets (folds). The
model is trained and validated k times, each time using a different fold as the validation set and the
remaining k-1 folds as the training set.
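The fold construction can be sketched in a few lines. This version splits indices in order without shuffling, which is a simplification; `evaluate` is a hypothetical user-supplied callback that trains on the training indices and returns a score on the validation indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

def cross_validate(evaluate, n_samples, k):
    """Average evaluate(train_idx, val_idx) over the k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

for train_idx, val_idx in k_fold_indices(10, k=3):
    print(val_idx)
```

Every sample appears in exactly one validation fold, so each data point is used for validation once and for training k-1 times.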
**3. Bias-Variance Tradeoff:**
When estimating generalization error, it's essential to understand the bias-variance tradeoff. A model
with high bias (underfitting) tends to oversimplify the data, leading to poor performance on both
training and unseen data. On the other hand, a model with high variance (overfitting) memorizes the
training data but fails to generalize to new data.
- **Bias:** Bias is the error introduced by approximating a real problem with a simplified model.
High bias can lead to the model being too rigid and unable to capture complex patterns in the data.
- **Variance:** Variance is the sensitivity of the model to fluctuations in the training data. High
variance can result in the model being too flexible and fitting noise in the training data rather than
learning the underlying patterns.
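**4. Regularization:**
Regularization adds a penalty term to the model's objective function to discourage overly complex
models, which reduces variance at the cost of some additional bias.
**5. Learning Curves:**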
Learning curves are plots that show the performance of a model on both the training and validation
sets as a function of the training set size. They provide valuable insights into the model's ability to
generalize based on the amount of training data available.
- **Underfitting:** In the early stages of learning, both training and validation errors are high,
indicating that the model is underfitting and requires more data or complexity.
- **Optimal Fit:** As the model learns, the validation error decreases, and the training error
stabilizes. This is the point where the model achieves the best tradeoff between bias and variance
and is considered the optimal fit.
- **Overfitting:** If the model continues to train, the validation error may start to increase, while the
training error continues to decrease. This is a sign of overfitting, where the model becomes too
specialized in the training data.
**Conclusion:**
Estimating generalization errors is crucial in machine learning to build models that can perform well
on unseen data. Techniques like cross-validation, regularization, and learning curves help in achieving
a balance between bias and variance, leading to models that generalize effectively. By using proper
evaluation methodologies and optimizing hyperparameters, we can develop robust machine learning
models that perform well in real-world scenarios.
In regression tasks, the primary goal is to predict a continuous numerical value, such as price,
temperature, or sales. To evaluate the performance of a regression model, various metrics are used
to assess how well the model's predictions align with the actual target values. Below are some
commonly used metrics for evaluating regression models:
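### 1. Mean Squared Error (MSE):
MSE is one of the most widely used metrics for regression tasks. It measures the average squared
difference between the predicted values and the true target values. The formula for MSE is as
follows:
```
MSE = (1/n) * Σ_{i=1}^{n} (y_i - ŷ_i)^2
```
Where:
- n is the number of samples.
- y_i is the true target value of the i-th sample.
- ŷ_i is the predicted value for the i-th sample.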
MSE is sensitive to outliers since it squares the differences between predictions and true values. A
higher MSE indicates worse model performance, with 0 being the best possible score.
### 2. Root Mean Squared Error (RMSE):
RMSE is the square root of MSE and provides the error in the same units as the target variable. It is
useful for understanding the average magnitude of the error. The formula for RMSE is:
```
RMSE = √(MSE)
```
RMSE penalizes large errors more than small ones, making it particularly valuable when significant
errors are costly.
### 3. Mean Absolute Error (MAE):
MAE measures the average absolute difference between predicted values and true values, ignoring
the direction of the errors. It is less sensitive to outliers than MSE. The formula for MAE is as follows:
```
MAE = (1/n) * Σ_{i=1}^{n} |y_i - ŷ_i|
```
MAE provides a more interpretable metric since it is in the same units as the target variable. Like
MSE, a lower MAE indicates better model performance.
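The three error metrics above can be computed side by side. The sketch below uses plain Python and made-up values in which one prediction is a large miss, to show how squaring amplifies it.

```python
import math

def mse(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean of absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]  # made-up predictions; the last one misses by 3

print(mse(y_true, y_pred))   # 3.5
print(rmse(y_true, y_pred))  # about 1.87
print(mae(y_true, y_pred))   # 1.5
```

The single error of 3 contributes 9 of the 14 total squared error, which is why RMSE ends up above MAE here.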
### 4. R-squared (R^2) Score:
R-squared, also known as the coefficient of determination, represents the proportion of the variance
in the target variable that is predictable from the independent features used in the model. The value
of R-squared is at most 1, with 1 indicating that the model explains all the variability in the
target variable and 0 indicating that it does no better than always predicting the mean. It can be
negative when the model fits worse than that baseline. The formula for R-squared is:
```
R^2 = 1 - (SS_res / SS_tot)
```
Where:
- SS_res is the sum of squares of the residuals (the differences between true and predicted values).
- SS_tot is the total sum of squares (the differences between true values and the mean of the target
variable).
A higher R-squared value suggests a better fit of the model to the data. However, R-squared may not
be an ideal metric for complex models or when the dataset has a high level of noise.
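A short sketch of the computation from SS_res and SS_tot, using made-up values. The three cases show a perfect fit, a model that always predicts the mean, and a model that is worse than the mean, for which the formula yields a negative value.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
print(r2_score(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0, perfect predictions
print(r2_score(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0, always predicting the mean
print(r2_score(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0, worse than the mean
```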
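### 5. Mean Squared Logarithmic Error (MSLE):
MSLE measures the average squared difference between the natural logarithms of the predicted
values and the true target values, with 1 added to each value before taking the logarithm so that
zeros can be handled. It is particularly useful when the target values have a wide range.
The formula for MSLE is as follows:
```
MSLE = (1/n) * Σ_{i=1}^{n} (log(1 + y_i) - log(1 + ŷ_i))^2
```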
MSLE can prevent extremely large errors from dominating the metric and is commonly used in tasks
where the target values span several orders of magnitude.
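The effect can be seen by comparing MSLE with MSE on targets of very different sizes. This sketch assumes the common log(1 + value) form of the metric and non-negative inputs; the numbers are made up.

```python
import math

def msle(y_true, y_pred):
    """Mean squared logarithmic error using log(1 + value); inputs must be >= 0."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Plain mean squared error, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Both predictions are off by a factor of two, at very different scales.
y_true = [10.0, 1000.0]
y_pred = [20.0, 2000.0]

print(mse(y_true, y_pred))   # dominated by the large-scale example
print(msle(y_true, y_pred))  # both examples contribute a similar amount
```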
Max Error calculates the maximum absolute difference between the true target values and the predicted
values. It represents the worst-case error of the model. The formula for Max Error is:
```
Max Error = max_{i} |y_i - ŷ_i|
```
Max Error is useful for identifying potential outliers or cases where the model performs poorly.
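Mean Percentage Error (MPE) measures the average percentage difference between the true target
values and the predicted values, providing a relative error metric. The formula for MPE is as follows:
```
MPE = (100% / n) * Σ_{i=1}^{n} (y_i - ŷ_i) / y_i
```
MPE can be helpful when you want to understand the average relative error of the model's
predictions. Because the errors keep their sign, positive and negative errors can cancel, so MPE
mainly indicates whether the model systematically over- or under-predicts. It is undefined when any
true value is 0.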
The coefficient of determination (COD), another name for R-squared, is commonly used in multiple
regression analysis to evaluate the goodness of fit of the model.
Keep in mind that the choice of the appropriate metric depends on the specific regression problem
and the characteristics of the dataset. For instance, MSE and RMSE are suitable for scenarios where
large errors should be penalized, while MAE is more robust to outliers. R-squared provides a measure
of the overall goodness of fit, but it may not be sufficient on its own, and other metrics can be used
to gain a more comprehensive understanding of model performance. Always consider the context
and requirements of the problem at hand when selecting evaluation metrics for regression models.
In machine learning, classification is a common task where the goal is to assign input data to one of
several predefined categories or classes. Evaluating the performance of a classification model is
crucial to understanding how well it can generalize to new, unseen data. Various evaluation metrics
are used to assess the classification model's effectiveness in making accurate predictions. In this
context, we will explore some of the most common classification metrics.
### 1. Confusion Matrix:
The confusion matrix is a tabular representation that summarizes the model's performance on a
classification problem. It provides a comprehensive view of the true positive (TP), true negative (TN),
false positive (FP), and false negative (FN) predictions.
```
                  Predicted Positive   Predicted Negative
Actual Positive   TP                   FN
Actual Negative   FP                   TN
```
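- True Positive (TP): The number of positive instances correctly classified as positive.
- True Negative (TN): The number of negative instances correctly classified as negative.
- False Positive (FP): The number of negative instances incorrectly classified as positive.
- False Negative (FN): The number of positive instances incorrectly classified as negative.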
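Counting the four cells from label lists is straightforward. The sketch below assumes binary labels with 1 as the positive class; the example labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for binary labels."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1   # positive correctly predicted as positive
        elif t == positive:
            fn += 1   # positive missed
        elif p == positive:
            fp += 1   # negative wrongly flagged as positive
        else:
            tn += 1   # negative correctly predicted as negative
    return tp, fn, fp, tn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```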
### 2. Accuracy:
Accuracy is one of the most straightforward metrics and is often used to measure the overall
performance of a classification model. It calculates the proportion of correctly classified instances
over the total number of instances in the dataset.
```
Accuracy = (TP + TN) / (TP + TN + FP + FN)
```
While accuracy is essential, it may not be the best metric to use, especially when dealing with
imbalanced datasets, where one class heavily outweighs the others. In such cases, accuracy can be
misleading.
### 3. Precision:
Precision measures the proportion of true positive predictions out of all positive predictions made by
the model. It helps assess the model's ability to avoid false positives.
```
Precision = TP / (TP + FP)
```
A high precision value indicates that when the model predicts a positive instance, it is likely to be
correct.
### 4. Recall (Sensitivity or True Positive Rate):
Recall calculates the proportion of true positive predictions out of all actual positive instances in the
dataset. It measures the model's ability to find all the positive instances.
```
Recall = TP / (TP + FN)
```
A high recall value indicates that the model can effectively identify positive instances.
### 5. F1 Score:
The F1 score is the harmonic mean of precision and recall, providing a balance between the two
metrics. It is especially useful when there is an uneven class distribution.
```
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
```
A perfect F1 score is 1, while the worst score is 0.
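Accuracy, precision, recall, and F1 can all be derived from the four confusion-matrix counts. In the sketch below, returning 0.0 when a denominator is zero is an assumed convention, and the counts describe a made-up imbalanced dataset with 10 positives among 1000 instances.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up imbalanced case: only half of the 10 positives are found.
m = classification_metrics(tp=5, fn=5, fp=5, tn=985)
print(m)
```

Accuracy comes out at 0.99 while precision, recall, and F1 are all 0.5, which illustrates why accuracy alone can mislead on imbalanced data.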
### 6. Specificity (True Negative Rate):
Specificity calculates the proportion of true negative predictions out of all actual negative instances
in the dataset. It measures the model's ability to correctly identify negative instances, i.e., to
avoid false positives.
```
Specificity = TN / (TN + FP)
```
### 7. ROC Curve and AUC-ROC:
The ROC (Receiver Operating Characteristic) curve plots the true positive rate (recall) against the
false positive rate (1 - specificity) across different classification thresholds. The AUC-ROC metric
measures the area under this curve, summarizing the model's ability to discriminate between positive
and negative instances.
AUC-ROC values range from 0 to 1, with 0.5 indicating random guessing, and 1 representing a perfect
classifier.
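One way to compute AUC-ROC without drawing the curve uses an equivalent interpretation: it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below compares all positive-negative pairs, which is fine for small examples but quadratic in the data size; the scores are made up.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up classifier scores
print(roc_auc(y_true, scores))  # 0.75: three of the four pairs are ordered correctly
```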
### 8. AUC-PR (Area Under the Precision-Recall Curve):
Similar to the AUC-ROC, the AUC-PR metric measures the area under the precision-recall curve. It is
especially useful when dealing with imbalanced datasets, as it focuses on the trade-off between
precision and recall.
### 9. Cohen's Kappa:
Cohen's Kappa is a statistical measure that assesses the agreement between the model's predictions
and the actual labels, taking into account the agreement that could be expected by chance.
It is particularly useful when dealing with imbalanced datasets and can be considered a more robust
alternative to accuracy.
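For the binary case, Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The sketch below works from the four counts; the example counts are made up.

```python
def cohens_kappa(tp, fn, fp, tn):
    """Cohen's Kappa for a binary confusion matrix."""
    n = tp + fn + fp + tn
    p_observed = (tp + tn) / n
    # Chance agreement: for each class, (actual share) * (predicted share).
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# A moderately good classifier on balanced data.
print(cohens_kappa(tp=20, fn=5, fp=10, tn=15))

# Always predicting "negative" on imbalanced data: accuracy is 0.99,
# but agreement is no better than chance.
print(cohens_kappa(tp=0, fn=10, fp=0, tn=990))
```

In the second case accuracy looks excellent, yet Kappa is 0 because the same agreement would be expected by chance.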
### Conclusion:
Assessing the performance of classification models using appropriate metrics is essential for
understanding their strengths and weaknesses. Depending on the specific requirements of the
problem, different metrics may be more relevant. It is crucial to choose the right evaluation metrics
based on the problem at hand and the characteristics of the dataset to make informed decisions
about the model's effectiveness.
Unit III: Statistical Learning: Machine Learning and Inferential Statistical Analysis, Descriptive
Statistics in learning techniques, Bayesian Reasoning A probabilistic approach to inference, K-Nearest
Neighbor Classifier. Discriminant functions and regression functions, Linear Regression with Least
Square Error Criterion, Logistic Regression for Classification Tasks, Fisher's Linear Discriminant and
Thresholding for Classification, Minimum Description Length Principle.