Machine Learning - Lec1
Machine Learning
Arthur Samuel, an early American leader in the field of computer gaming and
artificial intelligence, coined the term “Machine Learning” in 1959 while at
IBM. He defined machine learning as
“the field of study that gives computers the ability to learn without being
explicitly programmed.”
● Machine learning is programming computers to optimize a performance
criterion using example data or past experience.
● A model is defined up to some parameters, and learning is the execution
of a computer program to optimize the parameters of the model using
the training data or past experience.
● The model may be predictive to make predictions in the future, or
descriptive to gain knowledge from data.
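To make "optimizing the parameters of the model using the training data" concrete, here is a minimal Python sketch. It assumes a linear model y = w·x + b with parameters w and b, mean squared error as the performance criterion, and a small hypothetical training set. The names `mse` and `fit`, the learning rate and the number of steps are illustrative choices, not part of the lecture.

```python
# Hypothetical training data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w, b, xs, ys):
    """Performance criterion: mean squared error of the model y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.05, steps=2000):
    """Learning: adjust the parameters (w, b) by gradient descent to minimize the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move both parameters a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, training MSE={mse(w, b, xs, ys):.6f}")
```

On this noise-free data the learned parameters should end up very close to w = 2 and b = 1. The "model" is the formula y = w·x + b; "learning" is only the loop that changes w and b.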
Definition by Tom Mitchell (1998): Machine Learning is the
study of algorithms that
• improve their performance P
• at some task T
• with experience E.
Figure: Types of learning. Supervised learning: regression, classification (e.g. linear regression, logistic regression, decision tree). Unsupervised learning: clustering (divide by similarity), association (identify sequences), dimensionality reduction (compress data). Semi-supervised learning. Reinforcement learning.
Types of learning
Supervised Learning: In supervised learning, the algorithm is trained on a
labeled dataset, where both the input data and the correct output are
provided. The goal is to learn a mapping from inputs to outputs, enabling
the algorithm to make predictions on new, unseen data.
● Training Phase: During the training phase, the algorithm learns the relationship
between the input data and the output labels. It adjusts its internal parameters or
model to minimize the error or difference between its predictions and the actual
labels in the training data.
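● Testing and Evaluation: Once the model is trained, it is evaluated on a
separate held-out dataset that was not used for training, known as the test set
(a held-out set used during development, for example to tune the model, is
called a validation set). The model's performance is assessed by comparing
its predictions to the true labels in the test set.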
● Prediction: After successful training and evaluation, the model can be
used to make predictions on new, unseen data. These predictions are
based on the learned patterns from the training data.
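The training, testing and prediction phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a hypothetical one-feature dataset with labels 0 and 1, a simple "threshold midway between the class means" rule as the model (a stand-in chosen for brevity, not one of the algorithms named in this lecture), and every other example held out as the test set.

```python
# Hypothetical labeled data: (feature value, label), labels are 0 or 1.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1), (8.0, 1)]

# Split: train on every other example, hold out the rest as the test set.
train = data[0::2]
test = data[1::2]


def fit_threshold(examples):
    """Training phase: place the decision threshold midway between the class means.

    Assumes class 1 has the larger feature values (true for this toy data).
    """
    class0 = [x for x, y in examples if y == 0]
    class1 = [x for x, y in examples if y == 1]
    return (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2


def predict(threshold, x):
    """Prediction: label 1 if x lies above the learned threshold, else 0."""
    return 1 if x > threshold else 0


def accuracy(threshold, examples):
    """Testing phase: fraction of held-out examples whose label is predicted correctly."""
    correct = sum(predict(threshold, x) == y for x, y in examples)
    return correct / len(examples)


threshold = fit_threshold(train)
print("threshold:", threshold)                      # → threshold: 4.5
print("test accuracy:", accuracy(threshold, test))  # → test accuracy: 1.0
print("prediction for unseen x=4.9:", predict(threshold, 4.9))  # → 1
```

The key design point is that `accuracy` is only ever computed on `test`, which `fit_threshold` never sees, so it estimates performance on unseen data rather than on memorized examples.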
Types of SML
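● Regression: In regression tasks, the goal is to predict a continuous numerical value.
Examples include predicting house prices based on features like square footage and
location, forecasting stock prices, and estimating a person's age based on various
attributes.
● Classification: In classification tasks, the goal is to predict a discrete label or category.
Examples include deciding whether an email is spam or recognizing a handwritten digit.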
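To contrast the two task types, here is a hedged Python sketch built around the house-price example. The data, the `budget` parameter and the function names are hypothetical. The regression part fits a straight line by ordinary least squares and returns a number; the classification part returns one of two labels. Deriving the label by thresholding the regression output is a simplification for illustration; a classifier such as logistic regression would normally learn its decision rule directly.

```python
# Hypothetical data: house size in square meters -> price (assumed linear relationship).
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [160000.0, 250000.0, 310000.0, 370000.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Regression: the output is a continuous number (a price)."""
    return slope * size + intercept


def predict_category(size, budget=300000.0):
    """Classification: the output is one of a fixed set of labels."""
    return "over budget" if predict_price(size) > budget else "within budget"


print(predict_price(90.0))     # → 280000.0
print(predict_category(90.0))  # → within budget
print(predict_category(110.0)) # → over budget
```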
Supervised Learning Applications
● Recognizing patterns: – Facial identities or facial expressions – Handwritten or
spoken words – Medical images
● Generating patterns: – Generating images or motion sequences
● Recognizing anomalies: – Unusual credit card transactions –
Unusual patterns of sensor readings in a nuclear power plant
● Prediction: – Future stock prices or currency exchange rates
Algorithms used in SML
● Linear Regression: For regression tasks where the relationship between input
features and the target is assumed to be linear.
● Logistic Regression: For binary or multi-class classification tasks.
● Decision Trees: Tree-based models for both classification and regression tasks.
● Random Forest: An ensemble method that combines multiple decision trees for
improved performance.
● Support Vector Machines (SVM): Used for classification tasks; an SVM finds the
hyperplane that separates the classes with the largest margin.
● Neural Networks: Deep learning models with multiple layers of interconnected nodes,
suitable for various tasks, including image recognition and natural language processing.
● K-Nearest Neighbors (K-NN): A simple classification and regression algorithm based on the
similarity of data points.
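As an illustration of the last item, K-NN classification can be sketched in plain Python (3.8 or later, for `math.dist`). The function name `knn_predict`, the use of Euclidean distance as the similarity measure and the toy 2-D points are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k training points closest to it.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training points from two classes.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.0), "B")]

print(knn_predict(train, (1.8, 1.5)))  # → A
print(knn_predict(train, (6.2, 6.4)))  # → B
```

There is no separate training step: the model is simply the stored labeled data. For regression, the same idea would average the neighbors' target values instead of taking a majority vote.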
The choice of algorithm depends on the specific problem, the characteristics of the data, and
the desired performance metrics. In supervised machine learning, the quality and quantity of
labeled data are critical, as they directly impact the model's ability to make accurate
predictions.
Unsupervised Machine Learning
● Unsupervised machine learning is a type of machine learning where the
algorithm is trained on unlabeled data, and its goal is to discover patterns,
structures, or relationships within the data without specific guidance or
labeled output.
● Unlike supervised learning, where the algorithm is given labeled examples
to learn from, unsupervised learning is used when you want the algorithm
to explore and find inherent structures or insights within the data itself.
Clustering:
In a clustering problem, you want to discover the inherent groupings in the data, such as grouping customers by purchasing
behavior.
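One common clustering algorithm is k-means (Lloyd's algorithm); the lecture does not name a specific method, so treat this as an illustration. The sketch below groups hypothetical customers described by (visits per month, average spend). The naive initialization from the first k points, the fixed iteration count and the data are all assumptions.

```python
import math


def kmeans(points, k, iterations=10):
    """Minimal k-means: alternately assign points to centroids and recompute centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1.0, 20.0), (2.0, 25.0), (1.5, 22.0),
             (8.0, 200.0), (9.0, 210.0), (8.5, 190.0)]

centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are given to the algorithm; the two groups (occasional low spenders and frequent high spenders) emerge from the distances alone.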
Association:
In an association rule learning problem, you want to discover rules that describe large portions of your data, such as "people
who buy X also tend to buy Y".
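A rule like "X → Y" is commonly scored with two measures used in association rule mining: support (how often X and Y appear together) and confidence (how often Y appears when X does). The lecture does not define these, so this Python sketch is illustrative; the shopping baskets and function names are hypothetical.

```python
# Hypothetical shopping baskets: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"bread", "butter"}))       # → 0.6  (3 of 5 baskets)
print(confidence({"bread"}, {"butter"}))  # → 0.75 (3 of the 4 bread baskets)
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen minimum values; this sketch only scores a single given rule.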
Dimensionality Reduction:
● In machine learning classification problems, the final classification often depends on
a large number of factors.
● These factors are variables called features. The higher the number of
features, the harder it gets to visualize the training set and then work on it.
● Sometimes, most of these features are correlated, and hence redundant. This
is where dimensionality reduction algorithms come into play.
● Dimensionality reduction is the process of reducing the number of random
variables under consideration, by obtaining a set of principal variables.
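Principal component analysis (PCA) is one widely used way to obtain such principal variables; the lecture does not name a specific algorithm, so treat this as an illustration. To stay dependency-free, the Python sketch below assumes exactly two features, where the eigenvector of the 2×2 covariance matrix has a closed form. The toy data, in which the second feature is exactly twice the first (fully correlated and therefore redundant), is hypothetical.

```python
import math


def pca_2d(points):
    """Project 2-D points onto their first principal component.

    Returns (component, projections, explained_ratio): a unit direction vector,
    the 1-D coordinates of the centered points along it, and the share of the
    total variance that this single direction captures.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Covariance matrix [[a, b], [b, c]] (dividing by n).
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam2 = a + c - lam1
    # Eigenvector for lam1: (lam1 - c, b), unless the features are uncorrelated.
    if b != 0:
        vx, vy = lam1 - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)
    projections = [x * component[0] + y * component[1] for x, y in centered]
    explained_ratio = lam1 / (lam1 + lam2) if lam1 + lam2 > 0 else 1.0
    return component, projections, explained_ratio


# Hypothetical data: the second feature is exactly twice the first.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
component, projections, ratio = pca_2d(points)
print(component)    # direction close to (0.447, 0.894), i.e. (1, 2) normalized
print(projections)  # one number per point instead of two
print(ratio)        # → 1.0
```

Because the two features are perfectly correlated, a single principal variable keeps all of the variance, which is the redundancy the bullets above describe. With real data the ratio would be below 1, and it indicates how much information the reduced representation retains.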
Key Concepts of USML
● Unlabeled Data: In unsupervised learning, the training data consists of raw input
data without associated output labels. This data could be in the form of text,
images, numerical features, or any other data type.