
Deep Learning

SESSION-1
BY
ASHA
Artificial Intelligence
• Artificial Intelligence refers to the development of computer
systems that can perform tasks that typically require human
intelligence.
Machine Learning
• Machine Learning involves algorithms and statistical models that enable computers to improve their performance on a specific task without explicit programming.
• It focuses on pattern recognition and learning from data.
• Machine Learning is a subset of Artificial Intelligence.


Deep Learning
• Deep Learning is a subset of Machine Learning that involves neural networks with multiple layers (deep neural networks).
• These networks can automatically learn to extract features from data and make complex decisions based on large amounts of data.
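As a minimal sketch of what "multiple layers" means in practice (assuming the Keras API that ships with TensorFlow; the layer sizes and input width are illustrative only):

```python
# Minimal sketch of a deep neural network: a stack of layers.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),                      # 20 input features (illustrative)
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()  # prints the stacked layers that make the network "deep"
```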
Computer Vision
Computer Vision (CV) is a field of artificial intelligence that enables computers to interpret and understand the visual world. Using digital images from cameras and videos together with deep learning models, machines can accurately identify and classify objects, and then react to what they "see."
Natural Language Processing
• Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on the interaction between computers and humans through natural language.
• The ultimate goal of NLP is to enable computers to understand, interpret, and generate human language in a valuable way.
Large Language Model
• Large language models are advanced AI models trained on vast
amounts of text data, enabling them to understand and generate
human-like language.
• Virtual assistants like Siri or Alexa utilize large language
models to understand and respond to natural language queries.
Generative AI
• Generative AI refers to Artificial Intelligence systems that are
capable of creating new content such as text, images, or music.
• These systems learn from existing data patterns and generate
fresh, original content.
Machine learning

In 1959, Arthur Samuel, a computer scientist who pioneered the study of artificial
intelligence, described machine learning as “The study that gives computers the ability to
learn.”
Machine learning is a subset of AI, which enables the machine to automatically learn from
data, improve performance from past experiences, and make predictions.
Machine learning is a subset of artificial intelligence that aims to
mimic how human beings learn by using data.
A more technical definition was given by Tom M. Mitchell (1997): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
For example, in spam filtering, T is classifying emails as spam or not spam, E is a collection of emails already labelled by users, and P is the fraction of emails classified correctly.
How does Machine Learning work?
DATASET
• A dataset is a collection of data arranged in some order. A dataset can contain anything from a simple array to a database table.
• A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable and each row corresponds to a record of the dataset. The most common file format for a tabular dataset is the Comma-Separated Values (CSV) file.
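For illustration, here is a tiny hypothetical CSV file and how it maps to rows and columns when loaded with Pandas (the file name and column names are made up; the file is assumed to exist with the contents shown in the comment):

```python
# Hypothetical file "students.csv":
#   name,age,score
#   Asha,21,88
#   Ravi,22,75
import pandas as pd

df = pd.read_csv("students.csv")   # each row is a record, each column a variable
print(df.shape)                    # (number of rows, number of columns)
print(df.head())                   # preview the first records
```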
Need of Dataset
To work on machine learning projects, we need a large amount of data. Collecting and preparing the dataset is one of the most crucial steps in creating an ML/AI project.
Popular sources for Machine Learning datasets
• Kaggle Datasets
• UCI Machine Learning Repository
• Datasets via AWS
• Google's Dataset Search Engine
• Microsoft Datasets
DATA PREPROCESSING
Data preprocessing is the process of cleaning raw data. Data collected in the real world is usually messy; common problems include:

1. Missing data

2. Noisy data

3. Inconsistent data
Why is Data Preprocessing important?
The majority of real-world datasets for machine learning are highly susceptible to missing, inconsistent, and noisy data.

• Data preprocessing is therefore important to improve the overall data quality.

• Duplicate or missing values may give an incorrect view of the overall statistics of the data.

• Outliers and inconsistent data points often tend to disturb the model's overall learning, leading to false predictions.
Data Reduction
• The size of the dataset in a data warehouse can be too large to be handled by data analysis and data mining algorithms.

• One possible solution is to obtain a reduced representation of the dataset that is much smaller in volume but produces the same quality of analytical results, as sketched below.
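One common way to obtain such a reduced representation (a sketch of one approach, not the only one) is dimensionality reduction with PCA from scikit-learn; the data below is a random placeholder:

```python
# Sketch: reduce a wide feature matrix to fewer dimensions with PCA.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 50)                 # placeholder data: 1000 rows, 50 features
pca = PCA(n_components=10)                   # keep 10 components instead of 50
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (1000, 10): much smaller in volume
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```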
• Handling Missing Values: Techniques include removing instances with missing values, imputing missing values with the mean, median, or mode, or using advanced techniques like KNN imputation.

• Removing Duplicates: Identifying and removing duplicate instances to ensure the dataset is clean.

• Feature Scaling (see the sketch below):

  • Normalization: Rescaling the features to a range of [0, 1].

  • Standardization: Rescaling the features to have a mean of 0 and a standard deviation of 1.
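A short sketch of these cleaning steps with Pandas and scikit-learn (the DataFrame and column names are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw data with a missing value and a duplicate row.
df = pd.DataFrame({"age": [25, 30, None, 30],
                   "salary": [50000, 60000, 55000, 60000]})

df["age"] = df["age"].fillna(df["age"].mean())   # impute missing value with the mean
df = df.drop_duplicates()                        # remove duplicate instances

# Normalization: rescale features to [0, 1].
df[["age", "salary"]] = MinMaxScaler().fit_transform(df[["age", "salary"]])
# StandardScaler().fit_transform(...) would instead give mean 0 and std 1 (standardization).
print(df)
```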
Encoding Categorical Data:

• One-Hot Encoding: Converting categorical variables into binary vectors.

• Label Encoding: Converting categorical variables into integer values.

Splitting Data:

• Dividing the dataset into training and testing sets to evaluate the model's performance (see the sketch below).
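A sketch of encoding and splitting with Pandas and scikit-learn (the column names and values are hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical dataset with one categorical feature and a label column.
df = pd.DataFrame({"city": ["Hyderabad", "Delhi", "Hyderabad", "Mumbai"],
                   "label": ["yes", "no", "yes", "no"]})

X = pd.get_dummies(df[["city"]])                 # one-hot encoding: one binary column per category
y = LabelEncoder().fit_transform(df["label"])    # label encoding: "no" -> 0, "yes" -> 1

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)               # training set vs. testing set
```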
Machine Learning types?
ALGORITHM DEVELOPMENT STEPS
Basic terminology
Features and Labels:
•Features: The input variables (independent variables) used by the model to make
predictions.
•Labels: The output variable (dependent variable) that the model is trying to predict.
Training and Testing:
•Training Set: A subset of the dataset used to train the model.
•Testing Set: A subset of the dataset used to evaluate the model's performance.
Overfitting and Underfitting:
•Overfitting: When the model performs well on the training data but poorly on the testing
data because it has learned noise and details from the training data.
•Underfitting: When the model performs poorly on both the training and testing data
because it is too simple to capture the underlying patterns in the data.
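A quick way to see these terms in practice is to compare training and testing accuracy. This is a hedged sketch using a scikit-learn decision tree on synthetic data; the sample sizes and depths are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, None):   # depth 1 tends to underfit; unlimited depth may overfit
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, model.score(X_train, y_train), model.score(X_test, y_test))
# A large gap between training and testing accuracy suggests overfitting;
# low accuracy on both suggests underfitting.
```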
What libraries do we use for Machine Learning?
NumPy
• NumPy (Numerical Python) is a very popular Python library for array and matrix processing, with a large collection of high-level mathematical functions.
• It is very useful for fundamental scientific computations in Machine Learning.
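A minimal example of NumPy array and matrix operations:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # 2x2 matrix
b = np.array([[5, 6], [7, 8]])

print(a + b)        # element-wise addition
print(a @ b)        # matrix multiplication
print(np.mean(a))   # one of many high-level mathematical functions
print(a.T)          # transpose
```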
Pandas
• Pandas (from "panel data") is a popular Python library for data analysis.
• It is not directly related to Machine Learning, but the dataset must be prepared before training, and Pandas is useful here because it was developed specifically for data extraction and preparation.
• It provides data structures and a wide variety of tools for data analysis, including many built-in methods for filtering, combining, and grouping data.
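A small sketch of the filtering, grouping, and combining operations mentioned above (the data is made up):

```python
import pandas as pd

df = pd.DataFrame({"dept": ["ai", "ai", "cv"], "score": [85, 90, 78]})

print(df[df["score"] > 80])                 # filtering rows
print(df.groupby("dept")["score"].mean())   # grouping and aggregating

extra = pd.DataFrame({"dept": ["nlp"], "score": [88]})
print(pd.concat([df, extra]))               # combining DataFrames
```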
Matplotlib
• Matplotlib is a Python library for data visualization. Like Pandas, it is not directly related to Machine Learning. It is needed when a programmer wants to visualize the patterns in the data.
• A module named pyplot makes it easy for programmers to plot data.
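A minimal pyplot example (the values plotted are arbitrary):

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.plot(x, y, marker="o")          # line plot of the data points
plt.xlabel("x")
plt.ylabel("y")
plt.title("A simple pyplot example")
plt.show()
```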


Scikit-learn
• Scikit-learn is one of the most popular ML libraries for classical ML algorithms.
• Scikit-learn supports most of the supervised and unsupervised learning algorithms.
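A sketch of a classical supervised workflow in scikit-learn, using the Iris dataset bundled with the library (the choice of logistic regression is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)   # supervised learning
print(accuracy_score(y_test, model.predict(X_test)))             # evaluate on unseen data
```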
TensorFlow
• TensorFlow is a popular open-source library for high-performance numerical computation.
• It can train and run deep neural networks that can be used to develop several AI applications. TensorFlow is widely used in the field of deep learning research and application.
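A minimal sketch of TensorFlow's numerical-computation core, tensors and automatic differentiation, which is what deep network training is built on (values are illustrative):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
print(tf.matmul(a, b))            # high-performance matrix multiplication

x = tf.Variable(3.0)
with tf.GradientTape() as tape:   # automatic differentiation
    y = x ** 2
print(tape.gradient(y, x))        # dy/dx = 2x = 6.0
```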
THANK YOU
