Introduction to
DATA SCIENCE
Challenges deep-dive
Why the Hype Around
Data Science?
● The demand for data scientists will soar by 28% by 2023
● Data scientist roles have grown over 650% since 2012, but
currently only 35,000 people in the US have data science skills,
while hundreds of companies are hiring for those roles.
● Software engineering is a common starting point for
professionals in the top five fastest-growing jobs today.
● Data Science gives you career flexibility
Who Are Data Scientists?
What is Machine
Learning?
Machine learning teaches computers to do what comes naturally to
humans and animals: learn from experience. Machine learning
algorithms use computational methods to “learn” information directly
from data without relying on a predetermined equation as a model.
The algorithms adaptively improve their performance as the number
of samples available for learning increases.
A Definition
A computer program is said to learn from experience E with
respect to some task T and some performance measure P if its
performance on T, as measured by P, improves with experience E.
-Tom Mitchell
A Small Question
Suppose we feed a learning algorithm a lot of historical weather
data and have it learn to predict the weather. In this setting, what
are T, P, and E?
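One way to make T, P, and E concrete is a toy experiment. The sketch below is illustrative only: the humidity rule, the data, and all function names are assumptions, not part of the original slides. It treats T as predicting rain from humidity, E as the labeled observations the learner sees, and P as accuracy on a fixed set of observations.

```python
# Toy illustration of T, P, E (all data and names here are made up).
# T: predict rain (True) or no rain (False) from humidity in percent.
# E: labeled historical observations (humidity, rained).
# P: accuracy on a fixed set of observations used only for scoring.

HIDDEN_RULE = 63  # hypothetical "true" rule: it rains when humidity >= 63


def make_observations(step):
    """Return labeled examples for humidity values 0, step, 2*step, ... up to 100."""
    return [(h, h >= HIDDEN_RULE) for h in range(0, 101, step)]


def learn_threshold(observations):
    """Place the threshold midway between the wettest dry day and the driest rainy day."""
    highest_dry = max(h for h, rained in observations if not rained)
    lowest_rainy = min(h for h, rained in observations if rained)
    return (highest_dry + lowest_rainy) / 2


def accuracy(threshold, observations):
    """Fraction of observations whose label the threshold predicts correctly."""
    correct = sum((h >= threshold) == rained for h, rained in observations)
    return correct / len(observations)


test_set = make_observations(step=1)      # used only to measure P
scores = []
for step in (50, 10, 2):                  # a smaller step means more experience E
    experience = make_observations(step)
    threshold = learn_threshold(experience)
    scores.append(accuracy(threshold, test_set))
    print(len(experience), "examples -> accuracy", round(scores[-1], 3))
```

In this toy setup, P does not get worse as E grows, which is the behavior the definition describes.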
More Data,
More Questions,
Better Answers
Real World
Applications
With the rise in big data, machine learning has become particularly
important for solving problems in areas like these:
● Image processing and computer vision, for face recognition,
motion detection, and object detection
● Computational biology, for tumor detection, drug discovery, and
DNA sequencing
● Energy production, for price and load forecasting
● Automotive, aerospace, and manufacturing, for predictive
maintenance
● Natural language processing
How Machine
Learning Works
Machine learning uses two types of techniques:
● Supervised learning, which trains a model on known input and
output data so that it can predict future outputs
● Unsupervised learning, which finds hidden patterns or intrinsic
structures in input data.
Machine Learning
Techniques
Supervised
Learning
The aim of supervised machine learning is to build a model that
makes predictions based on evidence in the presence of
uncertainty. A supervised learning algorithm takes a known set of
input data and known responses to the data (output) and trains a
model to generate reasonable predictions for the response to new
data.
Classification - predict discrete responses
Classification models classify input data into categories, for
example, whether an email is genuine or spam, or whether a tumor
is cancerous or benign.
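As a rough illustration of classification, the sketch below labels an email by copying the label of its closest training example (a 1-nearest-neighbour rule). The two features and every data point are invented for the example. The slides do not prescribe this algorithm.

```python
# Made-up training data: (links_in_email, exclamation_marks) -> label.
training = [
    ((0, 0), "genuine"),
    ((1, 0), "genuine"),
    ((0, 1), "genuine"),
    ((7, 5), "spam"),
    ((9, 3), "spam"),
    ((6, 8), "spam"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda example: squared_distance(example[0], features))
    return label


print(classify((8, 4)))  # an email with many links and exclamation marks
print(classify((1, 1)))  # an email with few of either
```

The output is always one of a fixed set of categories, which is what makes this a classification problem.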
Regression - predict continuous responses
Regression models predict continuous values, for example,
changes in temperature or fluctuations in power demand. Typical
applications include electricity load forecasting and algorithmic
trading.
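A minimal regression sketch, assuming a straight-line relationship and using invented temperature and demand figures: it fits a line by ordinary least squares and predicts demand for an unseen temperature. Real load forecasting would use far richer models. This only shows the idea of a continuous response.

```python
# Made-up data: outdoor temperature (deg C) and electricity demand (MW).
temperatures = [20.0, 25.0, 30.0, 35.0]
demand = [110.0, 125.0, 140.0, 155.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(temperatures, demand)
predicted = slope * 28.0 + intercept   # predict demand at 28 deg C
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Unlike the classification case, the prediction can be any number on a continuous scale.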
Unsupervised
Learning
Unsupervised learning finds hidden patterns or intrinsic structures in
data. It is used to draw inferences from datasets consisting of input
data without labeled responses.
Clustering is the most common unsupervised learning technique. It
is used for exploratory data analysis to find hidden patterns or
groupings in data. Applications for clustering include gene sequence
analysis, market research, and object recognition.
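To illustrate clustering, here is a small sketch of k-means on one-dimensional data. The spend values and the starting centers are assumptions chosen for the example. No labels are given, and the algorithm finds the two groups on its own.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers (plain k-means)."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            # Assign each value to the index of its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Made-up monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centers, groups = k_means_1d(spend, centers=[0.0, 100.0])
print(centers)
print(groups)
```

The result depends on the starting centers and on the number of clusters, both of which the analyst has to choose.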
Knowledge Test
Which of the following would you apply supervised learning to?
1. Given genetic (DNA) data from a person, predict the odds of him/her developing
diabetes over the next 10 years.
2. Given a large dataset of medical records from patients suffering from heart
disease, try to learn whether there might be different clusters of such patients for
which we might tailor separate treatments.
3. Given data on how 1000 medical patients respond to an experimental drug (such
as effectiveness of the treatment, side effects, etc.), discover whether there are
different categories or "types" of patients in terms of how they respond to the
drug, and if so what these categories are.
4. Have a computer examine an audio clip of a piece of music, and classify whether
or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a
clip of only musical instruments (and no vocals).
Knowledge Test
Which of the following questions can be answered using a
classification algorithm?
1. How does the exchange rate depend on the GDP?
2. Does a document contain the handwritten letter S?
3. How can I group supermarket products using purchase
frequency?
Knowledge Test
1. Suppose you are working on weather prediction, and you
would like to predict whether or not it will be raining at 5pm
tomorrow. You want to use a learning algorithm for this. Would
you treat this as a classification or a regression problem?
2. Suppose you are working on stock market prediction. You
would like to predict whether or not a certain company will
declare bankruptcy within the next 7 days (by training on data
of similar companies that had previously been at risk of
bankruptcy). Would you treat this as a classification or a
regression problem?
How Do You
Decide Which
Algorithm
to Use?
Choosing the right algorithm can seem overwhelming.
There are dozens of supervised and unsupervised machine
learning algorithms, and each takes a different approach to
learning.
There is no best method and no one-size-fits-all answer. Finding the
right algorithm is partly just trial and error.
But algorithm selection also depends on the size and type of data
you’re working with, the insights you want to get from the data, and
how those insights will be used.
Two-Class Classification
Multi-Class Classification
Anomaly Detection
Regression
Clustering
When Should We Use
Machine Learning?
Consider using machine learning when you have a complex task or
problem involving a large amount of data and lots of variables, but
no existing formula or equation.
Knowledge Test
Have a look at the statements below and identify the one which
is not a machine learning problem
1. Given a viewer's shopping habits, recommend a product to
purchase the next time she visits your website.
2. Given the symptoms of a patient, identify her illness.
3. Predict the USD/EUR exchange rate for February 2023.
4. Compute the mean wage of 10 employees for your company.
Knowledge Test
Which of the following statements uses a machine learning
model?
1. Determine whether an incoming email is spam or not
2. Obtain the name of last year's FIFA Ballon d’Or champion
3. Automatically tag your new Facebook photos
4. Select the student with the highest grade on a statistics course
Getting
Started
There is NO
Straight Line
With machine learning there’s rarely a straight line from start to
finish. You’ll find yourself constantly iterating and trying different
ideas and approaches
Machine Learning
Challenges
● Data comes in all shapes and sizes
● Preprocessing your data might require specialized knowledge
and tools
● It takes time to find the best model to fit the data.
Questions to Ask
Before Starting
Every machine learning workflow begins with three questions:
● What kind of data are you working with?
● What insights do you want to get from it?
● How and where will those insights be applied?
Your answers to these questions help you decide whether to use
supervised or unsupervised learning.
Data Science -
Five Questions
There are only five questions that data science answers:
● Is this A or B?
● Is this weird?
● How much – or – How many?
● How is this organized?
● What should I do next?
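The question "Is this weird?" is commonly read as anomaly detection. That reading is an assumption here, since the slides do not spell out the mapping. The sketch below flags values that lie more than a chosen number of standard deviations from the mean. The readings and the limit of 2.0 are made up for the example.

```python
from statistics import mean, pstdev


def find_anomalies(values, z_limit=2.0):
    """Return values whose distance from the mean exceeds z_limit standard deviations."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - center) / spread > z_limit]


# Made-up sensor readings with one suspicious value.
readings = [50, 51, 49, 50, 52, 48, 50, 95]
print(find_anomalies(readings))
```

A single extreme value also inflates the mean and standard deviation, so more robust statistics are often preferred on real data.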
Workflow at a Glance
Step 1 -
Load the Data
We store the labeled data sets in a text file. A flat file format such as
text or CSV is easy to work with and makes it straightforward to
import data.
Machine learning algorithms aren’t smart enough to tell the
difference between noise and valuable information. Before using the
data for training, we need to make sure it’s clean and complete
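A minimal sketch of loading a flat CSV file with Python's standard csv module. The column names and values are invented, and an in-memory string stands in for a file on disk so the example is self-contained.

```python
import csv
import io

# Stand-in for a CSV file on disk; the columns and values are made up.
raw_text = """humidity,temperature,rained
80,18.5,yes
35,27.0,no
,22.0,no
90,16.0,yes
"""


def load_rows(text_file):
    """Parse CSV rows into dictionaries, converting numbers and marking gaps as None."""
    rows = []
    for record in csv.DictReader(text_file):
        rows.append({
            "humidity": float(record["humidity"]) if record["humidity"] else None,
            "temperature": float(record["temperature"]) if record["temperature"] else None,
            "rained": record["rained"] == "yes",
        })
    return rows


rows = load_rows(io.StringIO(raw_text))
incomplete = [row for row in rows if None in row.values()]
print(len(rows), "rows loaded,", len(incomplete), "with missing values")
```

With a real file, the same function could be called on an opened file object, for example one returned by open("weather.csv", newline=""), where the file name is hypothetical.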
Step 2 -
Preprocess the Data
To preprocess the data we do the following:
● Look for outliers: data points that lie far outside the rest of the data
● Check for missing values
● Divide the data into two sets
○ We save part of the data for testing (the test set) and use
the rest (the training set) to build models. This is referred
to as holdout, and is a useful cross-validation technique.
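The three preprocessing steps can be sketched as follows. The readings, the outlier rule (more than 5 median absolute deviations from the median), and the 20% test share are all assumptions picked for illustration. The slides do not fix any of them.

```python
import random
from statistics import median

# Made-up sensor readings; None marks a missing value and 480.0 is an obvious outlier.
readings = [21.0, 22.5, None, 23.0, 480.0, 22.0, 21.5, None, 24.0, 22.8, 23.3, 21.9]

# 1. Check for missing values and drop them.
present = [r for r in readings if r is not None]

# 2. Drop outliers: points far from the median, measured in
#    median absolute deviations (MAD). The factor 5 is an arbitrary choice.
center = median(present)
mad = median(abs(r - center) for r in present)
clean = [r for r in present if abs(r - center) <= 5 * mad]

# 3. Holdout: shuffle with a fixed seed, then keep about 20% aside as the test set.
shuffled = clean[:]
random.Random(0).shuffle(shuffled)
test_size = max(1, len(shuffled) // 5)
test_set, training_set = shuffled[:test_size], shuffled[test_size:]

print(len(readings) - len(present), "missing,", len(present) - len(clean), "outliers")
print(len(training_set), "training rows,", len(test_set), "test rows")
```

Dropping rows is only one option. Missing values can also be filled in, and outliers are sometimes the most interesting part of the data.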
Step 3 -
Derive Features
Deriving features (also known as feature engineering or feature
extraction) turns raw data into information that a machine learning
algorithm can use.
Feature selection then keeps only the most useful of those features.
Use feature selection to:
● Improve the accuracy of a machine learning algorithm
● Boost model performance for high-dimensional data sets
● Improve model interpretability
● Prevent overfitting
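As an illustration of deriving features, the sketch below turns a raw email body into a handful of numbers that a learning algorithm could consume. The particular features are hypothetical choices, not ones taken from the slides.

```python
def derive_features(message):
    """Turn a raw email body into a few numeric features (hypothetical choices)."""
    words = message.split()
    return {
        "word_count": len(words),
        "link_count": sum(word.startswith(("http://", "https://")) for word in words),
        "exclamation_count": message.count("!"),
        "uppercase_ratio": sum(c.isupper() for c in message) / max(1, len(message)),
    }


features = derive_features("WIN a prize now!!! Visit https://example.com today")
print(features)
```

Which features are worth deriving depends on the task. Feature selection can later discard those that turn out not to help.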
Step 4 -
Build and Train Model
● The predefined algorithms and the training data are used for
building and training the model.
● The test data is used to evaluate the trained model.
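A minimal sketch of building and evaluating a model under the holdout scheme from Step 2: the model is fitted on the training set only and scored on the test set it has never seen. The nearest-centroid model and all data points are assumptions made for the example.

```python
# Made-up labeled data: (humidity_percent, rained).
training_set = [(85, True), (90, True), (78, True), (30, False), (42, False), (25, False)]
test_set = [(88, True), (35, False), (70, True), (50, False)]


def train(examples):
    """Learn one average humidity per class (a nearest-centroid model)."""
    model = {}
    for label in (True, False):
        values = [h for h, rained in examples if rained == label]
        model[label] = sum(values) / len(values)
    return model


def predict(model, humidity):
    """Pick the class whose average humidity is closest."""
    return min(model, key=lambda label: abs(humidity - model[label]))


model = train(training_set)  # the model sees the training data only
hits = sum(predict(model, h) == rained for h, rained in test_set)
print("test accuracy:", hits / len(test_set))
```

Scoring on held-out data gives a more honest estimate of how the model will behave on new inputs than scoring on the data it was trained on.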
Step 5 -
Improve the Model
Improving a model can take two different directions: make the
model simpler or add complexity.
Simplify - reduce the number of features
Add Complexity - make it more fine-tuned
Simplify
Popular feature reduction techniques include:
● Correlation matrix – shows the relationship between
variables, so that variables (or features) that are not highly
correlated with the response can be removed.
● Principal component analysis (PCA) - eliminates redundancy
by finding a combination of features that captures key
distinctions between the original features and brings out strong
patterns in the dataset.
● Sequential feature reduction – reduces features iteratively on
the model until there is no improvement in performance
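A rough sketch of correlation-based feature reduction, using the Pearson coefficient. The columns and both thresholds (0.5 for relevance, 0.95 for redundancy) are assumptions. Note that it goes a step beyond the slide by also dropping a feature that duplicates one already kept.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up columns: two informative features, one unrelated one, and the response.
features = {
    "temperature_c": [10, 15, 20, 25, 30],
    "temperature_f": [50, 59, 68, 77, 86],  # same information as temperature_c
    "day_of_week": [2, 5, 1, 5, 2],
}
demand = [100, 90, 80, 70, 60]

# Keep features that correlate with the response (threshold is an assumption).
relevant = [n for n, col in features.items() if abs(pearson(col, demand)) >= 0.5]

# Among those, skip any feature nearly identical to one already kept.
kept = []
for name in relevant:
    if all(abs(pearson(features[name], features[other])) < 0.95 for other in kept):
        kept.append(name)

print(kept)
```

Correlation only captures linear relationships, so a feature with low correlation can still be useful to a non-linear model.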
Add Complexity
● Use model combination – merge multiple simpler models into
a larger model that is better able to represent the trends in the
data than any of the simpler models could on their own.
● Add more data sources
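Model combination can be sketched as a majority vote over several simple models. The three rules below and their cut-offs are hypothetical. The point is only that the combined decision can differ from any single rule.

```python
from collections import Counter


# Three deliberately simple, hypothetical spam rules; each is a weak model on its own.
def many_links(email):
    return "spam" if email["links"] > 3 else "genuine"


def shouting(email):
    return "spam" if email["exclamations"] > 2 else "genuine"


def unknown_sender(email):
    return "genuine" if email["known_sender"] else "spam"


def combined(email, models=(many_links, shouting, unknown_sender)):
    """Majority vote across the simpler models."""
    votes = Counter(model(email) for model in models)
    return votes.most_common(1)[0][0]


email = {"links": 5, "exclamations": 0, "known_sender": False}
print(combined(email))
```

Using an odd number of voters avoids ties in a two-class problem.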
TO DO
● Getting Started
● Familiarize yourself with the maths and
algorithms
● Select the infrastructure or
tool
● Create your profile and
participate in competitions
Christy Abraham Joy
Email - christyabrahamjoy@gmail.com
Mob - +91 94000 95273
Feel Free to Contact!

Introduction to Data Science

  • 5. Challenges deep-dive Why the Hype Around Data Science? ● The demand for data scientists will soar by 28% by 2023 ● Data scientist roles have grown over 650% since 2012, but currently, 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles. ● Software engineering is a common starting point for professionals who are in the top five fasting growing jobs today. ● Data Science gives you career flexibility
  • 6. Who are Data Scientist?
  • 8. Challenges deep-dive What is Machine Learning ? Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases.
  • 9. Challenges deep-dive A Definition A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E. -Tom Mitchell
  • 10. Challenges deep-dive A Small Question Suppose we feed a learning algorithm a lot of historical weather data, and have it learn to predict weather. In this setting, what is T,P,E?
  • 13. Challenges deep-dive Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. Real World Applications With the rise in big data, machine learning has become particularly important for solving problems in areas like these: ● Image processing and computer vision,for face recognition, motion detection, and object detection ● Computational biology, for tumor detection, drug discovery, and DNA sequencing ● Energy production, for price and load forecasting ● Automotive, aerospace, and manufacturing, for predictive maintenance ● Natural language processing
  • 14. Challenges deep-dive How Machine Learning Works Machine learning uses two types of techniques: ● Supervised learning, which trains a model on known input and output data so that it can predict future outputs ● Unsupervised learning, which finds hidden patterns or intrinsic structures in input data.
  • 16. Challenges deep-dive Supervised Learning The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data (output) and trains a model to generate reasonable predictions for the response to new data
  • 17. Classification - predict discrete responses Classification models classify input data into categories.for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign. Regression - predict continuous responses for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.
  • 18. Challenges deep-dive Unsupervised Learning Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from dataset consisting of input data without labeled responses.
  • 19. Clustering is the most common unsupervised learning technique. It is used for exploratory data analysis to find hidden patterns or groupings in data.Applications for clustering include gene sequence analysis,market research, and object recognition
  • 20. Knowledge Test Which of the following would you apply supervised learning to? 1. Given genetic (DNA) data from a person, predict the odds of him/her developing diabetes over the next 10 years. 2. Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different clusters of such patients for which we might tailor separate treatments. 3. Given data on how 1000 medical patients respond to an experimental drug (such as effectiveness of the treatment, side effects, etc.), discover whether there are different categories or "types" of patients in terms of how they respond to the drug, and if so what these categories are. 4. Have a computer examine an audio clip of a piece of music, and classify whether or not there are vocals (i.e., a human voice singing) in that audio clip, or if it is a clip of only musical instruments (and no vocals).
  • 21. Knowledge Test Which of the following questions can be answered using a classification algorithm? 1. How does the exchange rate depend on the GDP? 2. Does a document contain the handwritten letter S? 3. How can I group supermarket products using purchase frequency?
  • 22. Knowledge Test 1. Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this.Would you treat this as a classification or a regression problem? 2. Suppose you are working on stock market prediction. You would like to predict whether or not a certain company will declare bankruptcy within the next 7 days (by training on data of similar companies that had previously been at risk of bankruptcy). Would you treat this as a classification or a regression problem?
  • 23. How Do You Decide Which Algorithm to Use?
  • 24. Choosing the right algorithm can seem overwhelming There are dozens of supervised and unsupervised machine learning algorithms, and each takes a different approach to learning.
  • 25. There is no best method or one size fits all. Finding the right algorithm is partly just trial and error But algorithm selection also depends on the size and type of data you’re working with, the insights you want to get from the data, and how those insights will be used.
  • 26. Two - Class Classification
  • 27. Multi - Class Classification
  • 31. Challenges deep-dive When should we use Machine Learning Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation.
  • 33. Knowledge Test Have a look at the statements below and identify the one which is not a machine learning problem 1. Given a viewer's shopping habits, recommend a product to purchase the next time she visits your website. 2. Given the symptoms of a patient, identify her illness. 3. Predict the USD/EUR exchange rate for February 2023. 4. Compute the mean wage of 10 employees for your company.
  • 34. Knowledge Test Which of the following statements uses a machine learning model? 1. Determine whether an incoming email is spam or not 2. Obtain the name of last year's FIFIA Ballon d’Or champion 3. Automatically tagging your new Facebook photos 4. Select the student with the highest grade on a statistics course
  • 36. Challenges deep-dive There is NO Straight Line With machine learning there’s rarely a straight line from start to finish. You’ll find yourself constantly iterating and trying different ideas and approaches
  • 37. Challenges deep-dive Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. Machine Learning Challenges ● Data comes in all shapes and sizes ● Preprocessing your data might require specialized knowledge and tools ● It takes time to find the best model to fit the data.
  • 38. Challenges deep-dive Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. Questions to Ask Before Starting Every machine learning workflow begins with three questions: ● What kind of data are you working with? ● What insights do you want to get from it? ● How and where will those insights be applied? Your answers to these questions help you decide whether to use supervised or unsupervised learning.
  • 39. Challenges deep-dive Machine learning teaches computers to do what comes naturally to humans and animals: learn from experience. Machine learning algorithms use computational methods to “learn” information directly from data without relying on a predetermined equation as a model. The algorithms adaptively improve their performance as the number of samples available for learning increases. Data Science - Five Questions There are only five questions that data science answers: ● Is this A or B? ● Is this weird? ● How much – or – How many? ● How is this organized? ● What should I do next?
  • 40. Knowledge Test Which of the following questions can be answered using a classification algorithm? 1. How does the exchange rate depend on the GDP? 2. Does a document contain the handwritten letter S? 3. How can I group supermarket products using purchase frequency?
  • 42. Workflow at a Glance
  • 43. Step 1 - Load the Data We store the labeled data sets in a text file. A flat file format such as text or CSV is easy to work with and makes it straightforward to import data. Machine learning algorithms aren’t smart enough to tell the difference between noise and valuable information. Before using the data for training, we need to make sure it’s clean and complete.
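A minimal sketch of loading labeled data from a CSV file and checking that it is complete. The column names (height_cm, weight_kg, label) and the inline sample text are hypothetical stand-ins for a real file; only the Python standard library is used.

```python
import csv
import io

# Hypothetical CSV contents; in practice this text would come from a file on disk.
RAW_CSV = """height_cm,weight_kg,label
170,65,A
182,,B
165,58,A
"""

NUMERIC_COLUMNS = ("height_cm", "weight_kg")  # assumed column names


def load_rows(text):
    """Parse CSV text into dicts; numeric fields become floats, blanks become None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"label": record["label"]}
        for key in NUMERIC_COLUMNS:
            value = record[key].strip()
            row[key] = float(value) if value else None
        rows.append(row)
    return rows


rows = load_rows(RAW_CSV)
# A row is incomplete if any of its numeric fields is missing.
incomplete = [r for r in rows if any(r[k] is None for k in NUMERIC_COLUMNS)]
print(len(rows), "rows loaded;", len(incomplete), "with missing values")
```

Counting incomplete rows right after loading is one simple way to confirm whether the data is complete before any training happens.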
  • 44. Step 2 - Preprocess the Data To preprocess the data we do the following: ● Look for outliers (data points that lie outside the rest of the data) ● Check for missing values ● Divide the data into two sets ○ We save part of the data for testing (the test set) and use the rest (the training set) to build models. This is referred to as holdout, and is a useful cross-validation technique.
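The three preprocessing steps can be sketched as follows. The slides do not say how outliers are detected, so the 1.5 x IQR rule used here is an assumption, as are the sample readings, the 25% test fraction, and the helper names.

```python
import random
import statistics


def drop_missing(values):
    """Remove missing entries (represented here as None)."""
    return [v for v in values if v is not None]


def find_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not from the slides)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def holdout_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data, then hold part of it out as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


readings = [10.1, 9.8, None, 10.3, 55.0, 9.9, 10.0, 10.2]  # hypothetical data
clean = drop_missing(readings)
outliers = find_outliers(clean)
kept = [v for v in clean if v not in outliers]
train, test = holdout_split(kept)
print("outliers:", outliers, "| train size:", len(train), "| test size:", len(test))
```

Whether a flagged outlier should be removed or kept depends on the data; here it is dropped only to keep the example short.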
  • 45. Step 3 - Derive Features Deriving features (also known as feature engineering or feature extraction) turns raw data into information that a machine learning algorithm can use. Use feature selection to: ● Improve the accuracy of a machine learning algorithm ● Boost model performance for high-dimensional data sets ● Improve model interpretability ● Prevent overfitting
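As an illustration of turning raw data into features, the sketch below summarizes a raw sensor-style signal as three numbers. The signals and the choice of summary statistics are hypothetical; which features are useful depends on the problem.

```python
import statistics


def derive_features(signal):
    """Summarize a raw signal as a small set of numeric features."""
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),      # spread of the signal
        "range": max(signal) - min(signal),    # peak-to-peak amplitude
    }


# Hypothetical raw readings for two activities.
still = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
moving = [1.0, -1.2, 0.9, -1.1, 1.3, -0.9]

still_features = derive_features(still)
moving_features = derive_features(moving)
print(still_features)
print(moving_features)
```

The two raw signals have similar means, but the derived spread and range separate them clearly, which is the kind of information a learning algorithm can use.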
  • 46. Step 4 - Build and Train Model ● The chosen algorithm and the training data are used to build (train) the model. ● The held-out test data is used to evaluate the trained model.
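A minimal sketch of the train-then-evaluate pattern, assuming the usual convention that the model is fitted on the training set and scored on the held-out test set. The nearest-centroid classifier and the tiny data sets are illustrative choices, not the algorithm used in the slides.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """Fit the model: compute the mean feature vector (centroid) of each label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {
        label: tuple(s / counts[label] for s in total)
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Predict the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, f) == y for f, y in samples)
    return correct / len(samples)


# Hypothetical labeled data, already split into training and test sets.
train_set = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((3.0, 3.1), "B"), ((3.2, 2.9), "B")]
test_set = [((1.1, 0.9), "A"), ((2.9, 3.3), "B")]

model = train_centroids(train_set)            # training data builds the model
test_accuracy = accuracy(model, test_set)     # test data evaluates it
print("test accuracy:", test_accuracy)
```

Scoring on data the model never saw during training gives a more honest estimate of how it will behave on new inputs.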
  • 47. Step 5 - Improve the Model Improving a model can take two different directions: make the model simpler or add complexity. ● Simplify: reduce the number of features ● Add complexity: make the model more fine-tuned
  • 48. Simplify Popular feature reduction techniques include: ● Correlation matrix: shows the relationship between variables, so that variables (or features) that are not highly correlated with the response can be removed. ● Principal component analysis (PCA): eliminates redundancy by finding a combination of features that captures key distinctions between the original features and brings out strong patterns in the dataset. ● Sequential feature reduction: reduces features iteratively on the model until there is no improvement in performance.
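One way to read the correlation-based approach, assumed here, is to keep only the features whose Pearson correlation with the target is strong. The feature names, values, and the 0.5 threshold below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined for constant columns


def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Hypothetical housing data: two informative features and one irrelevant one.
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 5],
    "door_color_code": [3, 1, 2, 3, 1],
}
price = [150, 180, 240, 310, 360]

selected = select_features(features, price)
print(selected)
```

Correlation only captures linear relationships, so a feature dropped this way may still carry useful nonlinear information.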
  • 49. Add Complexity ● Use model combination – merge multiple simpler models into a larger model that is better able to represent the trends in the data than any of the simpler models could on their own. ● Add more data sources
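Model combination can be sketched as a majority vote over several simple models. The three rules, their thresholds, and the message fields below are hypothetical; voting is only one of several ways to combine models.

```python
from collections import Counter


# Three deliberately simple models, each looking at one property of a message.
def rule_links(message):
    return "spam" if message["links"] > 3 else "ham"


def rule_caps(message):
    return "spam" if message["caps_ratio"] > 0.5 else "ham"


def rule_offer(message):
    return "spam" if message["has_offer"] else "ham"


def ensemble_predict(models, message):
    """Combine the models by majority vote."""
    votes = Counter(model(message) for model in models)
    return votes.most_common(1)[0][0]


models = [rule_links, rule_caps, rule_offer]  # odd count avoids tied votes
message = {"links": 5, "caps_ratio": 0.1, "has_offer": True}  # hypothetical input

prediction = ensemble_predict(models, message)
print(prediction)
```

Here one rule is fooled by the low ratio of capital letters, but the combined vote still labels the message as spam.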
  • 50. TO DO ● Get started ● Familiarize yourself with the maths and algorithms ● Select the infrastructure or tool ● Create your profile and participate in competitions
  • 51. Christy Abraham Joy Email - [email protected] Mob - +91 94000 95273 Feel Free to Contact!