slides-1_intro_day1
At a basic level, machine learning is about predicting the future based on the past.
For instance, you might wish to predict how much a user Alice will like a movie that
she hasn't seen, based on the ratings of movies that she has seen.
This means making informed guesses about some unobserved property of some
object, based on observed properties of that object.
Alice has just begun taking a course on machine learning. She knows that at the end
of the course, she will be expected to have learned all about this topic.
A common way of gauging whether or not she has learned is for her teacher, Bob, to
give her an exam. She has done well at learning if she does well on the exam.
But what makes a reasonable exam? If Bob spends the entire semester talking about
machine learning, and then gives Alice an exam on History of Pottery, then Alice's
performance on this exam will not be representative of her learning.
On the other hand, if the exam only asks questions that Bob has answered exactly
during lectures, then this is also a bad test of Alice's learning, especially if it's an
open-notes exam.
Introduction of machine learning
What is desired is that Alice observes specific examples from the course, and then
has to answer new, but related questions on the exam.
This tests whether Alice has the ability to generalize. Generalization is perhaps the
most central concept in machine learning.
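The exam analogy can be made concrete with a small sketch. The data below (hours studied and a pass/fail label) and the two learners, `memorizer` and `threshold_learner`, are hypothetical and chosen only for illustration. The memorizer can answer only inputs it saw during training, like an exam that repeats lecture questions; the threshold learner extracts a rule that it can apply to new inputs.

```python
# Hypothetical toy data: (hours studied, passed?). Not taken from the slides.
train = [(1, False), (2, False), (3, False), (6, True), (7, True), (9, True)]
test = [(0, False), (4, False), (5, True), (8, True), (10, True)]  # unseen inputs


def memorizer(rows):
    """Learner that only remembers the exact training examples."""
    table = dict(rows)
    # Returns None for any input it has never seen.
    return lambda x: table.get(x)


def threshold_learner(rows):
    """Learner that generalizes: it places a cut-off between the two classes."""
    max_fail = max(x for x, passed in rows if not passed)
    min_pass = min(x for x, passed in rows if passed)
    cut = (max_fail + min_pass) / 2
    return lambda x: x > cut


def accuracy(predict, rows):
    """Fraction of rows whose label the predictor gets right."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


for name, learner in [("memorizer", memorizer), ("threshold", threshold_learner)]:
    predict = learner(train)
    print(name, "train:", accuracy(predict, train), "test:", accuracy(predict, test))
```

Both learners score perfectly on the training data, but only the threshold learner answers the new, related questions in `test` correctly. That gap between seen and unseen examples is what generalization measures.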
Example of machine learning
Machine learning works with structured data to detect patterns that provide insight.
Everyday examples are personalized recommendations from services like Amazon or
Netflix. In the financial arena, machine learning predicts bad loans, finds risky applicants,
and generates credit scores.
Data science
Data science is a "concept to unify statistics, data analysis, machine learning and
their related methods" in order to "understand and analyze actual phenomena" with
data.
Data science employs techniques and theories drawn from many fields within the
broad areas of mathematics, statistics, information science, and computer science,
in particular from the subdomains of machine learning, data mining, databases, and
visualization.
Data science
Statistical inference
Data visualization
Experiment design
Domain knowledge
Communication
Difference between data science, machine learning, and artificial
intelligence
While no one is expecting parity with human intelligence today or in the near future,
AI has big implications in how we live our lives.
The brains behind artificial intelligence is a technology called machine learning, which
is designed to make our jobs easier and more productive.
What are the main challenges with AI technology?
Think of AI as an iceberg. What you see as a user is just the tip — but beneath the
surface lurks a behemoth support system of data scientists and engineers, massive
amounts of data, labor-intensive extraction and preparation of that data, and a huge
technology infrastructure.
It takes a specialized team of data scientists and developers to access the correct
data, prepare the data, build the correct models, and then integrate the predictions
back into an end-user experience such as CRM.
Sales can anticipate next opportunities and exceed customer expectations by knowing
what a customer needs before the customer does.
Service can deliver proactive service by anticipating cases and resolving issues before
they become problems.
Marketing can create predictive journeys and personalize customer experiences like
never before.
IT can embed intelligence everywhere and create smarter apps for employees and
customers.
What is deep learning?
Deep learning is AI that uses complex algorithms to perform tasks in domains where it
actually learns the domain with little or no human supervision. In essence, the machine
learns how to learn.
Example of deep learning
While there's lots of exciting experimentation happening with deep learning, most
practical applications you're familiar with are based on image analysis. With image
analysis, a computer learns to classify random images by analyzing thousands or millions
of other images and their data points. For example, consumer apps like Google Photos and
Facebook use deep learning to power face recognition in photos.
What is natural language processing (NLP)?
NLP is AI that recognizes language and its many usage and grammar rules by finding
patterns within large datasets.
Example of language processing (NLP)
One application of NLP that's gaining traction is sentiment analysis within social media.
Computers use algorithms to look for patterns in user posts across Twitter, Facebook, or
other social networks to understand how customers feel about a specific brand or product.
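As a rough illustration of sentiment analysis, the sketch below scores a post by counting words from two tiny hand-written word lists. The lists `POSITIVE` and `NEGATIVE`, the function `sentiment`, and the sample posts are all hypothetical; a production system would learn such patterns from large labeled datasets instead of using fixed lists.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"love", "great", "awesome", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "broken"}


def sentiment(post):
    """Label a post by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love this phone, the camera is great!",
    "Terrible battery, I hate it.",
    "It arrived on Tuesday.",
]
for post in posts:
    print(sentiment(post), "|", post)
```

Word counting like this misses negation and sarcasm, which is one reason practical systems rely on patterns learned from data.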
Data engineers
Data engineers are the designers, builders and managers of the information or "big data"
infrastructure. They develop the architecture that helps analyze and process data in the
way the organization needs it. And they make sure those systems are performing
smoothly. Data science is a team sport.
Role of Data engineers
Big data engineers develop, maintain, test and evaluate big data solutions within
organizations. Most of the time they are also involved in the design of big data
solutions, because of the experience they have with Hadoop-based technologies such
as MapReduce and Hive, and with NoSQL databases such as MongoDB or Cassandra.
Data engineers are also responsible for the creation and maintenance of analytics
infrastructure that enables almost every other function in the data world. They are
responsible for the development, construction, maintenance and testing of
architectures, such as databases and large-scale processing systems.
Data Scientists vs. Data Engineers
Algorithms in ML
Types of Learning
• Supervised learning
  • Goal: Prediction
• Unsupervised learning
  • Goal: Discovery
• Reinforcement learning
Supervised Learning
Learn how to predict an output from a
given input.
A B C Prediction
13 N N Y
15 N Y N
16 N N Y
22 N Y N
28 Y N Y
41 N N N
Let’s build a classifier
A B C Prediction
14 N Y ?
15 N N ?
17 Y Y ?
26 N Y ?
30 Y N ?
30 N N ?
Let’s build a classifier
A B C Prediction
14 N Y N
15 N N Y
17 Y Y Y
26 N Y N
30 Y N Y
30 N N N
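One very simple way to build a classifier from the labeled table is a "one-rule" learner: for each column, map every value to its most common label, then keep the column that makes the fewest mistakes on the training rows. The sketch below is an assumption made for illustration, not the method used to fill in the table; it looks only at the yes/no columns B and C and ignores the numeric column A to stay short. The name `fit_one_rule` is hypothetical.

```python
from collections import Counter, defaultdict

# Labeled training rows from the first table: (A, B, C, Prediction).
train = [
    (13, "N", "N", "Y"),
    (15, "N", "Y", "N"),
    (16, "N", "N", "Y"),
    (22, "N", "Y", "N"),
    (28, "Y", "N", "Y"),
    (41, "N", "N", "N"),
]

# Unlabeled rows from the second table: (A, B, C).
new_rows = [
    (14, "N", "Y"), (15, "N", "N"), (17, "Y", "Y"),
    (26, "N", "Y"), (30, "Y", "N"), (30, "N", "N"),
]
# Labels shown in the filled-in table, used here only for comparison.
slide_labels = ["N", "Y", "Y", "N", "Y", "N"]


def fit_one_rule(rows, columns):
    """Return (column index, value -> label rule, training errors) for the best column."""
    best = None
    for col in columns:
        votes = defaultdict(Counter)
        for row in rows:
            votes[row[col]][row[-1]] += 1
        # Each value of the column predicts its majority label.
        rule = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
        errors = sum(rule[row[col]] != row[-1] for row in rows)
        if best is None or errors < best[2]:
            best = (col, rule, errors)
    return best


col, rule, errors = fit_one_rule(train, columns=(1, 2))  # 1 = B, 2 = C
predictions = [rule[row[col]] for row in new_rows]
agreement = sum(p == s for p, s in zip(predictions, slide_labels))

print("chosen column index:", col, "rule:", rule, "training errors:", errors)
print("predictions:", predictions, "agreement with table:", agreement, "of", len(new_rows))
```

On this data the learner picks column C (predict Y when C is N) with one training error, and its predictions match four of the six labels in the filled-in table. The labels in the table presumably reflect a richer classifier that also uses columns A and B.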
Let’s build a classifier
What are we predicting?
“Will this consumer like the
new Taylor Swift single?”
Clustering
Semi-supervised Learning
Combines both types of learning
From:
https://www.quora.com/Whats-the-difference-between-overfitting-and-underfitting
Generalization
A restriction on what a classifier can learn
is called an inductive bias.