Large Scale Machine Learning

This document discusses techniques for machine learning with large datasets, including: 1. Stochastic and mini-batch gradient descent, which update the model parameters using one example (or a small batch of examples) at a time, reducing training time compared to batch gradient descent. 2. Online learning algorithms, which update models incrementally as new data becomes available, e.g. for pricing, product search, or recommendation systems. 3. Map-reduce and data parallelism techniques, which distribute the gradient computation across multiple machines or processor cores to further reduce training time.

Large Scale Machine Learning
Learning with Large Datasets
Machine Learning (Andrew Ng)
Machine learning and data
Classify between confusable words, e.g. {to, two, too}, {then, than}:
"For breakfast I ate _____ eggs."

[Figure: classification accuracy vs. training set size (millions) for several learning algorithms on this task; accuracy keeps improving as the training set grows. From Banko and Brill, 2001.]

"It's not who has the best algorithm that wins. It's who has the most data."
Learning with large datasets
[Figure: two learning curves of error vs. training set size. In one, error keeps decreasing as the training set grows (high variance, where more data helps); in the other, error flattens out early (high bias, where more data does not help).]
Large Scale Machine Learning
Stochastic Gradient Descent
Linear regression with gradient descent

$h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j$

$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$

Repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
    (for every $j = 0, \dots, n$)
}

Every step of this batch update sums over all $m$ training examples, which becomes expensive when $m$ is very large.
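As a concrete sketch of the batch update above, in Python with NumPy: the data matrix X is assumed to already include a column of ones for $x_0 = 1$, and alpha and the iteration count are illustrative values, not from the slides.

    import numpy as np

    # A minimal sketch of batch gradient descent for linear regression.
    # Each iteration computes the gradient over ALL m examples, which is
    # exactly why the method is slow for very large m.
    def batch_gradient_descent(X, y, alpha=0.01, n_iters=100):
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(n_iters):
            grad = X.T @ (X @ theta - y) / m  # (1/m) * sum_i (h(x_i) - y_i) * x_i
            theta -= alpha * grad
        return theta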
Batch gradient descent vs. stochastic gradient descent

Batch gradient descent:
$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$
Repeat {
    $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
    (for every $j$)
}

Stochastic gradient descent:
$cost(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2} (h_\theta(x^{(i)}) - y^{(i)})^2$
$J_{train}(\theta) = \frac{1}{m} \sum_{i=1}^{m} cost(\theta, (x^{(i)}, y^{(i)}))$
Stochastic gradient descent
1. Randomly shuffle (reorder) the training examples.
2. Repeat {
    for $i := 1, \dots, m$ {
        $\theta_j := \theta_j - \alpha (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
        (for every $j = 0, \dots, n$)
    }
}
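A sketch of the same procedure in code, under the same assumptions as before (X includes the bias column; alpha and the number of passes are illustrative):

    import numpy as np

    # A minimal sketch of stochastic gradient descent for linear regression.
    # Each update uses a single training example, so progress begins after
    # the first example rather than after a full pass over the data.
    def sgd(X, y, alpha=0.01, n_passes=10):
        m, n = X.shape
        theta = np.zeros(n)
        order = np.random.permutation(m)       # step 1: shuffle once
        for _ in range(n_passes):              # step 2: "Repeat"
            for i in order:
                error = X[i] @ theta - y[i]    # h_theta(x_i) - y_i
                theta -= alpha * error * X[i]  # updates every theta_j at once
        return theta

Each update costs O(n) instead of the O(mn) of a batch step, at the price of a noisier path toward the minimum.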
Large Scale Machine Learning
Mini-Batch Gradient Descent
Mini-batch gradient descent
Batch gradient descent: use all $m$ examples in each iteration.
Stochastic gradient descent: use 1 example in each iteration.
Mini-batch gradient descent: use $b$ examples in each iteration.
Mini-batch gradient descent
Say $b = 10$, $m = 1000$.
Repeat {
    for $i = 1, 11, 21, 31, \dots, 991$ {
        $\theta_j := \theta_j - \alpha \frac{1}{10} \sum_{k=i}^{i+9} (h_\theta(x^{(k)}) - y^{(k)}) x_j^{(k)}$
        (for every $j = 0, \dots, n$)
    }
}
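A sketch in code: with a vectorized implementation, the sum over each mini-batch is a single matrix-vector product, which is the main practical advantage of mini-batch over pure stochastic gradient descent.

    import numpy as np

    # A sketch of mini-batch gradient descent with the slide's b = 10.
    def minibatch_gd(X, y, alpha=0.01, b=10, n_passes=10):
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(n_passes):
            for start in range(0, m, b):
                Xb, yb = X[start:start + b], y[start:start + b]
                # (1/b) * sum over the mini-batch, as one vectorized product
                grad = Xb.T @ (Xb @ theta - yb) / len(yb)
                theta -= alpha * grad
        return theta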
Large Scale Machine Learning
Stochastic Gradient Descent Convergence
Checking for convergence
Batch gradient descent:
Plot $J_{train}(\theta)$ as a function of the number of iterations of gradient descent.

Stochastic gradient descent:
During learning, compute $cost(\theta, (x^{(i)}, y^{(i)}))$ before updating $\theta$ using $(x^{(i)}, y^{(i)})$.
Every 1000 iterations (say), plot $cost(\theta, (x^{(i)}, y^{(i)}))$ averaged over the last 1000 examples processed by the algorithm.
Checking for convergence
Plot $cost(\theta, (x^{(i)}, y^{(i)}))$, averaged over the last 1000 (say) examples, against the number of iterations.
[Figure: four example plots of the averaged cost vs. number of iterations: a noisy but decreasing curve (converging; a smaller learning rate is slower but can end slightly lower), a smoother curve from averaging over more examples (e.g. 5000), a curve too noisy to show any trend (average over more examples to see it), and an increasing curve (diverging; use a smaller $\alpha$).]
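A sketch of this bookkeeping in code, with the slide's window of 1000 examples and the same assumptions about X, y, and alpha as earlier:

    import numpy as np

    # SGD with the convergence check from the slides: record the cost of
    # each example BEFORE using it for an update, and average the costs in
    # windows of 1000 so the resulting curve can be plotted.
    def sgd_with_convergence_trace(X, y, alpha=0.01, n_passes=5, window=1000):
        m, n = X.shape
        theta = np.zeros(n)
        recent, averages = [], []
        for _ in range(n_passes):
            for i in np.random.permutation(m):
                error = X[i] @ theta - y[i]
                recent.append(0.5 * error ** 2)   # cost before the update
                theta -= alpha * error * X[i]
                if len(recent) == window:
                    averages.append(np.mean(recent))
                    recent = []
        return theta, averages  # plot `averages` vs. window index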
Stochastic gradient descent
1. Randomly shuffle the dataset.
2. Repeat {
    for $i := 1, \dots, m$ {
        $\theta_j := \theta_j - \alpha (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
        (for $j = 0, \dots, n$)
    }
}
The learning rate $\alpha$ is typically held constant. It can be slowly decreased over time if we want $\theta$ to converge, e.g. $\alpha = \frac{\text{const1}}{\text{iterationNumber} + \text{const2}}$.
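As a sketch, the decreasing schedule could be implemented as follows; const1 and const2 come from the slide, but the default values here are illustrative, not from the source:

    # Decaying learning-rate schedule from the slide:
    # alpha = const1 / (iterationNumber + const2)
    def learning_rate(iteration_number, const1=1.0, const2=100.0):
        return const1 / (iteration_number + const2)

The drawback is that const1 and const2 are two more parameters to tune, which is part of why $\alpha$ is usually just held constant.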
Large Scale Machine Learning
Online Learning
Online learning
Shipping service website where a user comes, specifies an origin and destination, and you offer to ship their package for some asking price; users sometimes choose to use your shipping service ($y = 1$), sometimes not ($y = 0$).
Features $x$ capture properties of the user, of the origin/destination, and of the asking price. We want to learn $p(y = 1 \mid x; \theta)$ to optimize the price.

Repeat forever {
    Get $(x, y)$ corresponding to the current user.
    Update $\theta$ using $(x, y)$:
    $\theta_j := \theta_j - \alpha (h_\theta(x) - y) x_j$  ($j = 0, \dots, n$)
}

Each example is used once and then discarded, which lets the model adapt to changing user preferences; a code sketch follows.
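A minimal sketch of this loop as online logistic regression; event_stream is a hypothetical iterable standing in for the website's live feed of (x, y) pairs and is not part of the original slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Online logistic regression: one update per (x, y) event, after which
    # the example is discarded. event_stream is a hypothetical stand-in for
    # the live stream of outcomes (x: features incl. asking price;
    # y: 1 if the user chose to ship, 0 otherwise).
    def online_logistic_regression(event_stream, n_features, alpha=0.05):
        theta = np.zeros(n_features)
        for x, y in event_stream:          # "Repeat forever"
            h = sigmoid(x @ theta)         # p(y = 1 | x; theta)
            theta -= alpha * (h - y) * x   # update, then discard the example
        return theta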
Other online learning example:
Product search (learning to search)
User searches for "Android phone 1080p camera".
Have 100 phones in store. Will return 10 results.
$x$ = features of the phone: how many words in the user's query match the name of the phone, how many words in the query match the description of the phone, etc.
$y = 1$ if the user clicks on the link; $y = 0$ otherwise.
Learn $p(y = 1 \mid x; \theta)$, the predicted click-through rate (CTR).
Use this to show the user the 10 phones they're most likely to click on.
Other examples: choosing special offers to show the user; customized selection of news articles; product recommendation; …
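A sketch of the ranking step: score every phone by predicted CTR and keep the top 10. Here featurize is a hypothetical callable that builds the feature vector $x$ for one (query, phone) pair; it is not from the slides.

    import numpy as np

    # Rank the store's phones by predicted click-through rate and return
    # the 10 the user is most likely to click on.
    def top_10_results(theta, query, phones, featurize):
        def ctr(phone):
            x = featurize(query, phone)                # hypothetical helper
            return 1.0 / (1.0 + np.exp(-(x @ theta)))  # p(y = 1 | x; theta)
        return sorted(phones, key=ctr, reverse=True)[:10]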
Large Scale Machine Learning
Map-Reduce and Data Parallelism
Map-reduce
Batch gradient descent, say with $m = 400$, split across 4 machines:

Machine 1: use $(x^{(1)}, y^{(1)}), \dots, (x^{(100)}, y^{(100)})$: $temp_j^{(1)} = \sum_{i=1}^{100} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Machine 2: use $(x^{(101)}, y^{(101)}), \dots, (x^{(200)}, y^{(200)})$: $temp_j^{(2)} = \sum_{i=101}^{200} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Machine 3: use $(x^{(201)}, y^{(201)}), \dots, (x^{(300)}, y^{(300)})$: $temp_j^{(3)} = \sum_{i=201}^{300} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Machine 4: use $(x^{(301)}, y^{(301)}), \dots, (x^{(400)}, y^{(400)})$: $temp_j^{(4)} = \sum_{i=301}^{400} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$

Combine: $\theta_j := \theta_j - \alpha \frac{1}{400} (temp_j^{(1)} + temp_j^{(2)} + temp_j^{(3)} + temp_j^{(4)})$ (for every $j$)

[Jeffrey Dean and Sanjay Ghemawat]


Map-reduce
[Figure: the training set is split into four parts, one per computer (Computer 1-4); each computer works on its quarter of the data, and a central server combines the partial results. Computer clip art: http://openclipart.org/detail/17924/computer-by-aj]
Map-reduce and summation over the training set
Many learning algorithms can be expressed as computing sums of functions over the training set, as the sketch after this slide shows.

E.g. for advanced optimization with logistic regression, we need:
$J_{train}(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right]$
$\frac{\partial}{\partial \theta_j} J_{train}(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$
Both quantities are sums over the training set, so each machine can compute the partial sum for its share of the data and send it to a central server to be added up.
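A sketch of one map-reduced gradient step using worker processes on a single machine; the same split applies unchanged when the shards live on separate machines. The worker count of 4 mirrors the slide's example.

    import numpy as np
    from multiprocessing import Pool

    # "Map": each worker computes the gradient's partial sum over its shard.
    def partial_sum(args):
        X_shard, y_shard, theta = args
        return X_shard.T @ (X_shard @ theta - y_shard)

    # "Reduce": add the partial sums and take one gradient descent step.
    def mapreduce_gradient_step(X, y, theta, alpha, n_workers=4):
        shards = [(Xs, ys, theta)
                  for Xs, ys in zip(np.array_split(X, n_workers),
                                    np.array_split(y, n_workers))]
        with Pool(n_workers) as pool:
            partials = pool.map(partial_sum, shards)
        return theta - alpha * sum(partials) / len(y)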
Multi-core machines
[Figure: the same split applied within one machine: the training set is divided across Core 1-4 and the per-core results are combined. CPU clip art: http://openclipart.org/detail/100267/cpu-(central-processing-unit)-by-ivak-100267]
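On a multi-core machine, a good vectorized implementation can get much of this parallelism for free: optimized linear algebra libraries can spread a matrix expression like the one below across cores, without any explicit map-reduce code on our part. A sketch:

    import numpy as np

    # The whole batch gradient as one matrix expression. A multithreaded
    # BLAS backend can parallelize this computation across the machine's
    # cores automatically.
    def batch_gradient(X, y, theta):
        return X.T @ (X @ theta - y) / len(y)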
