Big Data Analytics Course Introduction
material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org
Predictive methods
Use some variables to predict unknown
or future values of other variables
Example: Recommender systems
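To make the recommender example concrete, here is a minimal sketch of a predictive method: it uses known ratings (the observed variables) to estimate a rating a user has not given yet. The toy data, the names `ratings`, `cosine` and `predict`, and the choice of a similarity-weighted average are all illustrative assumptions; the slides do not prescribe a particular method.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"matrix": 5, "titanic": 1, "alien": 4},
    "bob":   {"matrix": 4, "titanic": 2, "alien": 5, "up": 2},
    "carol": {"matrix": 1, "titanic": 5, "up": 5},
}

def cosine(a, b):
    """Cosine similarity computed over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = cosine(ratings[user], their)
        num += w * their[item]
        den += w
    return num / den if den else None

# Alice has not rated "up"; estimate it from Bob's and Carol's ratings.
estimate = predict("alice", "up")
```

Because Alice's known ratings resemble Bob's more than Carol's, the estimate lands closer to Bob's rating of "up" than to Carol's. A real system would need far more data and a more careful similarity measure.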
Usage
Quality
Context
Streaming
Scalability
Figure: Data Modalities and Data Operators
Data Modalities: Ontologies, Structured, Networks, Text, Multimedia, Signals
Data Operators: Collect, Prepare, Represent, Model, Reason, Visualize
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Data Mining: Cultures
Data mining overlaps with:
Databases: Large-scale data, simple queries
Machine learning: Small data, Complex models
CS Theory: (Randomized) Algorithms
Different cultures:
To a DB person, data mining is an extreme form of
analytic processing – queries that
examine large amounts of data
Result is the query answer
To an ML person, data mining
is the inference of models
Result is the parameters of the model
Figure: Data Mining shown overlapping CS Theory, Machine Learning, and Database systems
In this class we will do both!
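The contrast between the two cultures can be sketched on one small table. In the database view the result is the answer to a query; in the machine learning view the result is a set of model parameters. The table `rows`, the function names, and the choice of a least-squares line as the model are hypothetical and chosen only for illustration.

```python
# Hypothetical toy table: (ad_spend, sales) per store.
rows = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

def query_avg_sales(rows, min_spend):
    """Database view: the result is the answer to a query over the data."""
    selected = [sales for spend, sales in rows if spend >= min_spend]
    return sum(selected) / len(selected)

def fit_line(rows):
    """Machine learning view: the result is the parameters of a model,
    here the slope and intercept of a least-squares line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

answer = query_avg_sales(rows, min_spend=2.0)  # a number read off the data
slope, intercept = fit_line(rows)              # parameters that generalize
```

The query answer describes only the rows that exist, while the fitted parameters can be used to predict sales for a spend level that is not in the table.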
This Class: CS246
This class overlaps with machine learning,
statistics, artificial intelligence, and databases, but
puts more stress on:
Scalability (big data)
Algorithms
Computing architectures
Automation for handling large data
Figure: Data Mining shown overlapping Statistics, Machine Learning, and Database systems
Dimensionality reduction
Spam detection
Queries on streams
Perceptron, kNN
Duplicate document detection
Office hours:
Jure: Wednesdays 9-10am, Gates 418
See course website for TA office hours
For SCPD students we will use Google Hangout
We will post Google Hangout links on Piazza
Course Logistics
Course website:
http://cs246.stanford.edu
Lecture slides (at least 30min before the lecture)
Homeworks, solutions
Readings
Readings: the book Mining of Massive Datasets,
written with A. Rajaraman and J. Ullman
Free online:
http://www.mmds.org