Intro to DS
Science
Data Vs Information Vs Knowledge
● Quality monitoring
○ TDS alarm, low-cartridge alarm, monitoring a complex system with
a large number of parameters
● Stop guessing
○ “I think this would work”: no more trial and error, go with the data.
● Effective resource utilization
○ Data helps decide how to use critical resources more
effectively
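The quality-monitoring bullet above can be sketched as a simple threshold check over several parameters. This is only an illustration, not a real device API: the parameter names (tds_ppm, cartridge_pct) and the limit values are assumptions made up for the example.

```python
# Hypothetical limits for a water purifier; real values depend on the device.
LIMITS = {
    "tds_ppm": ("max", 500.0),       # alarm if total dissolved solids exceed this
    "cartridge_pct": ("min", 10.0),  # alarm if remaining cartridge life drops below this
}

def check_alarms(reading):
    """Return the sorted names of parameters that are outside their limits."""
    alarms = []
    for name, (kind, limit) in LIMITS.items():
        value = reading.get(name)
        if value is None:
            continue  # parameter not reported in this reading
        if kind == "max" and value > limit:
            alarms.append(name)
        elif kind == "min" and value < limit:
            alarms.append(name)
    return sorted(alarms)

# One reading with high TDS and a healthy cartridge.
print(check_alarms({"tds_ppm": 620.0, "cartridge_pct": 45.0}))
```

Adding another monitored parameter only means adding one entry to LIMITS, which is why a table of limits scales better than hand-written checks as the system grows.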
Why Is Data Important?
● Add-on menu in hostel
○ Requirements may vary depending on the menu, day, month, festival,
or vacation
● Example of OLA/UBER/OYO
● New models are estimating which cities are most at risk for
the spread of the Ebola virus.
… Health/Scientific
Internet of Things / M2M Computing
Datafication
● How to quantify friendship?
● How to rate a product?
● Taking all aspects of life and turning them
into data
○ Google’s augmented-reality glasses datafy the gaze
○ LinkedIn datafies our professional network
● When we like something or someone online,
we are helping to datafy it.
How Big Is the Data?
● About 2.5 exabytes (1 exabyte = 10^18 bytes) of data are created each day
● Internet
○ More than 3.7 billion humans use the internet
○ On average, Google now processes more than 3.5 billion searches per day
● Digital Photo
○ People take around 1.2 trillion photos per year
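To get a feel for these magnitudes, the daily figures quoted above can be converted to per-second rates. This is a rough back-of-the-envelope sketch that takes 1 exabyte as 10^18 bytes and 1 terabyte as 10^12 bytes; the variable names are chosen only for this example.

```python
EXABYTE = 10**18                 # bytes
TERABYTE = 10**12                # bytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# 2.5 exabytes of data created per day, expressed per second.
bytes_per_day = 2.5 * EXABYTE
terabytes_per_second = bytes_per_day / SECONDS_PER_DAY / TERABYTE

# 3.5 billion Google searches per day, expressed per second.
searches_per_second = 3.5e9 / SECONDS_PER_DAY

print(f"{terabytes_per_second:.1f} TB of data per second")
print(f"{searches_per_second:,.0f} searches per second")
```

Under these figures, that is roughly 29 TB of new data and about 40,500 searches every second.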
Data generated in a Day
The Data Equation
Oceans of Data
Drops of Understanding
(Nix 1984)
What is Data Science?
Like any emerging field, it isn’t yet well defined,
but incorporates elements of:
● Exploratory Data Analysis and Visualization
● Machine Learning and Statistics
● High-Performance Computing technologies
for dealing with scale.
What is Data Science?
● Data science is an interdisciplinary field that uses
scientific methods, processes, algorithms and systems to
extract knowledge and insights from data in various forms.