Data Science Presentation

Data science is an interdisciplinary field that uses scientific methods and processes to extract knowledge and insights from structured and unstructured data. Its goal is to analyze and understand real-world phenomena by revealing hidden patterns in data. It combines techniques from mathematics, statistics, information technology, and communication with domain knowledge from business management, natural science, social science, and engineering to extract value from data.
WHAT IS DATA SCIENCE?
• “Data science, also known as data-driven science, is an
interdisciplinary field of scientific methods, processes,
algorithms and systems to extract knowledge or insights
from data in various forms, either structured or
unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual
phenomena with ‘data’. In other words, the aim of data science
is to reveal the features or the hidden structure of complicated
natural, human, and social phenomena with data from a
different point of view from the established or traditional theory
and method.”

Canadian Data Science Workshop 5


WHAT IS DATA SCIENCE?
• Fourth paradigm
• “… change of all sciences moving from observational, to
theoretical, to computational and now to the 4th Paradigm –
Data-Intensive Scientific Discovery”



WHAT IS IMPORTANT?

Need to solve a real problem using data…


No applications, no data science.



DATA SCIENCE AS A UNIFIER

[Diagram: Data Science shown as the unifier of the following areas]
• Data Management
• Machine/Statistical Learning
• Application Domain Expertise
• Mathematical Optimization
• Visualization
• Humanities
• Law
• Social Science

DATA SCIENCE AND BIG DATA
• They are not the “same thing”
• Big data = crude oil
• Big data is about extracting “crude oil”, transporting it in “mega tankers”,
siphoning it through “pipelines”, and storing it in “massive silos”
• Data science is about refining the “crude oil”

Carlos Somohano
Founder, Data Science London



DATA SCIENCE AND ARTIFICIAL INTELLIGENCE

[Diagram relating Data Science, ML/DM/Analytics, and Artificial Intelligence]

“Data science produces insights.
Machine learning produces predictions”

DATA SCIENCE APPLICATION EXAMPLES
• Fraud detection
• Investigate fraud patterns in past data
• Early detection is important
• Before damage propagates
• Harder than late detection
• Precision is important
• False positive and false negative are both
bad
• Real-time analytics
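
The point about false positives and false negatives can be made concrete with precision and recall. The following is a minimal sketch, not part of the original slides; the labels and the helper name `precision_recall` are hypothetical and chosen only for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical transactions: 1 = fraud, 0 = legitimate (made-up data).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A false positive lowers precision (a legitimate customer is blocked), while a false negative lowers recall (a fraud goes through), which is one way to read the slide's claim that both are bad.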



DATA SCIENCE APPLICATION EXAMPLES
• Recommender systems
• The ability to offer unique
personalized service
• Increase sales, click-through rates,
conversions, …
• Netflix recommender system valued at
$1B per year
• Amazon recommender system drives a
20-35% lift in sales annually
• Collaborative filtering at scale
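
As an illustration of collaborative filtering, here is a toy user-based sketch using cosine similarity. It is an assumption-laden example rather than how Netflix or Amazon actually work: the users, items, ratings, and the function names `cosine` and `predict` are all hypothetical, and production systems need far more scalable methods.

```python
from math import sqrt

# Hypothetical ratings (user -> item -> score from 1 to 5); made-up data.
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cho": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity of two users over the items both have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


score = predict("ana", "D")
print(f"predicted rating of D for ana: {score:.2f}")
```

The prediction for "ana" leans toward "ben"'s rating because their past ratings are more similar, which is the core idea of user-based collaborative filtering.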



DATA SCIENCE APPLICATION EXAMPLES
• Predicting why patients are being
readmitted
• Reduce costs
• Improve population health
• Find the “why” behind specific
populations being readmitted
• Data lakes of multiple data sources
• Investigate ties between readmission and
socioeconomic data points, patient
history, genetics, …



DATA SCIENCE APPLICATION EXAMPLES
• “Smart cities”
• Not well-defined
• Generally refers to using data and
ICT to
• Better plan communities
• Better manage assets
• Reduce costs
• Deploy open data to better engage
with community



DATA SCIENCE APPLICATION EXAMPLES
• Moneyball
• How to build a baseball team on a very
low budget by relying on data
• Sabermetrics: the statistical analysis of
baseball data to objectively evaluate
performance
• Oakland Athletics: 2002 record of 103-59, joint best in MLB
• Team salary budget: $40 million
• For comparison: New York Yankees
• Team salary budget: $120 million
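
The budget comparison can be turned into a rough cost-per-win figure. This is a back-of-the-envelope sketch using only the numbers on the slide; it assumes the comparison team won the same 103 games (the slide says the record was "joint best" but does not state the other team's record), and the variable names are illustrative.

```python
# Figures taken from the slide.
wins, losses = 103, 59
budget = 40_000_000          # low-budget team salary budget in dollars
other_budget = 120_000_000   # comparison team salary budget in dollars

# Assumption: "joint best" is read as the comparison team also winning 103 games.
other_wins = wins

games = wins + losses
win_pct = wins / games
cost_per_win = budget / wins
other_cost_per_win = other_budget / other_wins
ratio = other_cost_per_win / cost_per_win

print(f"games played: {games}")
print(f"win percentage: {win_pct:.3f}")
print(f"cost per win: ${cost_per_win:,.0f} vs ${other_cost_per_win:,.0f}")
print(f"the comparison team paid about {ratio:.1f}x as much per win")
```

Under that assumption, the per-win cost ratio equals the budget ratio, which is the slide's argument in a single number.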



HOLISTIC APPROACH TO DATA SCIENCE

[Diagram: layered view of data science]
• Data Security & Privacy
• Core: Data Acquisition; Making Data Trustable & Usable;
Management of Big Data; Modeling & Analysis;
Dissemination & Visualization; Data Preservation
• Ethics, Policy & Social Impact
• Applications (four application boxes)


CORE RESEARCH ISSUES & INTERACTIONS

[Diagram: four core areas, their research issues, and the interactions between them]
• Making Data Trustable & Usable: data cleaning; sampling; data provenance
• Big Data Management: data lakes; batch & online access; platforms
• Modelling & Analysis: models & methods for data lakes; …;
unsupervised classification & AI
• Data Visualization & Dissemination: visualization for wider audience;
visualization for data exploration; open data technologies
• Interactions: DM support for provenance; data preparation for big
data management; cleaning for data analysis; DM for ML; ML for DM;
visual analytics
