An Introduction To Big Data

Uploaded by

Dewi Ardiani
Copyright © All Rights Reserved
Big Data

An Introduction to Big Data


BY
DESAK PUTU EKA NILAKUSMAWATI,
WIDYADI SETYAWAN,
DUMAN CARE KRISNA

UDAYANA UNIVERSITY
What will we cover in this presentation?

• What is Big Data?

• How is it different from “small data”?

• How will it impact our lives?

• Is it a good thing?

• How can librarians prepare?


The Information Continuum

Cartoon by David Somerville, based on a two-pane version by Hugh MacLeod


The Scientific Method

© ArchonMagnus
Traditional Research
1. Generate a hypothesis.
2. Assemble a sample population and a control group.
3. Expose the sample population, but not the control group, to an intervention (drug, treatment, etc.).
4. Do statistical analysis to identify causal relationships.
5. Rinse and repeat…
©Mark A. Hicks
Types of Data

Quantitative Data
• Measurable
• Collected through measuring things that have a fixed reality
• Closed-ended

Qualitative Data
• Descriptive
• Collected through observation, field work, focus groups, interviews, recording or filming conversations
• Open-ended
Big Data

Data that is too large or too complex to be managed using traditional data processing, analysis, and storage techniques.
The 4 V’s of Big Data

• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
Volume: scale of data

• 90% of today’s data has been created in just the last 2 years

• Every day we create 2.5 quintillion bytes of data or enough to fill 10 million Blu-ray discs
• 40 zettabytes (40 trillion gigabytes) of data will be created by 2020, an increase of 300 times from 2005, and the equivalent of 5,200 gigabytes of data for every man, woman and child on Earth
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
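The unit conversions quoted in the bullets above can be checked with a few lines of arithmetic. This is a minimal sketch assuming decimal (SI) units, where 1 gigabyte is 10^9 bytes. The world-population figure of 7.7 billion is an assumption chosen for illustration and does not come from the slides.

```python
# Decimal (SI) byte units. Binary units (GiB, TiB) would give slightly
# different numbers; the slides do not say which convention they use.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def to_gigabytes(n_bytes: int) -> float:
    """Convert a byte count to gigabytes."""
    return n_bytes / GIGABYTE


zettabytes_in_gb = to_gigabytes(40 * ZETTABYTE)  # 40 trillion GB
terabytes_in_gb = to_gigabytes(100 * TERABYTE)   # 100,000 GB

# ASSUMPTION (not from the slides): roughly 7.7 billion people on Earth.
ASSUMED_POPULATION = 7_700_000_000
gb_per_person = zettabytes_in_gb / ASSUMED_POPULATION

print(f"40 ZB  = {zettabytes_in_gb:,.0f} GB")
print(f"100 TB = {terabytes_in_gb:,.0f} GB")
print(f"Per person: about {gb_per_person:,.0f} GB")
```

Under that population assumption the per-person figure comes out close to the 5,200 gigabytes quoted above.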
Variety: different forms of data
What happens in an Internet minute?
Velocity: analysis of streaming data
Veracity: trustworthiness of data

• Origin
• Authenticity
• Trustworthiness
• Completeness
• Integrity
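Two of the dimensions above, completeness and integrity, lend themselves to simple automated checks. The sketch below is only an illustration: the record fields, the list of required fields, and the plausible-value ranges are all hypothetical and are not taken from the slides.

```python
# HYPOTHETICAL example records; field names and values are invented.
REQUIRED_FIELDS = ("patient_id", "age", "heart_rate")

records = [
    {"patient_id": "A1", "age": 54, "heart_rate": 72},
    {"patient_id": "A2", "age": None, "heart_rate": 80},  # missing age
    {"patient_id": "A3", "age": 37, "heart_rate": 640},   # implausible value
]


def is_complete(record):
    """Completeness: every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def has_plausible_values(record):
    """Integrity: values that are present fall inside assumed ranges."""
    age = record.get("age")
    heart_rate = record.get("heart_rate")
    age_ok = age is None or 0 <= age <= 120
    heart_rate_ok = heart_rate is None or 20 <= heart_rate <= 250
    return age_ok and heart_rate_ok


def completeness_rate(rows):
    """Share of rows that pass the completeness check (0.0 if no rows)."""
    return sum(is_complete(r) for r in rows) / len(rows) if rows else 0.0


print(f"Complete records: {completeness_rate(records):.0%}")
for record in records:
    if not has_plausible_values(record):
        print(f"Implausible values in record {record['patient_id']}")
```

Origin, authenticity, and trustworthiness usually depend on provenance information about the data source, so they are harder to test from the records alone.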
The 4 V’s of Big Data

• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
• Value
Big Data and Research
Big Data Mining
1. Collect Big Data or obtain access to a repository.

2. Perform data analysis to explore patterns (pattern recognition, predictive analytics).

3. Identify potential correlations.

4. Good enough!
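The exploratory workflow above can be illustrated with a small correlation scan. This is a minimal sketch, not a method taken from the slides: the variable names and the synthetic values are invented for illustration, and Pearson's r is only one of many possible pattern-finding measures.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL synthetic dataset standing in for a data repository.
dataset = {
    "daily_steps": [3000, 4500, 6000, 7500, 9000, 10500],
    "resting_heart_rate": [78, 75, 72, 70, 66, 64],
    "shoe_size": [42, 38, 44, 40, 39, 43],
}


def potential_correlations(data, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    found = []
    for name_a, name_b in combinations(data, 2):
        r = pearson_r(data[name_a], data[name_b])
        if abs(r) >= threshold:
            found.append((name_a, name_b, r))
    return found


for name_a, name_b, r in potential_correlations(dataset):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

A pair flagged this way is a potential correlation only. Unlike the traditional research steps, nothing here tests whether one variable causes the other.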
Big Data in Health Care

• Faster and cheaper technology and data storage

• Widespread sensing devices

• An increase in “born” digital data

• Greater availability of data via repositories

• Data sharing mandates


Faster and cheaper technology and data storage

The cost of sequencing a whole human genome has fallen from more than $100 million to less than $1,000 over the past 15 years.
Sensing devices
• Smartwatches
• Smart jewelry
• Fitness trackers
• Sport watches
• Smart glasses
• Smart clothing…
An increase in “born” digital data
Data that originates as digital data, rather than being converted or digitized later, is proliferating. Think digital electronic medical records, implanted medical devices, diagnostic imaging technology…
Greater availability of data via repositories

As of April 2016, the Registry of Research Data Repositories (re3data.org) listed 1,500 research data repositories. Currently 458 are keyworded “medicine.”
Sharing mandates

The number of funders and journals with data sharing policies has grown significantly in the past decade…
The Health Care Big Data Horizon

• Leverage the Electronic Health Record to improve diagnosis and outcomes and to reduce costs
• Integrate patient-generated health data and the Internet of Things (IoT)
• Incorporate environmental and socioeconomic data in patient diagnosis and treatment
• Develop personalized care specific to each patient’s particular needs (Precision Medicine)
Health Disparities: Big Data to the Rescue?
“Big Data” on PubMed
[Bar chart: instances of “Big Data” on PubMed by year, 2003 to 2016. The labeled counts rise from single digits in the earliest years through 201, 463, and 723 to a peak of 1,196.]
Hurdles and Risks
• Unstructured Data (~75% of data in the healthcare environment)
• Data privacy/security (HIPAA Compliance, Patient Confidentiality, Personally Identifiable Information/PII)
• Inconsistent, incomplete, unavailable, poor quality or invalid data
• Poor analysis/analytics leading to erroneous correlations/conclusions
• Misused data
Big Data and Librarians

What role will librarians play in the Big Data revolution?
Do you see yourself playing a part?
How will you prepare yourself?
What resources will you use?
Patricia Brennan, RN, PhD, NLM Director
Resources…
• DataMed https://datamed.org/
• Institute for Health Metrics and Evaluation’s Global Health Data Exchange http://ghdx.healthdata.org/
• NNLM RD3: Resources for Data-Driven Discovery https://nnlm.gov/data/
• NNLM’s YouTube Channel https://www.youtube.com/channel/UCmZqoegBFKJQF69V8d-05Bw
• OHSU’s Big Data to Knowledge https://dmice.ohsu.edu/bd2k/topics.html
• Registry of Research Data Repositories (re3data.org) http://www.re3data.org/
• NIH’s All of Us Program https://allofus.nih.gov/
Thank You
