An Introduction To Big Data
UDAYANA UNIVERSITY
What will we cover in this presentation?
• Is it a good thing?
© ArchonMagnus
Traditional Research
1. Generate a hypothesis.
2. Assemble a sample population and a control group.
3. Expose both to an intervention (drug, treatment, etc.).
4. Do statistical analysis to identify causal relationships.
5. Rinse and repeat…
© Mark A. Hicks
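The statistical-analysis step of the traditional workflow can be illustrated with a small sketch. The example below uses a permutation test on the difference in group means, which is one common choice and not necessarily the method the presenter had in mind. The function name `permutation_test` and the outcome values are hypothetical, made up purely for illustration. Note that the test only measures how surprising the observed difference is; any causal reading depends on how the groups were assembled.

```python
import random
from statistics import mean


def permutation_test(treated, control, n_resamples=5_000, seed=0):
    """Return (observed difference in means, two-sided p-value).

    Hypothetical helper: shuffles the pooled outcomes many times and counts
    how often a random split is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up outcome measurements for illustration only.
treated = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2]
control = [4.2, 4.0, 4.6, 4.4, 4.1, 4.5]
observed, p_value = permutation_test(treated, control)
print(f"difference in means: {observed:.2f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to come from chance alone, which is the signal that would send a researcher back to step 1 with a refined hypothesis.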
Types of Data
The 4 V’s of Big Data
• Velocity: the frequency of data
• Veracity: the quality of data
Volume: scale of data
• 90% of today’s data has been created in just the last 2 years
• Every day we create 2.5 quintillion bytes of data, or enough to fill 10 million Blu-ray discs
• 40 zettabytes (40 trillion gigabytes) of data will be created by 2020, an increase of 300 times from 2005, and the equivalent of 5,200 gigabytes of data for every man, woman and child on Earth
• Most companies in the US have over 100 terabytes (100,000 gigabytes) of data stored
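The unit conversions in these figures can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes; the variable names are illustrative. Instead of assuming a world population, it derives the population that the 5,200-gigabyte-per-person figure implies.

```python
# Decimal (SI) unit sizes in bytes; an assumption, since the slide does not
# say whether decimal or binary units are meant.
GB = 10**9
TB = 10**12
ZB = 10**21

# 40 zettabytes expressed in gigabytes.
total_gb = 40 * ZB // GB

# 100 terabytes expressed in gigabytes.
company_gb = 100 * TB // GB

# 2.5 quintillion bytes (2.5 * 10^18) per day, expressed in gigabytes.
daily_gb = 25 * 10**17 // GB

# World population implied by "5,200 gigabytes for every person".
implied_population = total_gb / 5200

print(f"40 ZB  = {total_gb:,} GB")
print(f"100 TB = {company_gb:,} GB")
print(f"daily  = {daily_gb:,} GB")
print(f"implied population = {implied_population / 1e9:.2f} billion")
```

The implied population of roughly 7.7 billion is in line with the world population around 2020, so the per-person figure is consistent with the 40-zettabyte total.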
Variety: different forms of data
What happens in an Internet minute?
Velocity: analysis of streaming data
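As a rough illustration of what analyzing streaming data means, the sketch below updates a statistic as each value arrives instead of waiting for the full data set. The function name `running_mean` and the sample readings are hypothetical; real streaming systems are far more elaborate, but the principle of processing one item at a time is the same.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, one update per new value.

    Only a count and a running total are kept, so memory use stays constant
    no matter how long the stream runs.
    """
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Made-up readings standing in for a live data feed.
readings = iter([10, 20, 30, 40])
means = list(running_mean(readings))
print(means)
```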
Veracity: trustworthiness of data
• Origin
• Authenticity
• Trustworthiness
• Completeness
• Integrity
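Two of the criteria above, completeness and integrity, lend themselves to simple automated checks. The sketch below is one possible approach under stated assumptions: the field names in `REQUIRED_FIELDS`, the sample records, and the helper functions are all hypothetical. Integrity is checked here by comparing a SHA-256 checksum of the data against a checksum recorded by the data's origin.

```python
import hashlib

# Hypothetical list of fields every record is expected to carry.
REQUIRED_FIELDS = ("patient_id", "source", "value")


def is_complete(record):
    """Completeness: every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def checksum(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the raw data."""
    return hashlib.sha256(payload).hexdigest()


def is_intact(payload: bytes, expected_checksum: str) -> bool:
    """Integrity: the data still matches the checksum recorded at its origin."""
    return checksum(payload) == expected_checksum


# Made-up records for illustration only.
record = {"patient_id": "p-001", "source": "clinic-a", "value": 98.6}
incomplete = {"patient_id": "p-002", "source": "", "value": 97.9}

payload = b"p-001,clinic-a,98.6"
published = checksum(payload)  # checksum as recorded by the data's origin

print(is_complete(record), is_complete(incomplete))
print(is_intact(payload, published), is_intact(b"p-001,clinic-a,99.6", published))
```

Origin, authenticity, and trustworthiness are harder to automate, because they depend on knowing who produced the data and how.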
The 4 V’s of Big Data
• Volume: the amount of data
• Variety: the types of data
• Velocity: the frequency of data
• Veracity: the quality of data
• Value
Big Data and Research
Big Data Mining
1. Collect Big Data or obtain access to a repository.
4. Good enough!
Big Data in Health Care
[Figure: bar chart of yearly counts, 2003 to 2016. Horizontal axis: Year. Vertical axis: 0 to 1000. Legible bar labels: 2, 1, 9, 3, 2, 7, 41, 201, 463, 723.]
Hurdles and Risks
• Unstructured data (~75% of data in the healthcare environment)
• Data privacy/security (HIPAA compliance, patient confidentiality, Personally Identifiable Information/PII)
• Inconsistent, incomplete, unavailable, poor-quality, or invalid data
• Poor analysis/analytics leading to erroneous correlations/conclusions
• Misused data
Big Data and Librarians