BigData_V08_CameraReady
All content following this page was uploaded by Hamidur Rahman on 24 October 2016.
1 Introduction
The term ‘Big data’ has been in clear use in the research community since around
2013. Several authors have tried to explain its definition and the associated issues,
technologies, challenges and privacy concerns in a concise way [1-5]. For example,
in 2001 Laney highlighted the challenges and opportunities generated by increased
data through a 3Vs model, i.e., increases in volume, velocity and variety [6]. In
recent years, the world has become highly digitalized and interconnected, and as a
result the amount of data has been exploding. Managing such massive amounts of
records therefore requires extremely powerful business intelligence. The problem
becomes even harder during data acquisition: if the amount of data is too large, it
may be unclear which data to keep, which to discard and how to store the data
reliably. The term Big data has been used for the accumulation of huge amounts of
different sorts of data for the last 2-3 years. In 2015, the digital world expanded by
5.6 exabytes (1 exabyte = 10^18 bytes) of data created each day. This figure is
expected to double every 24 months or so [7]. As a result, storing, managing,
sharing, analyzing and visualizing information with typical database software tools
is not only difficult but also very risky. Big data can be structured, semi-structured
or unstructured in nature, but it can help businesses by enabling automated services
that target their potential partners, agents or customers.
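The growth estimate quoted above (5.6 exabytes created per day in 2015, doubling roughly every 24 months [7]) can be turned into a small worked calculation. The following is only a minimal sketch: it assumes steady exponential growth, and the function name and default parameters are hypothetical, introduced here for illustration.

```python
EXABYTE_IN_BYTES = 10 ** 18  # 1 exabyte = 10^18 bytes


def projected_daily_volume_eb(years_after_2015,
                              base_eb_per_day=5.6,
                              doubling_months=24.0):
    """Estimate the data created per day (in exabytes) some years after 2015.

    Assumption (not from the cited source): growth is smoothly exponential,
    so the volume is base * 2 ** (elapsed_months / doubling_months).
    """
    if years_after_2015 < 0:
        raise ValueError("years_after_2015 must be non-negative")
    doublings = years_after_2015 * 12.0 / doubling_months
    return base_eb_per_day * 2.0 ** doublings


if __name__ == "__main__":
    for years in (0, 2, 4, 6):
        eb = projected_daily_volume_eb(years)
        print(f"{2015 + years}: about {eb:.1f} EB/day "
              f"({eb * EXABYTE_IN_BYTES:.2e} bytes/day)")
```

Under these assumptions the daily volume doubles with each two-year step, which illustrates why storage and processing capacity planned for today's volume is quickly outgrown.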
Some Big data review articles are available online, but most of them emphasize a
specific area, e.g., big data frameworks, big data challenges or big data
applications, and almost all of them fail to provide a complete overview of
Big data [2, 8, 9]. In this paper, we present a complete overview of Big data
and its present state of the art. Additionally, we identify the important
characteristics of big data, Big data frameworks and analytics, the challenges of
big data and their possible solutions, and big data tools and their applications in
well-known companies. This article should help new researchers, especially data
scientists, as well as research institutes and companies, to gain insight into big
data and its latest technologies for their research planning, business activities and
future demand for handling massive amounts of data.
1 http://wiki.apache.org/hadoop/PoweredBy#G
2 https://www.mongodb.com/industries
3 https://datacleaner.org/testimonials
4 http://www.teradata.se/customers-list/browse/?LangType=1053&LangSelect=true
5 https://www.qubole.com/customer/?nabe=5695374637924352:1
6 https://plot.ly/#trusted-by
7 http://www.pentaho.com/customers
8 https://www.python.org/about/success/#engineering
9 https://www.import.io/
Table 1. Big data tools used by renowned companies

No.  Big data tool  Where it is used
1    Hadoop         Google, Amazon, Alibaba, Facebook etc.
2    MongoDB        Citigroup, MIT, GOV.UK, eBay, MTV etc.
3    DataCleaner    Stratebi, Platon, BestBrains etc.
4    Teradata       Air Canada, Cisco, Coca-Cola, Coop, Dell, Daimler etc.
5    Qubole         Autodesk, Answers.com, Capilary, Quora, Nextdoor etc.
6    Plot.ly        Google, Goji, VTT, U.S. Air Force etc.
7    Pentaho        CAT, Nasdaq, Logitech, U.S. Navy etc.
8    Python         Forecastwatch.com, AstraZeneca, Carmanah etc.
9    Import.io      Quid, Nygg, OpenRise, University of Houston etc.
Thousands of Big data tools are available, both commercially and as free trials,
for data extraction, storage, cleaning, mining, visualization, analysis and
integration. Table 2 shows the most popular big data tools.
7 Conclusion
This article has discussed a general overview and the concept of Big data,
including the 6Vs of Big data, its frameworks and its analytic issues. Additionally,
the difference between big and small data, popular tools, inconsistencies and
challenges have been reviewed. Because petabytes and exabytes of data must be
managed and analyzed, a big data management system has to ensure a high level of
data quality and accessibility and help to locate valuable information in large sets
of unstructured and unplanned data. The techniques reviewed here can be applied
to various fields of engineering, industry and medical science. Some real-life
applications in the context of big data analysis, such as autonomous driving,
smooth transition in semi-autonomous driving and driver monitoring, will be
presented as future work.
Acknowledgement: The authors would like to acknowledge the Swedish Knowledge Foundation
(KKS), the Swedish Governmental Agency for Innovation Systems (VINNOVA), Volvo Car Corporation, the
Swedish National Road and Transportation Research Institute, Autoliv AB, Hök Instrument AB, and Prevas
AB Sweden for their support of the research projects in this area.
10 https://www.import.io/post/all-the-best-big-data-tools-and-how-to-use-them/
References
[1] F. Tekiner and J. A. Keane, "Big Data Framework," in 2013 IEEE International Conference on
Systems, Man, and Cybernetics, 2013, pp. 1494-1499.
[2] S. Sagiroglu and D. Sinanc, "Big data: A review," in Collaboration Technologies and Systems
(CTS), 2013 International Conference on, 2013, pp. 42-47.
[3] A. Katal, M. Wazid, and R. H. Goudar, "Big data: Issues, challenges, tools and Good practices,"
in Contemporary Computing (IC3), 2013 Sixth International Conference on, 2013, pp. 404-409.
[4] W. Xiong, Z. Yu, Z. Bei, J. Zhao, F. Zhang, Y. Zou, et al., "A characterization of big data
benchmarks," in Big Data, 2013 IEEE International Conference on, 2013, pp. 118-125.
[5] T. Lu, X. Guo, B. Xu, L. Zhao, Y. Peng, and H. Yang, "Next Big Thing in Big Data: The
Security of the ICT Supply Chain," in Social Computing (SocialCom), 2013 International
Conference on, 2013, pp. 1066-1073.
[6] D. Laney, "3-D Data Management: Controlling Data Volume, Velocity and Variety," META
Group Original Research Note, 2001.
[7] F. D. Ahmed, A. N. Jaber, M. B. A. Majid, and M. S. Ahmad, "Agent-based Big Data Analytics
in retailing: A case study," in Software Engineering and Computer Systems (ICSECS), 2015 4th
International Conference on, 2015, pp. 67-72.
[8] P. Gupta and N. Tyagi, "An approach towards big data-A review," in Computing,
Communication & Automation (ICCCA), 2015 International Conference on, 2015, pp. 118-123.
[9] T. Rout, M. Garanayak, M. R. Senapati, and S. K. Kamilla, "Big data and its applications: A
review," in Electrical, Electronics, Signals, Communication and Optimization (EESCO), 2015
International Conference on, 2015, pp. 1-5.
[10] H. Mousannif, H. Sabah, Y. Douiji, and Y. O. Sayad, "From Big Data to Big Projects: A Step-
by-Step Roadmap," in Future Internet of Things and Cloud (FiCloud), 2014 International
Conference on, 2014, pp. 373-378.
[11] G. Huang, J. He, C. H. Chi, W. Zhou, and Y. Zhang, "A Data as a Product Model for Future
Consumption of Big Stream Data in Clouds," in Services Computing (SCC), 2015 IEEE
International Conference on, 2015, pp. 256-263.
[12] I. Khan, S. K. Naqvi, M. Alam, and S. N. A. Rizvi, "Data model for Big Data in cloud
environment," in Computing for Sustainable Global Development (INDIACom), 2015 2nd
International Conference on, 2015, pp. 582-585.
[13] G. Suciu, A. Vulpe, R. Craciunescu, C. Butca, and V. Suciu, "Big data fusion for eHealth and
Ambient Assisted Living Cloud Applications," in Communications and Networking
(BlackSeaCom), 2015 IEEE International Black Sea Conference on, 2015, pp. 102-106.
[14] L. T. Yang, L. Kuang, J. Chen, F. Hao, and C. Luo, "A Holistic Approach to Distributed
Dimensionality Reduction of Big Data," IEEE Transactions on Cloud Computing, vol. PP, pp.
1-1, 2015.
[15] Y. Zheng, "Methodologies for Cross-Domain Data Fusion: An Overview," IEEE Transactions
on Big Data, vol. 1, pp. 16-34, 2015.
[16] S. Pandey and V. Tokekar, "Prominence of MapReduce in Big Data Processing," in
Communication Systems and Network Technologies (CSNT), 2014 Fourth International
Conference on, 2014, pp. 555-560.
[17] J. Wang, Z. Song, Q. Li, J. Yu, and F. Chen, "Semantic-based intelligent data clean framework
for big data," in Security, Pattern Analysis, and Cybernetics (SPAC), 2014 International
Conference on, 2014, pp. 448-453.
[18] S. Biookaghazadeh, Y. Xu, S. Zhou, and M. Zhao, "Enabling scientific data storage and
processing on big-data systems," in Big Data (Big Data), 2015 IEEE International Conference
on, 2015, pp. 1978-1984.
[19] Y. Diao, K. Y. Liu, X. Meng, X. Ye, and K. He, "A Big Data Online Cleaning Algorithm Based
on Dynamic Outlier Detection," in Cyber-Enabled Distributed Computing and Knowledge
Discovery (CyberC), 2015 International Conference on, 2015, pp. 230-234.
[20] I. Taleb, R. Dssouli, and M. A. Serhani, "Big Data Pre-processing: A Quality Framework," in
2015 IEEE International Congress on Big Data, 2015, pp. 191-198.
[21] D. Zhang, "Inconsistencies in big data," in Cognitive Informatics & Cognitive Computing
(ICCI*CC), 2013 12th IEEE International Conference on, 2013, pp. 61-67.