Cloud-Big Data Paper
Abstract
Big Data applications are pervading more and more aspects of our lives, encompassing commercial
and scientific uses at increasing rates as we move towards exascale analytics. Examples of Big Data
applications include storing and accessing user data in commercial clouds, mining of social data,
and analysis of large-scale simulations and experiments such as the Large Hadron Collider. An
increasing number of such data-intensive applications and services rely on clouds to process and
manage the enormous amounts of data required for continuous operation. It can be difficult to
decide which of the many options for cloud processing is suitable for a given application; the aim
of this paper is therefore to provide an interested user with an overview of the most important
concepts of cloud computing as it relates to the processing of Big Data.
Keywords
Big Data, Cloud Computing, Cloud Storage, Software as a Service, NoSQL, Architectures
1. Introduction
Attempting to define cloud computing can be as nebulous an activity as the term itself implies. However, the
National Institute of Standards and Technology (NIST) defines cloud computing as “a model for enabling
ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction” [1]. Other definitions have been put forward, but
the above is one of the most widely accepted and most clearly enumerated. One of the many reasons for the
ambiguity of the term and its use is the complicated interplay of service models, architectures, storage, and
software deployed in various cloud applications today. In this paper, we review the most common approaches
and aspects in an attempt to provide researchers with a tool to guide them in the selection process when
considering cloud applications for processing of Big Data.
How to cite this paper: Branch, R., Tjeerdsma, H., Wilson, C., Hurley, R. and McConnell, S. (2014) Cloud Computing and Big
Data: A Review of Current Service Models and Hardware Perspectives. Journal of Software Engineering and Applications, 7,
686-693. http://dx.doi.org/10.4236/jsea.2014.78063
R. Branch et al.
of the two [1]. Physically it may exist at the site of the organization or elsewhere [1].
A Public Cloud is used by the general public [1] and is owned, operated, and provided by a business, academic
institution, government organization, or some combination of the three [2]. In general, the Public Cloud is
housed at the physical location of the provider [2]. The resources of the Public Cloud are offered to cloud
users as a service [3]. Public Clouds are currently the dominant deployment model in use [4].
A Community Cloud is used by some specific group or community of users from a combination of different
organizations which share some common goal or concern [1]. Goals tend to be related to security, compliance,
or some specific mission [1]. This cloud may be managed, operated, and constructed by a group, single
organization, third party, or some combination of the three [2]. The cloud may be physically located at the
site of a single organization, spread across a group of organizations, located at a third party, or some
combination of the three [2].
A Hybrid Cloud is a combination of two or more of the above cloud deployment models [1]. Each cloud in the
combination remains a unique and independent cloud, and the clouds are integrated in such a way as to allow
portability between them [1] [3]. Combining different clouds allows cloud users to create a new cloud that
offers additional benefits, such as new services.
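The deployment models described above can be summarized as a small data structure. The following is only an illustrative sketch: the class and field names (Deployment, CloudProfile, HybridCloud, used_by, operated_by, location) are assumptions introduced here, and the field values merely paraphrase the descriptions in the text. It covers the Public and Community models and treats a Hybrid Cloud as a combination of at least two member clouds.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):
    PUBLIC = "public"
    COMMUNITY = "community"


@dataclass(frozen=True)
class CloudProfile:
    """Who uses, operates, and hosts a cloud (values paraphrase the text)."""
    model: Deployment
    used_by: str
    operated_by: str
    location: str


PUBLIC = CloudProfile(
    model=Deployment.PUBLIC,
    used_by="the general public",
    operated_by="a business, academic, or government organization",
    location="generally at the provider's site",
)

COMMUNITY = CloudProfile(
    model=Deployment.COMMUNITY,
    used_by="a community of organizations with a shared goal or concern",
    operated_by="a group, a single organization, or a third party",
    location="at one organization, across several, or at a third party",
)


@dataclass(frozen=True)
class HybridCloud:
    """A combination of two or more clouds that remain independent."""
    members: tuple

    def __post_init__(self):
        # The definition requires at least two constituent clouds.
        if len(self.members) < 2:
            raise ValueError("a hybrid cloud combines two or more clouds")


hybrid = HybridCloud(members=(PUBLIC, COMMUNITY))
print([m.model.value for m in hybrid.members])
```

Modeling the Hybrid Cloud as a container of otherwise unchanged member profiles mirrors the point that each constituent cloud stays a unique, independent entity.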
opportunities for side-channel attacks [4]. The costs associated with transferring data to, and communicating
with, the cloud also increase [4]. Furthermore, the wide range of APIs in use across different clouds translates
into a significant amount of time devoted to learning how to interface with these clouds [4].
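One common way to limit the cost of heterogeneous APIs is to code the application against a single interface and wrap each provider behind an adapter. The sketch below is a minimal illustration of that idea, not a description of any real provider SDK: ProviderAClient, ProviderBClient, and all of their method names are hypothetical stand-ins that store data in memory.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Common interface the application codes against (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class ProviderAClient:
    """Stand-in for one provider's SDK; names are invented for illustration."""

    def __init__(self):
        self._blobs = {}

    def upload_blob(self, container, name, payload):
        self._blobs[(container, name)] = payload

    def download_blob(self, container, name):
        return self._blobs[(container, name)]


class ProviderBClient:
    """Stand-in for a second provider with a differently shaped API."""

    def __init__(self):
        self._objects = {}

    def store(self, path, content):
        self._objects[path] = content

    def fetch(self, path):
        return self._objects[path]


class ProviderAStore(ObjectStore):
    """Adapter: maps the common interface onto provider A's calls."""

    def __init__(self, client, container):
        self._client = client
        self._container = container

    def put(self, key, data):
        self._client.upload_blob(self._container, key, data)

    def get(self, key):
        return self._client.download_blob(self._container, key)


class ProviderBStore(ObjectStore):
    """Adapter: maps the common interface onto provider B's calls."""

    def __init__(self, client, prefix):
        self._client = client
        self._prefix = prefix

    def put(self, key, data):
        self._client.store(f"{self._prefix}/{key}", data)

    def get(self, key):
        return self._client.fetch(f"{self._prefix}/{key}")


def archive(store: ObjectStore, key: str, data: bytes) -> bytes:
    """Application code that is unaware of which provider is in use."""
    store.put(key, data)
    return store.get(key)


stores = [
    ProviderAStore(ProviderAClient(), "results"),
    ProviderBStore(ProviderBClient(), "results"),
]
roundtrips = [archive(s, "run-1", b"payload") for s in stores]
```

The learning effort does not disappear under this design; it is concentrated in the adapters, so that switching or adding a cloud means writing one new adapter rather than changing application code.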
computers. This is much more cost-efficient than the high-end servers that relational databases typically
require.
However, it is unrealistic to assume that NoSQL data stores are a replacement for traditional relational (SQL)
databases. As a result of NoSQL’s open source nature and the variety of implementations that exist, few
reliable standards are available, causing portability to suffer. Performance and scalability are often put
ahead of consistency and, as a result, NoSQL is often an unacceptable solution when working with data where
consistency is critical, such as records of financial transactions. NoSQL data stores may be easier to manage,
but because the technology is still immature, there is a limited number of administrators and developers with
the required knowledge and skills. Finally, NoSQL data stores can be difficult to use for analysis and
reporting due to the unstructured nature of their implementations [15].
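The consistency trade-off can be made concrete with a toy model. The sketch below assumes a store with one primary and one secondary replica and a deferred replication step; it is a deliberately simplified illustration of eventual consistency, not the behavior of any particular NoSQL product, and all class and method names are invented for this example.

```python
class EventuallyConsistentStore:
    """Toy key-value store: writes reach the secondary replica only later."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []  # writes accepted but not yet replicated

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it,
        # which keeps latency low but leaves the secondary behind.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_secondary(self, key, default=None):
        # Reads served by the secondary may return stale data.
        return self.secondary.get(key, default)

    def replicate(self):
        # Apply all outstanding writes to the secondary, in order.
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.replicate()

store.write("balance", 40)  # e.g. a withdrawal of 60
stale = store.read_from_secondary("balance")   # replica has not caught up
store.replicate()
fresh = store.read_from_secondary("balance")   # replicas have converged

print(stale, fresh)
```

Between the second write and the following replication step, a reader on the secondary replica still sees the old balance. For many workloads such a window is acceptable, but for records of financial transactions it is exactly the kind of inconsistency that makes this class of store a poor fit.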
ing industry, which demanded better rendering capabilities and higher resolutions. The acceleration of GPUs
also brought about an interest in their use for computational problems. This led to the introduction in 2006 of
the GeForce 8800 graphics card, designed by NVIDIA, which allowed the card to be used not only for graphics
processing and game play but also for computing applications [22].
The current cloud computing scene is heterogeneous with respect to hardware, which, when combined with
lessons learned from issues in parallel computing over the last five decades, implies that scalability, ease of use,
portability, and efficiency will suffer.
5. Summary
In this paper we reviewed the most common approaches and aspects of Cloud Computing in an attempt to provide
researchers with a tool to guide them in the selection process when considering cloud applications for
processing of Big Data. We did this by viewing Cloud Computing from the user, data, and hardware perspectives.
From the user perspective we have given a brief overview of the current service and deployment models, and
what distinguishes them from each other. This provides a guideline to assist researchers in deciding what
might be the best fit for their goals. Highlighting the practical issues with interoperability, moving data to
the cloud, and portability shows opportunities that exist for future development of cloud computing and points
to areas of concern that researchers need to address when making decisions about cloud computing. In
particular, security is of paramount importance to researchers who wish to keep their sensitive data private.
From the data perspective we see that traditional relational database schemes are being replaced by more
unstructured methods of storing and accessing data through the use of NoSQL and its many and varied
implementations. Additionally, storage mechanisms based on flash arrays are being embraced by cloud providers
and deployed in ways that are transparent to the cloud user. Organizations are faced with the choice of
maintaining their own expensive storage devices or utilizing the cloud for their storage needs.
Finally, from the hardware perspective we see an increasing interest in the use of distributed systems and
GPUs as processing units for cloud providers, while disadvantages become apparent when network configuration
is brought into the mix, creating bottlenecks associated with communication.
References
[1] NIST (2011) The NIST Definition of Cloud Computing.
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
[2] NIST (2011) NIST Cloud Computing Reference Architecture.
http://www.nist.gov/customcf/get_pdf.cfm?pub_id=909505
[3] Zhang, Q., Cheng, L. and Boutaba, R. (2010) Cloud Computing: State-of-the-Art and Research Challenges. Journal
of Internet Services and Applications, 1.1, 7-18.
[4] Dillon, T., Wu, C. and Chang, E. (2010) Cloud Computing: Issues and Challenges. Proceedings of the 24th IEEE
International Conference on Advanced Information Networking and Applications (AINA), Perth, 20-23 April 2010, 27-33.
http://dx.doi.org/10.1109/AINA.2010.187
[5] Google (2013) Google App Engine. https://developers.google.com/appengine/
[6] Microsoft (2013) Microsoft Azure. http://www.azure.microsoft.com/en-us/
[7] Amazon (2013) Amazon EC2. http://aws.amazon.com/ec2/
[8] Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I.
and Zaharia, M. (2010) A View of Cloud Computing. Communications of the ACM, 53.4, 50-58.
http://dx.doi.org/10.1145/1721654.1721672
[9] Feldman, M. (2013) The Big Data Challenge: Intelligent Tiered Storage at Scale.
http://www.cray.com/Assets/PDF/Integrated_Tiered_Storage_Whitepaper.pdf
[10] Rouse, M. (2008) Flash Storage. http://whatis.techtarget.com/definition/flash-storage
[11] Lawson, S. (2014) IBM Updates All-Flash Storage Array to Complement X6 Servers.
http://www.infoworld.com/t/solid-state-drives/ibm-updates-all-flash-storage-array-complement-x6-servers-234432
[12] Amazon (2010) AWS Import/Export. http://docs.aws.amazon.com/AWSImportExport/latest/DG/whatisIE.html
[13] Jansen, W. (2011) Guidelines on Security and Privacy in Public Cloud Computing. National Institute of Standards and
Technology, U.S. Department of Commerce, Computer Security Division, Gaithersburg.
[14] Williams, P. (2012) The NoSQL Movement—What Is It?
http://www.dataversity.net/the-nosql-movement-what-is-it/
[15] Greene, N. (2013) The Five Key Advantages (And Disadvantages) of NoSQL.
http://greendatacenterconference.com/blog/the-five-key-advantages-and-disadvantages-of-nosql/
[16] Williams, P. (2012) The NoSQL Movement: Key-Value Databases.
http://www.dataversity.net/the-nosql-movement-key-value-databases/
[17] Loshin, D. (2013) An Introduction to NoSQL Data Management for Big Data.
http://data-informed.com/introduction-nosql-data-management-big-data
[18] Williams, P. (2012) The NoSQL Movement—Graph Databases.
http://www.dataversity.net/the-nosql-movement-graph-databases/
[19] Williams, P. (2012) The NoSQL Movement: Document Databases.
http://www.dataversity.net/the-nosql-movement-document-databases/
[20] Williams, P. (2012) The NoSQL Movement—Big Table Databases.
http://www.dataversity.net/the-nosql-movement-big-table-databases/
[21] Chang, F., Dean, J., et al. (2006) Bigtable: A Distributed Storage System for Structured Data. OSDI’06: Seventh
Symposium on Operating System Design and Implementation.
[22] Nickolls, J. and Dally, W.J. (2010) The GPU Computing Era. IEEE Micro, 30, 56-69.
http://dx.doi.org/10.1109/MM.2010.41
[23] Jacobs, A. (2009) The Pathologies of Big Data. ACM Queue.
[24] McGillicuddy, S. (2013) IBM Selects Juniper QFabric for Big Data Networking.
http://searchnetworking.techtarget.com/news/2240207684/IBM-selects-Juniper-QFabric-for-big-data-networking
[25] Sammer, E. (2012) Hadoop Operations. O’Reilly Media, Inc., Sebastopol.
[26] Merchant, S. (2011) Is a Fabric Architecture in Your Future?
http://www.datacenterknowledge.com/archives/2011/08/04/is-a-fabric-architecture-in-your-future/
[27] Cisco (2014) Big Data in the Enterprise—Network Design Considerations White Paper.
http://www.cisco.com/c/en/us/products/collateral/switches/nexus-5000-series-switches/white_paper_c11-690561.html