Data Storage in Cloud Computing
2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS)
Abstract—Cloud computing is a functional paradigm that is evolving and making IT utilization easier by the day for consumers. Cloud computing offers standardized applications to users online, in a manner that allows them to be accessed regularly. Such applications can be accessed by as many persons as permitted within an organisation without bothering about the maintenance of such applications. The Cloud also provides a channel to design and deploy user applications, including their storage space and databases, without bothering about the underlying operating system. The application can run without consideration for on-premise infrastructure. Also, the Cloud makes massive storage available both for data and databases. Storage of data on the Cloud is one of the core activities in Cloud computing. Storage utilizes infrastructure spread across several geographical locations. Storage on the Cloud makes use of the Internet, virtualization, encryption and other technologies to ensure the security of data. This paper presents the state of the art from some of the literature available on Cloud storage. The study was executed by means of a review of the literature available on Cloud storage. It examines present trends in the area of Cloud storage and provides a guide for future research. The objective of this paper is to answer the question: what are the current trends and developments in Cloud storage? The expected result at the end of this review is the identification of trends in Cloud storage, which can be beneficial to prospective Cloud researchers, users and even providers.

Keywords-Cloud computing; Cloud Storage; databases; Cloud Infrastructure

I. INTRODUCTION

Cloud computing is defined by [1] as a parallel and distributed computing system consisting of a pool of inter-connected and virtualized computers that are dynamically provisioned and presented as a single computing resource to the users based on pre-agreed Service Level Agreements (SLA). It enables users to remotely run their applications as well as store data with the benefit of an on-demand and highly available service, without the burden of local hardware and software management. With Cloud storage, data is stored on multiple third party servers, rather than on the dedicated server used in traditional networked data storage. Third party service providers are entrusted with users' data, and for security purposes the exact storage locations of these data are unknown to most people. Cloud computing is positively impacting the IT landscape via the Internet, as it enables users to pay on a per-service usage basis. User concerns are thus shifted from acquisition and maintenance to utilization of facilities made available by Cloud service providers. Cloud computing is about moving services, computation or data offsite, for cost and business advantages, to an internal or external, location-transparent, centralized facility or contractor [3]. Cloud computing has characteristics that include resource pooling and multi-tenancy [2]. There are three basic service types in Cloud computing: Software-as-a-Service (SaaS), where applications are made available by Cloud Service Providers (CSPs) over the Internet to Cloud users; Platform-as-a-Service (PaaS), wherein CSPs offer Cloud users platforms for the development and deployment of their own applications; and Infrastructure-as-a-Service (IaaS), where CSPs offer compute, storage, network and other computing resources to Cloud users. IaaS users have control over the operating systems and the applications running on them, while the provider manages the hardware infrastructure. These services are all made available to users anytime and from any location via the web.

Cloud computing also has four deployment models: the private Cloud, public Cloud, community Cloud and hybrid Cloud. The private Cloud is owned and controlled by an individual organization, and its facilities could be on-premise or off-premise. Private Clouds allow for a more secure environment because they are utilized by internal staff. The public Cloud is owned and managed by major CSPs. These providers own large data centres, sometimes spread across different geographical locations, and provide various services that free the customer from expensive infrastructure procurement. Community Clouds belong to several organizations that come together based on shared common interests. The community Cloud may be managed by the community or by a third party. A hybrid Cloud is a combination of private, public or community Clouds. The hybrid Cloud shares the same infrastructure, but the organizations remain unique.

A major component of Cloud computing is storage. Storage could be for an enterprise database or simple storage of data, similar to storing information on a local hard drive. In Cloud storage, data is stored on multiple third party servers rather than on the dedicated servers used in traditional networked data storage [4]. When storing data, the customer "sees" a virtual server; hence it appears that data is stored in a particular place with a specific name, but such a place does not exist in reality. It is just a pseudonym used to reference a virtual space carved out in the Cloud. The users' data
[Figure 2 layers, top to bottom: Service Interface; Storage Overlay; Metadata Management; Storage Management; Network and Storage Infrastructure]
Figure 2. Cloud Storage Layered Model.

The various layers in the Cloud Storage Model depicted in Fig. 2 are described as follows:
• Network and storage infrastructure: consists of distributed wired and wireless networks interconnecting storage devices.
• Storage management: geographically distributed storage resources are organized by domains and logical entities. In addition, data can be stored as files or blocks in storage media.
• Metadata management: clusters the global domain data storage metadata information and coordinates different domains for load-balancing purposes.
• Storage overlay: virtualization, service retrieval and redirection are handled at this layer. A middleware can be used to link distributed data in storage devices and then present them as a single, simplified virtual storage network to the users.
• Service interface: provides clients with a uniform interface to access the Cloud storage system.

B. Key Issues of Cloud Storage Services

In Cloud computing, data is stored on multiple third party servers, rather than on dedicated servers as used in traditional data centres. The following are issues relating to Cloud storage services as described in [10].

1) Deployment of Cloud Storage
The sale of Cloud storage should be based on application requirements and technology. The common storage networks are integrated by middleware and an overlay layer. The geographical location should be selected according to the data requirements of the application. The cost of storage should be optimized based on the deployment mode. Feedback from various servers and clients should be collected to adjust the distribution policies and access control.

2) Virtualization and Availability of Cloud Storage
Virtualization is applied to many domains, including operating systems, servers, networks and storage. Storage virtualization maps logical storage to physical storage during data access. Cloud virtualization helps to hide storage locations and storage modes from the users. The availability of Cloud storage involves persistent runtime and recovery.

3) Data Organization
Data in storage can be organized in database mode, or at file or block level. The database can be open source or proprietary, and can only manage some specific data types. The block level is the lowest storage data format; both databases and files utilize the block level. Block-level storage must be combined with another storage organization mode.

4) Data Migration and Load Balancing
Cloud data migration involves moving data from one storage location to another, possibly in different geographical locations. The essence is to ensure load balancing in the Cloud storage system. When the used storage capacity exceeds certain threshold values, the data should be migrated to other Cloud storage units, while either keeping pointers at the old storage position or modifying and updating the metadata at the same time. Load balancing is meant to keep storage space available for later applications across the different storage devices in the Cloud. It can improve storage reliability and availability. Data migration is one of the effective means of load balancing, but it may incur bandwidth and I/O overhead. Data replication is a type of migration in which the original data is preserved. Data replication is a solution to the single point of failure in distributed Cloud storage: it keeps multiple copies of the same content in different storage devices and locations. The ideal Cloud storage system should automatically create the needed copies based on users' access frequency and server workloads.

5) Data Deduplication
Data deduplication deals with storage, backup, recovery and archiving, and is meant to reduce the space occupied in storage by eliminating internal duplicate data. Data deduplication is an effective way to reduce data volumes, cut storage requirements, and minimize data protection costs and risks. In view of the exponential growth expected in enterprise and scientific data, there will be a need for massive storage, and data deduplication will help to save space and cost.

6) Cloud Storage Security
Cloud storage security involves both the physical security of storage media and data security. It involves certification, authorization, auditing and encryption, among others. Cloud storage security also affects the procedures of the storage service, which involve software, hardware, data information, network security and user privacy.

C. Cloud Storage Security Concerns

Cloud computing does not provide users with control over the data stored in Cloud data centres [14]. The Cloud providers have full control and can perform tasks such as copying, destroying and modifying data. Because of this lack of user control, together with multi-tenancy and virtualization, data in Cloud computing faces higher security risks than information stored in a traditional data centre. The following security issues are identified with Cloud storage [14].

1) Data Privacy and Integrity
Cloud computing is vulnerable to threats in the areas of data integrity, confidentiality, privacy and availability. Owing to the simplicity of the Cloud, the number of Cloud users is increasing exponentially and more applications are being hosted in the Cloud. A successful attack on any aspect of data in storage could lead to a breach that grants unauthorized access to the data of all Cloud users. Because of virtualization and multi-tenancy, it is possible for data to be processed by multiple persons. It is also possible for a malicious insider to breach data security during processing. There is also the unanswered
question of what exactly CSPs do with the users' data that they house in their data centres.

2) Data Recoverability and Vulnerability
Due to the elastic nature of the Cloud and other characteristics such as resource pooling and multi-tenancy, data can be breached on the Cloud. The resource allocated to a particular Cloud user may later be assigned to another user. In terms of memory and storage, a malicious user can employ recovery techniques to obtain data from a previous user.

3) Improper Media Sanitization
This issue relates to the destruction of physical media for various reasons. There may be a need to change a disk or to remove data from a disk. In addition, there may be a need for termination of service. If the CSP does not sanitize the devices properly, the data may be exposed to risks. Multi-tenancy also contributes to the risks associated with device sanitization.

4) Data Backup
Data backup is also an issue that must be dealt with carefully. A regular backup is needed by the CSP to ensure the availability and recovery of data in case of intentional and accidental disasters. Moreover, backups need to be protected against unauthorized access and tampering. There are several security models aimed at guaranteeing the security of data in storage. SecCloud [11] uses a storage security protocol that not only secures user data uploaded into the Cloud, but also secures computation performed on user data. In [9], a scheme is proposed that allows users to rate the requirements of confidentiality, availability and integrity on a scale of 1–10. The values are used to determine the sensitivity rating of the user's data and its eventual protection. A solution based on on-demand data correctness verification was proposed in [11]. The model verifies the correctness of Cloud data without explicit knowledge of the whole data. It is also possible to encrypt data before outsourcing, but this incurs considerable overhead if such data is to be shared.

5) Data Outage
Many customers want reliable, elastic and highly available online storage. Cloud providers compete on price and on guarantees of uptime and availability in the form of SLAs. Cloud providers offer strong protection against component failures, so there is no compelling need to add another fail-safe on top of Cloud storage systems. Despite all this, outages occur in Cloud data centres that lead to loss of data for many Cloud users.

IV. CLOUD STORAGE TRENDS

A. Cloud Storage Projections

The 2016 Computer Weekly IT Priority Survey indicates that Cloud is the top priority for IT departments, while storage and backup for virtualized environments are key issues [15]. According to the survey, the total amount spent on Cloud is projected to overshadow that spent on in-house hardware and software in the not too distant future. The survey also projected a 50% increase in spending on Cloud compute and storage services, while spending on software and infrastructure would increase by 48% and 33% respectively. It was also reported that about 30% of respondents plan to implement a Cloud storage system as a data backup option.

Virtualization has always been a major backbone of Cloud computing, and recent surveys report that it will continue to be so for the foreseeable future. The 2016 survey [15] found that server virtualization still remains a priority for many Cloud users, with 38% positive feedback from respondents, followed by virtualization of data storage at 24% and virtual desktop environments at 22%. The survey also shows that 23% of respondents were planning to deploy Cloud virtual servers as backup systems.

B. Cloud Storage Appliances [16]

Cloud storage appliances have evolved to make the Cloud a more practical proposition in work and office contexts. They act as translators and accelerators that allow business systems to access private and public Cloud storage as if it were local storage. Cloud storage brings less hardware to buy and manage, usage-based pricing and easy access from anywhere. However, what works well for storing smartphone photos does not necessarily work for enterprise data storage. It is one thing to use a web-based app that backs onto Cloud storage, but quite another to use Cloud storage with enterprise applications, even ones as apparently simple as file sharing. That is because most Cloud storage is object-based and stateless, accessed via web-friendly APIs, whereas enterprise software is typically file- or block-based, although this is changing with the amplification of enterprise. Unlike legacy enterprise applications, web apps are usually designed to cope with the latency and bandwidth issues associated with connections over a wide area network such as the Internet. A hardware gateway can help by including local storage as a cache or buffer. This is especially useful in common use cases such as Cloud backup and archiving, where local caches can accelerate backup operations and access to online data. Some appliances are discussed in the following subsections.

1) On-Premise Gateway
In this model, an appliance (physical or virtual) sits on the premises and is connected on one side to the internal LAN and on the other to the Cloud. It might take Cloud storage and present it to servers as iSCSI block LUNs or as CIFS file-server volumes. These devices can also include a local storage tier for certain data for performance reasons.

2) Cloud Controllers
As well as gateway capabilities, these devices aim to provide services similar to those offered by traditional enterprise storage arrays, except that the data is stored in the Cloud. They add features such as data deduplication, compression and encryption, and Cloud-based clones and snapshots.

3) Cloud Integrated Storage
These provide a higher degree of integration between Cloud and local storage. In this model, data is dynamically moved to the most appropriate tier based on policy. Hybrid Cloud storage arrays are now being developed that are
deployed in-house but have built-in Cloud integration capabilities that enable them to add and utilize a storage tier located within the Cloud.

4) Cloud Resident Gateways
These are similar to Cloud integrated storage but reside in the Cloud as virtual appliances, serving applications that have been migrated to the Cloud. For example, Avere's CloudFusion gateway takes the different tiers of Cloud storage available to it, such as Amazon EC2 RAM, solid state disk or bulk S3 storage, and builds them into a virtual tiered network-attached storage filer. Some common examples of Cloud-resident gateways are as follows:
a. Amazon's AWS Storage Gateway: sends only changed data as a means of saving bandwidth, and allows primary data to stay on-premise.
b. Microsoft StorSimple: a hybrid local storage device with Cloud connectivity. It is designed to work as primary on-premises storage, while using Azure for Cloud-based archiving, backup and data recovery.
c. Barracuda Backup: acts primarily as an on-premise backup system but includes data deduplication onto the Cloud.
d. Nasuni filers: blend local disk and flash storage with Cloud storage, creating a Cloud-integrated unified storage system able to serve block and file workloads.

C. IBM Cloud Object Storage

Until recently, enterprises did not have many options for deploying a high-performance object storage solution across both the Cloud and on-premise data centres. A 2016 executive brief sponsored by IBM, titled "Which Cloud storage service delivers the performance you need? Comparing IBM Cloud object storage and Amazon S3", reported the introduction of IBM Cloud Object Storage [17]. The solution enabled users to take control of their Cloud and provided options to choose the optimum among a wide range of deployment models, cost options and performance levels for each workload. A concise comparison of some of the features provided by the IBM system versus those of Amazon's Simple Storage Service (S3) is shown in Table 1.

TABLE 1: COMPARATIVE PERFORMANCE OF IBM CLOUD OBJECT STORAGE VERSUS AMAZON S3
Feature | IBM Cloud Object Storage | Amazon S3
Single/multi-tenancy options | Multi-tenant; Single-tenant (Dedicated Service) | Multi-tenant only
Deployment options | On premise (appliance or licensed software), managed by enterprise or IBM; IBM Cloud; Unified hybrid deployments | Cloud only
Customization and control | With Dedicated Service, dynamic control over performance on a workload basis; Visibility and reporting | No customized control or workload visibility; Standard reporting
API support | OpenStack Swift; S3 Compatible API; Simple object API; NFS/SMB | S3 only

1) Throughput Results
When the environments were configured comparably, IBM Cloud Object Storage Dedicated Service delivered 1.9x higher "read" throughput and 3.3x higher "write" throughput than S3. The higher performance was attributed to IBM's single-tenant architecture, which maximizes the server resources available to the workload. In addition, the authors reported that the write-optimized IBM Cloud Object Storage Dedicated configuration delivered 1.7x faster "read" and 9.9x faster "write" performance than S3.

2) Latency
In this test, the authors reported that both the IBM and AWS systems were configured with similar settings and tested comparably using similar indexing configurations, applying a constant request rate of 420 requests per second. Latency results measured at the 95th percentile indicated that IBM Cloud Object Storage Dedicated Service delivered a "read" latency that was on average five times lower than that of Amazon's S3, while its "write" latencies were on average 6.5 times lower than those of AWS's S3.

V. CONCLUSION

Cloud computing provides compute, storage and application services, among others, to users over the Internet. The resources made available to users by CSPs have reduced the need for expenditure on infrastructure. The Cloud is used for numerous activities, but prominent among them are computation and storage. This paper focused on Cloud storage. A review of Cloud storage systems, architectures, models and challenges was carried out. A comparison of some of the storage features offered by two popular Cloud Storage Service Providers, IBM and Amazon, was also made. In conclusion, it is important to note that despite certain Cloud challenges, particularly in terms of security and privacy, Cloud storage is still being adopted at a tremendous rate, and research is still ongoing in a bid to further push the boundaries of Cloud storage adoption.

REFERENCES
[1] R. Buyya, C. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems 25 (6) (2009) 599-616.
[2] P. Mell, T. Grance, The NIST Definition of Cloud Computing, NIST Special Publication 800-145, 2011.
[3] M. Ali, S. Khan, A. Vasilakos, Security in Cloud Computing:
Opportunities and Challenges, Information Sciences 305 (2015)
357-383.
[4] K. Bowers, A. Juels, A. Oprea, HAIL: A High-Availability and
Integrity Layer for Cloud Storage, CCS’09, November 9–13, 2009,
ACM 978-1-60558-352-5/09/11.
[5] Y. Cao, C. Chen, F. Guo, D. Jiang, Y. Lin, B. Ooi, H.Vo, S. Wu, Q.
Xu, ES2: A Cloud Data Storage System for Supporting Both OLTP
and OLAP, Accessed on 24 May 2017
[6] R. Shaikh, M. Sasikumar, Data Classification for Achieving
Security in Cloud Computing, Procedia Computer Science 45
(2015) 493 – 498.
[7] C. Wang, Q. Wang, K. Ren, N. Cao, W. Lou, Towards Secure and Dependable Storage Services in Cloud Computing, IEEE Transactions on Services Computing, Volume 5, Issue 2, April-June 2012.
[8] A. Kumar, B. Lee, H. Lee, A. Kumari, Secure Storage and Access of Data in Cloud Computing, ICTC 2012, 978-1-4673-4828.
[9] H. Abu-Libdeh, L. Princehouse, H. Weatherspoon, RACS: A Case
for Cloud Storage Diversity, SoCC’10, June 10–11, 2010, ACM
978-1-4503-0036-0/10/06.
[10] Q. Liu, G. Wang, J. Wu, Efficient Sharing of Secure Cloud Storage Services, Accessed on 24 May 2017.
[11] J. Wu, L. Ping, X. Ge, Y. Wang, J. Fu, Cloud Storage as the Infrastructure of Cloud Computing, 2010 International Conference on Intelligent Computing and Cognitive Informatics, DOI 10.1109/ICICCI.2010.119.
[12] W. Zeng, Y. Zhao, K. Ou, W. Song, Research on Cloud Storage
Architecture and Key Technologies, ICIS 2009, November 24-26,
2009, ACM 978-1-60558-710-3/09/11.
[13] A. Singh, S. Pasupuleti, Optimized Public Auditing and Data
Dynamics for Data Storage Security in Cloud Computing, 6th
International Conference on Advances in Computing &
Communications, ICACC 2016, 6-8 September 2016, Cochin,
India. Procedia Computer Science 93 (2016) 751 – 759
[14] N. Vurukonda, B. Rao, A Study on Data Storage Security Issues in Cloud Computing, 2nd International Conference on Intelligent Computing, Communication & Convergence (ICCC-2016), Procedia Computer Science 92 (2016) 128-135.
[15] A. Adshead, Cloud, compliance and data protection top storage priorities for 2016, TechTarget, ComputerWeekly Publication.
[16] B. Betts, Cloud storage appliances: what are they and who provides them, TechTarget, ComputerWeekly Publication, 2016.
[17] L. Stadtmueller, Which Cloud Storage Service Delivers the Performance You Need? Comparing IBM Cloud Object Storage and Amazon S3, An Executive Brief Sponsored by IBM, Stratecast, 2016.