Securing Cloud Data Storage
Abstract: Innovations are necessary to ride the inevitable tide of change. Most enterprises are striving to reduce their computing cost through virtualization, and this demand has led to the innovation of Cloud Computing. One fundamental aspect of this new computing model is that data is centralized or outsourced into the cloud. From the data owner's perspective, including both individuals and IT enterprises, storing data remotely in a cloud in a flexible, on-demand manner brings appealing benefits: relief from the burden of storage management, universal data access independent of geographical location, and avoidance of capital expenditure on hardware, software, and personnel maintenance. Although the infrastructures under the cloud are much more powerful and reliable than personal computing devices, they still face a broad range of both internal and external threats to data integrity. While outsourcing data into the cloud is economically attractive given the cost and complexity of long-term, large-scale data storage, it does not by itself offer any guarantee of data integrity and availability. We propose a distributed scheme to assure users that their data are indeed stored appropriately and kept intact at all times in the cloud. We use an erasure-correcting code in the file distribution preparation to provide redundancy. We rely on a challenge-response protocol along with pre-computed tokens to verify the storage correctness of users' data and to effectively locate the malfunctioning server when data corruption has been detected. Our scheme maintains the same level of storage-correctness assurance even if users modify, delete, or append their data files in the cloud.
Keywords - Cloud computing, Distributed data storage, Data security, Pervasive Computing, Virtualization.
I. INTRODUCTION
Cloud Computing moves data and application software to large data centers, where the management of the data and services may not be fully trustworthy. This poses many new security challenges which have not yet been well understood. Cloud computing inevitably poses new, challenging security threats for a number of reasons.
1. Due to the user's loss of control over data in the cloud, we cannot directly adopt traditional cryptographic primitives for data security protection. Therefore, verification of correct data storage in the cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data each user stores in the cloud and the demand for long-term, continuous assurance of data safety, the problem of verifying the correctness of data storage in the cloud becomes even more challenging.
2. Cloud Computing is not just a third-party data warehouse. Users may frequently update stored data by performing operations such as insertion, deletion, modification, appending, and reordering. Ensuring storage correctness under dynamic data updates is hence of paramount importance. However, this dynamic feature also renders traditional integrity assurance techniques futile.
3. The deployment of Cloud Computing is powered by data centers running in a simultaneous, cooperative, and distributed manner. An individual user's data is redundantly stored in multiple physical locations to further reduce data integrity threats. Therefore, distributed protocols for storage-correctness assurance will be of the utmost importance in achieving a robust and secure cloud data storage system in the real world.
Our goal is to focus on cloud data storage security and to ensure the correctness of users' data in the cloud. We aim to localize errors and to perform successful recovery of data, as well as to provide support for dynamic operations on data.
Recently, the importance of ensuring remote data integrity has been highlighted by many research works [1-4, 14]. These techniques, while useful for ensuring storage correctness without requiring users to possess their data, cannot address all the security threats in cloud data storage, since they all focus on the single-server scenario and most of them do not consider dynamic data operations. As a complementary approach, researchers [5, 10, 15] have also proposed distributed protocols for ensuring storage correctness across multiple servers or peers. Again, none of these distributed schemes is aware of dynamic data operations. As a result, their applicability in cloud data storage can be drastically limited.
Prior work [16] has addressed this storage security problem using either public-key cryptography or by requiring the client to outsource its data in encrypted form, which is not a feasible solution. Also, some provable
www.iosrjournals.org 43 | Page
data possession schemes [2, 3, 8, 9] have been proposed, but they have limitations, such as needing to migrate a whole data block whenever a block is updated. Previous schemes [1-5, 8, 10] are not able to support dynamic data operations, which are very important in cloud computing. Some schemes [7, 17] need third parties to conduct the audits, which again is not the better option. The challenge-response protocol is used in almost every scheme with small modifications, but these variants have the drawback of a large token size, which puts a burden on clients.
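As an illustration only (not the exact protocol of any cited scheme), the pre-computed-token idea can be sketched as follows. All names here are hypothetical, and the sketch simplifies heavily: in a real scheme the server would compute its response over coded blocks without ever holding the owner's secret key.

```python
import hmac, hashlib, secrets

def make_token(key: bytes, blocks: list, indices: list) -> bytes:
    """Compute a verification token over the blocks selected by `indices`."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for i in indices:
        mac.update(i.to_bytes(4, "big") + blocks[i])
    return mac.digest()

# Owner side: pre-compute a token before outsourcing the blocks.
key = secrets.token_bytes(32)
blocks = [bytes([b]) * 16 for b in range(8)]    # toy 8-block file
challenge = [1, 4, 6]                           # randomly chosen block indices
token = make_token(key, blocks, challenge)

# Server side: answers the challenge by recomputing over its stored copy.
response = make_token(key, blocks, challenge)
assert hmac.compare_digest(token, response)     # storage is correct

# A corrupted block makes the response mismatch, exposing the faulty server.
blocks[4] = b"\x00" * 16
bad = make_token(key, blocks, challenge)
assert not hmac.compare_digest(token, bad)
```

Because each token covers only a small random subset of blocks, the client stores a short digest per challenge rather than the whole file, which is what keeps the client-side burden low.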
consistent. In fact, this is equivalent to the case where all servers are colluding together to hide a data loss
or corruption incident.
To ensure the security for cloud data storage under the aforementioned adversary model, we aim to
design efficient mechanisms for dynamic data verification and operation.
If a data word $d_j$ on device $D_j$ is updated from $d_j$ to $d'_j$, then each checksum word $c_i$ is recomputed by applying a function $G_{i,j}$ such that
$$c'_i = G_{i,j}(d_j, d'_j, c_i).$$
When up to $m$ devices fail, we reconstruct the system as follows. First, for each failed data device $D_j$, we construct a function to restore the words in $D_j$ from the words in the non-failed devices. When that is completed, we recompute any failed checksum device $C_i$ with $F_i$.
For example, suppose $m = 1$. We can describe parity in the above terms. There is one checksum device $C_1$. To compute each checksum word $c_1$, we take the parity (XOR) of the data words:
$$c_1 = d_1 \oplus d_2 \oplus \cdots \oplus d_n.$$
If a word on data device $D_j$ changes from $d_j$ to $d'_j$, then $c_1$ is recalculated from the parity of its old value and the two data words:
$$c'_1 = c_1 \oplus d_j \oplus d'_j.$$
If a device $D_j$ fails, then each word $d_j$ may be restored as the parity of the corresponding words on the remaining devices:
$$d_j = c_1 \oplus d_1 \oplus \cdots \oplus d_{j-1} \oplus d_{j+1} \oplus \cdots \oplus d_n.$$
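A minimal sketch of this parity case in Python, using toy 4-bit words (the real scheme operates on device-sized blocks):

```python
from functools import reduce

def parity(words):
    """c1 = d1 XOR d2 XOR ... XOR dn"""
    return reduce(lambda a, b: a ^ b, words)

d = [0b1010, 0b0110, 0b1111]        # data words d1..d3
c1 = parity(d)                      # checksum word on device C1

# Update: d2 changes; the new checksum comes from the old checksum
# and the old and new values of the changed word alone.
old, new = d[1], 0b0001
c1 = c1 ^ old ^ new
d[1] = new
assert c1 == parity(d)

# Recovery: device D3 fails; its word is the parity of all the rest.
restored = c1 ^ d[0] ^ d[1]
assert restored == d[2]
```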
In such a way, the system is resilient to any single device failure. In the general case, we are given data words $d_1, d_2, \ldots, d_n$. We define functions $F$ and $G$ which we use to calculate and maintain the checksum words $c_1, c_2, \ldots, c_m$. We then describe how to reconstruct the words of any lost data device when up to $m$ devices fail. Once the data words are reconstructed, the checksum words can be recomputed from the data words and $F_i$. Thus, the entire system is reconstructed.
When one of the data words $d_j$ changes to $d'_j$, each of the checksum words $c_i$ must be changed as well. This can be effected by subtracting out the portion of the checksum word that corresponds to $d_j$ and adding the required amount for $d'_j$. Thus, $G_{i,j}$ is defined as follows:
$$c'_i = G_{i,j}(d_j, d'_j, c_i) = c_i + f_{i,j}(d'_j - d_j).$$
Therefore, the calculation and maintenance of checksum words can be done by simple arithmetic.
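The update rule above, patching each checksum by the scaled difference between the old and new data word, can be sketched as follows. Plain integer arithmetic stands in for the Galois-field arithmetic the scheme actually uses, and the coefficients `f[i][j]` are purely illustrative:

```python
# Checksum maintenance: c'_i = c_i + f[i][j] * (d_new - d_old).
f = [[1, 1, 1],
     [1, 2, 3]]                    # coefficient rows for checksums c1, c2
d = [5, 7, 9]                      # data words d1..d3

# Initial checksums: c_i = sum_j f[i][j] * d[j].
c = [sum(f[i][j] * d[j] for j in range(3)) for i in range(2)]

# d2 changes from 7 to 4: patch each checksum instead of recomputing it.
j, d_new = 1, 4
for i in range(2):
    c[i] += f[i][j] * (d_new - d[j])
d[j] = d_new

# Patched checksums equal a full recomputation over the new data words.
assert c == [sum(f[i][j] * d[j] for j in range(3)) for i in range(2)]
```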
We can view each device in the system as having a corresponding row of the matrix $A$ and of the vector $E$. When a device fails, we reflect the failure by deleting the device's row from $A$ and from $E$. What results is a new matrix $A'$ and a new vector $E'$ that adhere to the equation:
$$A' D = E'.$$
Once the values of $D$ are obtained, the values of any failed $c_i$ may be recomputed from $F_i$. It should be obvious that if fewer than $m$ devices fail, the system may be recovered in the same manner, choosing any $n$ rows of $A'$. Thus the system can tolerate any number of device failures up to $m$.
So, as per the RS-RAID algorithm, we divide the input file into $n$ data vectors, where $n$ is the number of storage devices present in the system. The data vectors that are generated are of equal size, so the load will be distributed equally across all the storage devices. We create a data matrix $D$ and store all the data vectors in $D$. In the next step we create a Reed-Solomon matrix $B$, which is generated over a Galois field $GF(2^w)$; in our case we have assumed a fixed word size $w$. After this stage, we perform a matrix multiplication to generate the checksum matrix: we multiply the data matrix $D$ by the Reed-Solomon matrix $B$. The resultant matrix is the redundant matrix, which contains the original data from the data matrix together with the parity vectors added by the Reed-Solomon matrix. This redundant matrix will be stored across the different storage devices and will be used for token computation as well as for data recovery.
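The file distribution preparation described above can be sketched as follows. For brevity a single XOR parity vector stands in for the Reed-Solomon encoding over $GF(2^w)$, which would produce $m$ parity vectors instead of one:

```python
def split_file(data: bytes, n: int) -> list:
    """Split the file into n equal-size data vectors (zero-padded at the
    end), one per storage device, so the load is balanced."""
    size = -(-len(data) // n)                     # ceiling division
    padded = data.ljust(n * size, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(n)]

D = split_file(b"example file contents for dispersal", 4)
assert [len(v) for v in D] == [9, 9, 9, 9]        # equal-size vectors

# Encoding sketch over GF(2): one parity vector appended to the data
# vectors; the scheme itself multiplies D by the Reed-Solomon matrix B.
parity = bytes(a ^ b ^ c ^ d for a, b, c, d in zip(*D))
R = D + [parity]                                  # rows of the redundant matrix
assert len(R) == 5
```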
In some cases, the user may want to increase the size of his stored data by adding data at the end of the data file, which we refer to as the data append operation. In the case of an append, whenever the user appends data to his file, new verification tokens are calculated and stored on the main cloud server, and the file is split as before and dispersed among the cloud storage servers. Previous schemes [5, 6, 7] do not support the insert operation on data. In our scheme the user can insert data at any location he desires. We treat the insert operation as a special case of the update operation and rely on the update operation to realize it.
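A hypothetical sketch of treating insertion as a run of updates (function and block names are illustrative, not from the scheme): inserting a block at position $k$ shifts every block from $k$ onward, so each shifted position is handled by the existing update path.

```python
def insert_block(blocks: list, k: int, new: bytes) -> list:
    """Insert `new` at position k; every block from k onward now holds a
    new value, so each is processed as an ordinary block update.
    Returns the indices whose tokens/checksums must be refreshed."""
    blocks.insert(k, new)
    return list(range(k, len(blocks)))

blocks = [b"b0", b"b1", b"b2"]
touched = insert_block(blocks, 1, b"new")
assert blocks == [b"b0", b"new", b"b1", b"b2"]
assert touched == [1, 2, 3]                 # update path runs for these
```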
III. CONCLUSION
We have analyzed the data security concerns in cloud data storage, which is a distributed storage system. We proposed a distributed scheme to assure users that their data are indeed stored appropriately and kept intact at all times in the cloud. To provide redundancy we used an erasure-correcting code in the file distribution preparation. As the cloud is not just a third-party data warehouse, providing support for dynamic operations is very important; our scheme maintains the same level of storage-correctness assurance even if users modify, delete, or append their data files in the cloud. A challenge-response protocol along with pre-computed tokens is used to verify the storage correctness of users' data and to effectively locate the malfunctioning server when data corruption has been detected. Through detailed performance analysis, we show that our scheme has very low communication overhead and is guaranteed to detect every single unauthorized data modification. Our scheme places no limitation on the number of pre-computed tokens used for challenging the cloud servers, so an unlimited number of challenges can be made. Still, we believe that data storage security in Cloud Computing remains an area full of challenges and of paramount importance.
REFERENCES
[1] A. Juels and B. S. Kaliski Jr., "PORs: Proofs of Retrievability for Large Files," Proc. ACM CCS '07, Oct. 2007, pp. 584–597.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores," Proc. ACM CCS '07, Oct. 2007, pp. 598–609.
[3] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession," Proc. SecureComm '08, Sept. 2008.
[4] H. Shacham and B. Waters, "Compact Proofs of Retrievability," Proc. AsiaCrypt '08, LNCS, vol. 5350, Dec. 2008, pp. 90–107.
[5] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage," Proc. ACM CCS '09, Nov. 2009, pp. 187–198.
[6] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring Data Storage Security in Cloud Computing," Proc. IWQoS '09, July 2009, pp. 1–9.
[7] Q. Wang, C. Wang, W. Lou, and J. Li, "Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing," Proc. ESORICS '09, Sept. 2009, pp. 355–370.
[8] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic Provable Data Possession," Proc. ACM CCS '09, Nov. 2009, pp. 213–222.
[9] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, "MR-PDP: Multiple-Replica Provable Data Possession," Proc. ICDCS '08, IEEE Computer Society, 2008, pp. 411–420.
[10] T. Schwarz and E. L. Miller, "Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage," Proc. ICDCS '06, 2006.
[11] N. Gohring, "Amazon's S3 Down for Several Hours," online at http://www.pcworld.com/businesscenter/article/142549/amazons_s3_down_for_several_hours.html, 2008.
[12] M. Arrington, "Gmail Disaster: Reports of Mass Email Deletions," Dec. 2006; http://www.techcrunch.com/2006/12/28/gmail-disaster-reports-of-mass-email-deletions/
[13] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," online at http://www.nist.gov/itl/cloud/upload-def-v15.pdf.
[14] K. D. Bowers, A. Juels, and A. Oprea, "Proofs of Retrievability: Theory and Implementation," Cryptology ePrint Archive, Report 2008/175, 2008, http://eprint.iacr.org/.
[15] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard, "A Cooperative Internet Backup Scheme," Proc. of the 2003 USENIX Annual Technical Conference (General Track), pp. 29–41, 2003.
[16] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, "Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 5, pp. 847–859.
[17] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Storage Security in Cloud Computing," Proc. IEEE INFOCOM '10, San Diego, CA, USA, March 2010.
[18] J. S. Plank and Y. Ding, "Note: Correction to the 1997 Tutorial on Reed-Solomon Coding," University of Tennessee, Tech. Rep. CS-03-504, 2003.