
IOSR Journal of Computer Engineering (IOSRJCE)

ISSN: 2278-0661 Volume 1, Issue 6 (July-Aug. 2012), PP 43-49


www.iosrjournals.org

Securing Cloud Data Storage


S. P. Jaikar1, M. V. Nimbalkar2
1,2(Department of Information Technology, Sinhgad College of Engineering, University of Pune, India)

Abstract: Innovations are necessary to ride the inevitable tide of change. Most enterprises are striving to
reduce their computing cost through virtualization, and this demand has led to the growth of Cloud Computing.
One fundamental aspect of this new computing model is that data is centralized, or outsourced, into the cloud.
From the data owner's perspective, including both individuals and IT enterprises, storing data remotely in a
cloud in a flexible, on-demand manner brings appealing benefits: relief from the burden of storage management,
universal data access independent of geographical location, and avoidance of capital expenditure on hardware,
software and personnel maintenance. Although the infrastructures under the cloud are much more powerful and
reliable than personal computing devices, they still face a broad range of both internal and external threats to
data integrity. While outsourcing data into the cloud is economically attractive given the cost and complexity of
long-term, large-scale data storage, it does not by itself offer any guarantee of data integrity and availability.
We propose a distributed scheme to assure users that their data are indeed stored appropriately and kept intact
at all times in the cloud. We use an erasure-correcting code in the file distribution preparation to provide
redundancy. We rely on a challenge-response protocol along with pre-computed tokens to verify the storage
correctness of users' data and to effectively locate the malfunctioning server when data corruption has been
detected. Our scheme maintains the same level of storage correctness assurance even if users modify, delete or
append their data files in the cloud.
Keywords - Cloud computing, Distributed data storage, Data security, Pervasive Computing, Virtualization.

I. INTRODUCTION
Cloud Computing moves data and application software to large data centers, where the
management of the data and services may not be fully trustworthy. This poses many new security challenges
which have not yet been well understood. Cloud computing inevitably introduces new security threats for a
number of reasons.
1. Due to users' loss of control over their data in the cloud, we cannot directly adopt traditional
cryptographic primitives for data security protection. Therefore, verification of correct data storage in the
cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data
each user stores in the cloud and the demand for long-term continuous assurance of data safety, the problem
of verifying the correctness of data storage in the cloud becomes even more challenging.
2. Cloud Computing is not just a third-party data warehouse. Users may frequently update stored data by
performing operations such as insertion, deletion, modification, appending and reordering. Ensuring storage
correctness under dynamic data updates is hence of paramount importance. However, this dynamic feature
also renders traditional integrity assurance techniques ineffective.
3. The deployment of Cloud Computing is powered by data centers running in a simultaneous, cooperative and
distributed manner. Individual users' data is redundantly stored in multiple physical locations to further
reduce data integrity threats. Therefore, distributed protocols for storage correctness assurance are of great
importance in achieving a robust and secure cloud data storage system in the real world.
Our goal is to focus on cloud data storage security and to ensure the correctness of users' data in the
cloud. We aim to localize errors, to perform successful recovery of the data, and to provide support for
dynamic operations on the data.
Recently, the importance of ensuring remote data integrity has been highlighted by many research
works [1-4, 14]. These techniques, while useful for ensuring storage correctness without requiring users to
possess their data, cannot address all the security threats in cloud data storage, since they all focus on the
single-server scenario and most of them do not consider dynamic data operations. As a complementary approach,
researchers [5, 10, 15] have also proposed distributed protocols for ensuring storage correctness across multiple
servers or peers. Again, none of these distributed schemes is aware of dynamic data operations; as a result, their
applicability to cloud data storage can be drastically limited.
Prior work [16] has addressed this storage security problem by using public key cryptography or by
requiring the client to outsource its data in encrypted form, which is not always a feasible solution.
Some provable data possession schemes [2, 3, 8, 9] have been proposed, but they have limitations; for example, they need to
migrate a whole data block when a data block is updated. Previous schemes [1-5, 8, 10] are not able to support
dynamic data operations, which are very important in cloud computing. Some schemes [7, 17] need third parties
to conduct the audits, which again is not the better option. A challenge-response protocol is used in almost every
scheme with small modifications, but such protocols suffer from large token sizes, which puts a burden on clients.

II. SYSTEM ARCHITECTURE


The general architecture of a cloud storage system is illustrated in Fig. 1. Generally, two different network
entities can be identified. We assume that users have a direct peer-to-peer connection between themselves and
the cloud. Users upload their data to the cloud, and only they can access it, not any other cloud users. The
different network entities are described below:
• User: users, who have data to be stored in the cloud and rely on the cloud for data computation, consist of
both individual consumers and organizations.
• Cloud Service Provider (CSP): the CSP has the capability to host users' data and applications. CSPs have
huge resources that they can provision dynamically to satisfy various user needs. A CSP has expertise in
building and managing cloud servers and has its own data centers for hosting users' data.

Fig.1: Storage Architecture for Cloud


Fig. 1 shows how the data is outsourced into the cloud while users retain no control over it. This also gives
a perception of the storage problem and of the need to ensure the integrity of the data in the cloud. In cloud data
storage, a user stores his data through a CSP into a set of cloud servers, which run in a simultaneous,
cooperative and distributed manner. Data redundancy can be employed using an erasure-correcting code
to further tolerate faults or server crashes as the user's data grows in size and importance. Thereafter, for
application purposes, the user interacts with the cloud servers via the CSP to access or retrieve his data. In some
cases, the user may also need to perform operations on his data.
The most general forms of these operations we consider are update, delete, insert and append.
As users no longer possess their data locally, it is of critical importance to assure users that their data is being
correctly stored and maintained. That is, users should be equipped with security means so that they can obtain
continuous correctness assurance of their stored data even without keeping local copies.
In case users do not have the time, feasibility or resources to monitor their data, they can delegate this
task to an optional trusted third-party auditor (TPA) of their choice, but they then need to pay the TPA. This is
not our aim; what we want is to give users the freedom to ensure the intactness of their data in the cloud
themselves. In our scheme, we assume that the point-to-point communication channels between each cloud
server and the user are authenticated and reliable. Security threats faced by cloud data storage can come from
two different sources. On the one hand, a CSP can be self-interested, untrusted and possibly malicious; it may
also attempt to hide a data loss incident caused by management errors, Byzantine failures and so on. On the
other hand, there may exist an economically motivated adversary who has the capability to compromise a
number of cloud data storage servers in different time intervals and subsequently modify or delete users' data
while remaining undetected by the CSP for a certain period. So we have attackers with different purposes in
different contexts, and we need to classify them according to the severity of the damage they can do to the storage.
Depending upon the various motivations of attackers, we have classified them into categories. Specifically,
we consider two types of adversary with different levels of capability:
• Weak Adversary: The adversary is interested in corrupting the user's data files stored on individual servers.
Once a server is compromised, the adversary can pollute the original data files by modifying them or by
introducing its own fraudulent data, to prevent the original data from being retrieved by the user.
• Strong Adversary: This is the worst-case scenario, in which we assume that the adversary can compromise
all the storage servers, so that he can intentionally modify the data files as long as they remain internally
consistent. In fact, this is equivalent to the case where all servers collude to hide a data loss or corruption
incident.
To ensure the security of cloud data storage under the aforementioned adversary model, we aim to
design efficient mechanisms for dynamic data verification and operation.

2.1 Notation & Preliminaries


• F – the data file to be stored. We assume that F can be denoted as a matrix of n equal-sized data vectors,
each consisting of the same number of blocks. Data blocks are all well represented as elements of the Galois
field GF(2^w) for a suitable word size w.
• A – the dispersal matrix used for Reed-Solomon coding.
• D – the data matrix constructed over the data vectors of F.
• E – the encoded file matrix, which includes a set of n + m vectors (the n data vectors together with m parity
vectors), each consisting of the same number of blocks.
• f_key(·) – pseudorandom function (PRF).
• π_key(·) – pseudorandom permutation (PRP).

2.2 File Distribution Preparation


It is well known that erasure-correcting codes may be used to tolerate multiple failures in distributed
storage systems [18]. In cloud data storage, we rely on this technique to disperse the data file redundantly
across a set of distributed servers. A Reed-Solomon erasure-correcting code is used to
create m redundancy parity vectors from n data vectors in such a way that the original n data vectors can be
reconstructed from any n out of the n + m data and parity vectors. By placing each of the n + m vectors on a
different server, the original data file can survive the failure of any m of the servers without any data loss,
with a space overhead of m/n. To support efficient sequential I/O to the original file, our file layout is
systematic, i.e., the n unmodified data file vectors are distributed across n different servers. We use the
Reed-Solomon (RS-RAID) algorithm to disperse the file redundantly over the storage devices.
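
As a rough illustration of the first step of this preparation, the following Python sketch (our own, not part of the published scheme) splits a byte string into n equal-sized data vectors, zero-padding the tail so that every vector has the same length; the m parity vectors would then be produced from these vectors by the Reed-Solomon code described next.

def split_into_vectors(data: bytes, n: int) -> list:
    """Split `data` into n equal-sized data vectors, zero-padding the last one.

    This mirrors the systematic layout described above: the n unmodified data
    vectors are later placed on n different servers, and m parity vectors are
    added by the Reed-Solomon code.
    """
    size = -(-len(data) // n)               # ceiling division: bytes per vector
    padded = data.ljust(size * n, b"\x00")  # zero-pad so all vectors are equal-sized
    return [padded[i * size:(i + 1) * size] for i in range(n)]

# Example: a small file split across n = 3 data servers.
vectors = split_into_vectors(b"example cloud data file contents", 3)
assert len(vectors) == 3 and len(set(map(len, vectors))) == 1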

2.2.1 RS-RAID ALGORITHM


Let there be n storage devices, D_1, D_2, ..., D_n, each of which holds k bytes. These are called the
data devices. Let there be m more storage devices, C_1, C_2, ..., C_m, each of which also holds k bytes. These
are called the checksum devices. The contents of each checksum device are calculated from the contents of
the data devices. The goal is to define the calculation of each C_i such that if any m of
D_1, ..., D_n, C_1, ..., C_m fail, then the contents of the failed devices can be reconstructed from
the non-failed devices. In other words, we have n data words d_1, ..., d_n and m checksum words c_1, ..., c_m
which are computed from the data words in such a way that the loss of any m words can be tolerated.
To compute a checksum word c_i for the checksum device C_i, we apply a function F_i to the data words:

c_i = F_i(d_1, d_2, ..., d_n).

If a data word on device D_j is updated from d_j to d'_j, then each checksum word c_i is recomputed by
applying a function G_{i,j} such that

c'_i = G_{i,j}(c_i, d_j, d'_j).

When up to m devices fail, we reconstruct the system as follows. First, for each failed data device D_j,
we construct a function to restore the words in D_j from the words in the non-failed devices. When that is
completed, we recompute any failed checksum devices with F_i.
For example, suppose m = 1. We can describe simple parity in the above terms. There is one checksum
device, C_1. To compute each checksum word c_1, we take the parity (XOR) of the data words:

c_1 = F_1(d_1, d_2, ..., d_n) = d_1 ⊕ d_2 ⊕ ... ⊕ d_n.

If a word on data device D_j changes from d_j to d'_j, then c_1 is recalculated from the parity of its old
value and the two data words:

c'_1 = c_1 ⊕ d_j ⊕ d'_j.
If a device D_j fails, then each of its words d_j may be restored as the parity of the corresponding words on the
remaining devices:

d_j = c_1 ⊕ d_1 ⊕ ... ⊕ d_{j-1} ⊕ d_{j+1} ⊕ ... ⊕ d_n.

In such a way, the system is resilient to any single device failure. In the general case, we are given n data
words d_1, ..., d_n. We define functions F and G which we use to calculate and maintain the checksum
words c_1, ..., c_m. We then describe how to reconstruct the words of any lost data device when up to m
devices fail. Once the data words are reconstructed, the checksum words can be recomputed from the data words
and F. Thus, the entire system is reconstructed.
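
The three identities of the single-checksum case (m = 1) above — computing c_1, updating it after a write, and restoring a lost word — can be checked directly in a few lines of Python; the values and variable names below are illustrative only.

from functools import reduce

d = [13, 7, 42, 99]                # data words d_1..d_n (one word per data device)

# c_1 = d_1 XOR d_2 XOR ... XOR d_n
c1 = reduce(lambda a, b: a ^ b, d)

# update: word d_2 changes to 50; recompute c_1 from its old value and the two words
d_new = 50
c1_updated = c1 ^ d[1] ^ d_new
d[1] = d_new
assert c1_updated == reduce(lambda a, b: a ^ b, d)

# recovery: device 3 fails; restore its word from the checksum and the surviving words
lost = d[2]
restored = c1_updated ^ d[0] ^ d[1] ^ d[3]
assert restored == lost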

2.2.2 Calculating & Maintaining Checksums


We define each function F_i to be a linear combination of the data words:

c_i = F_i(d_1, d_2, ..., d_n) = Σ_{j=1..n} d_j b_{i,j},

where all arithmetic is over GF(2^w). In other words, if we represent the data and checksum words as the vectors
D = (d_1, ..., d_n)^T and C = (c_1, ..., c_m)^T, and the coefficients b_{i,j} of the functions F_i as the rows of an
m × n matrix B, then the state of the system adheres to the following equation:

BD = C.

We define A to be the (n + m) × n matrix [I; B] obtained by stacking the n × n identity matrix I on top of B, and
E to be the (n + m)-element vector (d_1, ..., d_n, c_1, ..., c_m)^T; thus the above equation becomes:

AD = E.

When one of the data words d_j changes to d'_j, then each of the checksum words must be changed as
well. This can be effected by subtracting out the portion of the checksum word that corresponds to d_j, and
adding the required amount for d'_j. Thus, G_{i,j} is defined as follows:

c'_i = G_{i,j}(c_i, d_j, d'_j) = c_i + b_{i,j}(d'_j − d_j).

Therefore, the calculation and maintenance of checksum words can be done by simple arithmetic (over GF(2^w),
addition and subtraction are both XOR).
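
A minimal sketch of these two formulas follows, assuming word size w = 4 and the primitive polynomial x^4 + x + 1 for the field arithmetic (our choice; the paper does not fix a polynomial here). It computes each c_i as a linear combination of the data words and then verifies that the incremental update rule yields the same result as recomputing the checksums from scratch.

def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements, reducing modulo x^4 + x + 1 (0x13)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def gf16_pow(x: int, e: int) -> int:
    out = 1
    for _ in range(e):
        out = gf16_mul(out, x)
    return out

n, m = 3, 3
# Vandermonde coefficients b_ij = j^(i-1), i = 1..m, j = 1..n, over GF(2^4)
B = [[gf16_pow(j, i) for j in range(1, n + 1)] for i in range(m)]

def checksum(row, d):
    """c_i = sum over j of d_j * b_ij (the sum is XOR over GF(2^4))."""
    out = 0
    for b_ij, d_j in zip(row, d):
        out ^= gf16_mul(b_ij, d_j)
    return out

d = [7, 2, 11]                                       # data words, one per data device
c = [checksum(B[i], d) for i in range(m)]

# incremental update: d_2 changes from 2 to 9; over GF(2^w), (d'_j - d_j) is d_j XOR d'_j
j, d_new = 1, 9
c_updated = [c[i] ^ gf16_mul(B[i][j], d[j] ^ d_new) for i in range(m)]
d[j] = d_new
assert c_updated == [checksum(B[i], d) for i in range(m)]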

2.2.3 RECOVERING FROM FAILURES


To explain recovery from errors, we use the matrix A and the vector E defined above. Then we have the
following equation:

AD = E.

We can view each device in the system as having a corresponding row of the matrix A and of the
vector E. When a device fails, we reflect the failure by deleting the device's row from A and from E. What
results is a new matrix A' and a new vector E' that adhere to the equation:

A'D = E'.

Suppose exactly m devices fail. Then A' is an n × n matrix. Because the matrix B is defined to be a Vandermonde
matrix (b_{i,j} = j^{i−1}, with arithmetic over GF(2^w)), every subset of n rows of the matrix A is guaranteed
to be linearly independent. Thus, the matrix A' is non-singular, and the values of D may be calculated from
A'D = E' using Gaussian elimination. Hence all data devices can be recovered.

Once the values of D are obtained, the values of any failed checksum device C_i may be recomputed from D and
F_i. It should be obvious that if fewer than m devices fail, the system may be recovered in the same manner by
choosing any n surviving rows of A and E. Thus, the system can tolerate any number of device failures up to m.
So, as per the RS-RAID algorithm, we divide the input file into n data vectors, where n is the number of
data storage devices present in the system. The data vectors that are generated are of equal size, so the load is
distributed equally over all the storage devices. We create the data matrix D and store all the data vectors in it.
In the next step we create the Reed-Solomon dispersal matrix A, which is generated over the Galois field GF(2^w),
where the word size w is chosen so that n + m ≤ 2^w. After this stage, we perform a matrix multiplication to
generate the encoded matrix E = A · D, multiplying the data matrix D with the Reed-Solomon dispersal matrix A.
The resultant matrix E is the redundant matrix which contains the original data from the data matrix together
with the parity vectors added by the Reed-Solomon matrix. This matrix E is stored, vector by vector, across the
different storage devices, and it is used for token computation as well as for data recovery.
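
To make the whole pipeline concrete, the following self-contained Python sketch (again assuming w = 4 and the primitive polynomial x^4 + x + 1, which are our choices) builds the log/antilog tables, forms the dispersal matrix A = [I; B], encodes one word per device as E = A · D, erases m = 3 rows to simulate device failures, and recovers D by Gaussian elimination over GF(2^4). All names are illustrative; a real deployment would operate on whole vectors of words rather than a single word per device.

def build_tables():
    """Log/antilog tables for GF(2^4), primitive polynomial x^4 + x + 1 (assumed)."""
    gfilog, gflog = [0] * 15, [0] * 16
    b = 1
    for i in range(15):
        gfilog[i], gflog[b] = b, i
        b <<= 1
        if b & 0x10:
            b ^= 0x13
    return gflog, gfilog

GFLOG, GFILOG = build_tables()

def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GFILOG[(GFLOG[a] + GFLOG[b]) % 15]

def inv(a):
    return GFILOG[(15 - GFLOG[a]) % 15]

def gf_pow(x, e):
    r = 1
    for _ in range(e):
        r = mul(r, x)
    return r

n, m = 3, 3
B = [[gf_pow(j, i) for j in range(1, n + 1)] for i in range(m)]      # Vandermonde part, b_ij = j^(i-1)
A = [[1 if r == c else 0 for c in range(n)] for r in range(n)] + B   # dispersal matrix A = [I; B]

D = [7, 2, 11]          # one word per data device
E = []                  # E = A * D over GF(2^4): data words followed by checksum words
for row in A:
    acc = 0
    for a_rc, d_c in zip(row, D):
        acc ^= mul(a_rc, d_c)
    E.append(acc)

# simulate the failure of any m = 3 devices: only rows 0, 4 and 5 survive
alive = [0, 4, 5]
A2 = [A[r][:] for r in alive]
E2 = [E[r] for r in alive]

# Gauss-Jordan elimination over GF(2^4) solves A2 * D = E2 for D
for col in range(n):
    piv = next(r for r in range(col, n) if A2[r][col])   # a non-zero pivot exists: A2 is non-singular
    A2[col], A2[piv] = A2[piv], A2[col]
    E2[col], E2[piv] = E2[piv], E2[col]
    s = inv(A2[col][col])
    A2[col] = [mul(x, s) for x in A2[col]]
    E2[col] = mul(E2[col], s)
    for r in range(n):
        if r != col and A2[r][col]:
            f = A2[r][col]
            A2[r] = [x ^ mul(f, y) for x, y in zip(A2[r], A2[col])]
            E2[r] ^= mul(f, E2[col])

assert E2 == D          # the data words are recovered from any n surviving devices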

2.3 CHALLENGE TOKEN PRECOMPUTATION


To verify the correctness of users' data and to locate errors, we rely entirely on pre-computed
verification tokens. These tokens are calculated before file distribution and they are very short. We compute
the tokens using a pseudorandom function f_key(·) and a pseudorandom permutation π_key(·). We pre-compute t
short verification tokens for each individual vector, each token covering a random subset of r data blocks.
We have assumed a block size of 256 bits and r = 8 indices per verification. Suppose we have three data devices
and three checksum devices. Then n = 3 and m = 3. We choose w = 4, since n + m ≤ 2^w. Next, we set up the
gflog and gfilog tables for GF(2^4); these tables are shown in Table 1.

Table 1: gflog and gfilog tables for GF(2^4).

We construct B to be a 3 × 3 Vandermonde matrix defined over GF(2^4), with b_{i,j} = j^{i−1}.

Now, we can calculate each word of each checksum device C_i using

c_i = F_i(d_1, d_2, d_3) = Σ_{j=1..3} d_j b_{i,j}.

Later, when the user wants to ascertain the storage correctness of his data in the cloud, he challenges
the cloud servers. Upon receiving a challenge, a cloud server computes a fresh token value, which is compared
with the previously calculated token. This gives a clear indication of the integrity of the user's data. It also helps
to locate the error, which has not been done in previous research work [1, 2, 3, 4, 8]: previous work could only
detect whether the data is intact or not, so it provides a binary result and not the exact location of errors.
Algorithm: TOKEN PRE-COMPUTATION
1. Begin.
2. Choose the file F to upload and encrypt it.
3. Generate the vector matrix D from the file F.
4. Create the Reed-Solomon dispersal matrix A over the Galois field GF(2^w), where n + m ≤ 2^w.
5. Generate the matrix E = A · D. It is the checksum (encoded) matrix created for fault tolerance.
6. Compute tokens over the matrix E, with block size 256 bits, t tokens per vector and r indices per
verification. Compute the tokens with the pseudorandom function f_key(·) and the pseudorandom permutation
π_key(·).
7. Store these pre-computed tokens on the main cloud server.
8. Disperse the file over the cloud, i.e., distribute the vectors of E across the cloud storage servers.
9. End.
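
The following sketch illustrates step 6 under stated assumptions rather than the authors' exact construction: HMAC-SHA256 stands in for the pseudorandom function f_key(·), a key-seeded index shuffle stands in for the pseudorandom permutation π_key(·), and t tokens are computed per encoded vector, each covering r pseudorandomly chosen 256-bit blocks.

import hashlib
import hmac
import random

def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in for the pseudorandom function f_key(.)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def precompute_tokens(blocks, key: bytes, t: int, r: int):
    """Compute t verification tokens for one encoded vector.

    Each token covers r block indices chosen by a key-seeded shuffle,
    our stand-in for the pseudorandom permutation pi_key(.).
    """
    tokens = []
    for i in range(t):
        rng = random.Random(prf(key, b"perm-%d" % i))
        indices = rng.sample(range(len(blocks)), r)
        h = hashlib.sha256()
        for q in indices:
            h.update(prf(key, b"%d-%d" % (i, q)) + blocks[q])
        tokens.append(h.digest())
    return tokens

# toy example: 16 blocks of 32 bytes (256 bits) for one encoded vector
blocks = [bytes([v]) * 32 for v in range(16)]
tokens = precompute_tokens(blocks, key=b"user-secret-key", t=4, r=8)
# the tokens are stored before dispersing the file; the same computation,
# repeated later over the blocks a server claims to hold, must reproduce them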

2.4 CORRECTNESS VERIFICATION & ERROR LOCALIZATION


A key prerequisite for eliminating errors in storage systems is to locate them. However, many
previous schemes do not explicitly consider the problem of data error localization and thus provide only binary
results for storage verification. In our scheme we integrate correctness verification and error localization
into our challenge-response protocol. The tokens newly computed by the servers for each challenge are compared
with the pre-computed tokens to determine the correctness of the distributed storage. This comparison also gives
the information needed to locate potential data errors.
Algorithm: CORRECTNESS VERIFICATION
1. Begin a challenge for each cloud server S_j, for j = 1, ..., s, where s is the total number of cloud servers.
2. Get the pre-computed tokens. // retrieved from the main cloud server
3. Read the file blocks. // read from all cloud servers, for calculating the new tokens
4. Generate the vector matrix D' from all file blocks read in step 3.
5. Create the Reed-Solomon dispersal matrix A.
6. Generate the matrix E' = A · D'. On this matrix, the new tokens will be computed.
7. Compute the new tokens on the matrix E'.
8. If (new tokens == pre-computed tokens) then
the data is intact; else
the data is corrupt. For the corresponding server S_j, initiate the recovery.
9. End
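
A sketch of how the comparison in step 8 localizes faults, under the simplifying assumption that the user holds one pre-computed token per server for the current challenge round: every server whose recomputed token differs from the stored one is flagged for recovery.

def localize_errors(precomputed, responses):
    """Return the indices of servers whose challenge response does not match
    the pre-computed token, i.e. the servers to hand to the recovery step."""
    return [j for j, (old, new) in enumerate(zip(precomputed, responses)) if old != new]

# toy example with 6 servers (3 data + 3 parity); server 4 returns a wrong token
stored = [b"t0", b"t1", b"t2", b"t3", b"t4", b"t5"]
received = [b"t0", b"t1", b"t2", b"t3", b"XX", b"t5"]
assert localize_errors(stored, received) == [4]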

2.5 ERROR RECOVERY & FILE RETRIEVAL


Once data corruption is detected, the next important step is to recover the corrupted data and bring the data
storage back to a consistent state. The comparison of pre-computed tokens and received response values
guarantees the identification of the misbehaving server, and therefore the user can recover the corrupted data.
Our system recovers the data from a backup server and distributes all data vectors back to their corresponding
servers, which results in successful recovery of the corrupted data. However, due to the file splitting performed
at the time of file distribution, the user needs to recover the file from all the servers. Error localization is
limited to the misbehaving servers only, i.e., the servers giving a false assurance of possessing the user's data.
Algorithm: Error Recovery
1. Begin. (Assume that the data corruptions have been detected and the misbehaving servers have been
identified.)
2. Download the consistent data blocks from the backup server.
3. Create the data vectors according to the number of cloud storage servers.
4. Distribute the consistent data blocks to the corresponding servers and recover the data.
5. End.
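
A rough sketch of this recovery flow, with hypothetical in-memory stand-ins for the backup server and the cloud storage servers: the consistent vectors are fetched from the backup and pushed back to the servers flagged as misbehaving by the verification step.

def recover(cloud: dict, backup: dict, misbehaving: list) -> None:
    """Overwrite the vectors on the flagged servers with the consistent
    copies held by the backup server (both modelled as dicts here)."""
    for server_id in misbehaving:
        cloud[server_id] = backup[server_id]

# toy example: server 1 was found corrupt by the verification step
backup = {0: b"vec-0", 1: b"vec-1", 2: b"vec-2"}
cloud = {0: b"vec-0", 1: b"garbage", 2: b"vec-2"}
recover(cloud, backup, misbehaving=[1])
assert cloud == backup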

2.6 DYNAMIC OPERATIONS


In cloud data storage, there are many potential scenarios where the data stored in the cloud is dynamic, such as
electronic documents, photos, or log files. Therefore, it is crucial to consider the dynamic case, where a user
may wish to perform update, delete and append operations to modify the data file while maintaining
the storage correctness assurance. The straightforward and trivial way to support these operations is for the user
to download all the data from the cloud servers and re-compute all parity blocks as well as the verification
tokens; this would clearly be highly inefficient. In cloud data storage, the user may sometimes need to modify
some data stored in the cloud from its current value to a new one. We refer to this operation as data update. To
perform an update operation on a particular data block, the client needs to recalculate the verification token on
the updated data. The client also needs to propagate the newly calculated token value to all replicas of the file
in the storage cloud. When the user wants to perform an update operation, the split file is merged from all
storage servers and given to the user to perform the data updates. Once the user has finished updating the data,
new tokens are calculated on the whole file and stored on the main cloud server. After this, the updated file is
split again and dispersed onto the corresponding cloud storage servers (a sketch of this flow is given at the end
of this section). Update operations include modifying the file, inserting data, as well as deleting data from the
file.
Sometimes, after being stored in the cloud, certain data may need to be deleted. The delete operation
we consider is a general one: when the user wants to delete a file, he can simply delete it. In the delete
operation, the file blocks distributed among the cloud storage servers are all deleted. Once a file is deleted, no
recovery of the deleted file is possible, as no backup remains on the main cloud server. In
some cases, the user may want to increase the size of his stored data by adding data at the end of the data
file, which we refer to as the data append operation. In the case of an append operation, whenever the user
appends data to his file, new verification tokens are calculated and stored on the main cloud server, and the
file is split as before and dispersed among the cloud storage servers. Previous schemes [5, 6, 7] do not support
an insert operation on the data. In our scheme the user can insert data at any location he desires. We treat the
insert operation as a special case of the update operation and rely on the update mechanism to carry it out.
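
The update flow sketched below mirrors the description above: merge the split file, let the user edit it, recompute tokens over the whole file, then split and disperse again. The helper names and the HMAC-per-vector token are our own simplifications, not the paper's exact token construction.

import hmac
import hashlib

def merge(vectors):
    # reassemble the split file; strip the zero padding added by split()
    # (simplification: assumes the real file does not end in zero bytes)
    return b"".join(vectors).rstrip(b"\x00")

def split(data: bytes, n: int):
    # re-split into n equal-sized vectors, zero-padding the tail
    size = -(-len(data) // n)
    data = data.ljust(size * n, b"\x00")
    return [data[i * size:(i + 1) * size] for i in range(n)]

def tokens(vectors, key: bytes):
    # recompute one token per vector (simplified stand-in for the token scheme)
    return [hmac.new(key, v, hashlib.sha256).digest() for v in vectors]

def update_file(servers, main_server, key, edit, n):
    whole = merge([servers[i] for i in range(n)])          # 1. merge from all storage servers
    whole = edit(whole)                                    # 2. user performs the update
    main_server["tokens"] = tokens(split(whole, n), key)   # 3. new tokens on the whole file
    for i, v in enumerate(split(whole, n)):                # 4. split again and disperse
        servers[i] = v

# toy example: append a few bytes to the stored file
servers = {i: v for i, v in enumerate(split(b"original file contents", 3))}
main_server = {}
update_file(servers, main_server, b"user-key", lambda d: d + b" + appended", 3)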

III. CONCLUSION
We have analyzed the data security concerns in cloud data storage, which is a distributed storage
system. We proposed a distributed scheme to assure users that their data are indeed stored appropriately and
kept intact at all times in the cloud. To provide redundancy we used an erasure-correcting code in the file
distribution preparation. As the cloud is not just a third-party data warehouse, providing support for dynamic
operations is very important: our scheme maintains the same level of storage correctness assurance even if
users modify, delete or append their data files in the cloud. A challenge-response protocol along with
pre-computed tokens is used to verify the storage correctness of users' data and to effectively locate the
malfunctioning server when data corruption has been detected. Through detailed performance analysis, we show
that our scheme has very low communication overhead and is guaranteed to detect every unauthorized data
modification. Our scheme places no limitation on the number of pre-computed tokens used for challenging the
cloud servers, so an unlimited number of challenges can be made. Nevertheless, we believe that data storage
security in Cloud Computing remains an area full of challenges and of paramount importance.

REFERENCES
[1] A. Juels, J. Burton, and S. Kaliski, "PORs: Proofs of Retrievability for Large Files," Proc. ACM CCS '07, Oct. 2007, pp. 584–97.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted
Stores," Proc. ACM CCS '07, Oct. 2007, pp. 598–609.
[3] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession," Proc. SecureComm '08,
Sept. 2008.
[4] H. Shacham and B. Waters, "Compact Proofs of Retrievability," Proc. AsiaCrypt '08, LNCS, vol. 5350, Dec. 2008, pp. 90–107.
[5] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage," Proc. ACM CCS '09,
Nov. 2009, pp. 187–98.
[6] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring Data Storage Security in Cloud Computing," Proc. IWQoS '09, July 2009,
pp. 1–9.
[7] Q. Wang, C. Wang, W. Lou, and J. Li, "Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud
Computing," Proc. ESORICS '09, Sept. 2009, pp. 355–70.
[8] C. Erway, A. Küpçü, C. Papamanthou, and R. Tamassia, "Dynamic Provable Data Possession," Proc. ACM CCS '09,
Nov. 2009, pp. 213–22.
[9] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, "MR-PDP: Multiple-Replica Provable Data Possession," Proc. ICDCS '08,
IEEE Computer Society, 2008, pp. 411–420.
[10] T. Schwarz and E. L. Miller, "Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage,"
Proc. ICDCS '06, 2006.
[11] N. Gohring, "Amazon's S3 Down for Several Hours," online at
http://www.pcworld.com/businesscenter/article/142549/amazons_s3_down_for_several_hours.html, 2008.
[12] M. Arrington, "Gmail Disaster: Reports of Mass Email Deletions," Dec. 2006,
http://www.techcrunch.com/2006/12/28/gmail-disaster-reports-of-mass-email-deletions/.
[13] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," online at http://www.nist.gov/itl/cloud/upload-def-v15.pdf.
[14] K. D. Bowers, A. Juels, and A. Oprea, "Proofs of Retrievability: Theory and Implementation," Cryptology ePrint Archive, Report
2008/175, 2008, http://eprint.iacr.org/.
[15] M. Lillibridge, S. Elnikety, A. Birrell, M. Burrows, and M. Isard, "A Cooperative Internet Backup Scheme," Proc. of the 2003
USENIX Annual Technical Conference (General Track), pp. 29–41, 2003.
[16] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, "Enabling Public Auditability and Data Dynamics for Storage Security in Cloud
Computing," IEEE Transactions on Parallel & Distributed Systems, vol. 22, no. 5, pp. 847–859.
[17] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Storage Security in Cloud Computing," Proc.
IEEE INFOCOM '10, San Diego, CA, USA, March 2010.
[18] J. S. Plank and Y. Ding, "Note: Correction to the 1997 Tutorial on Reed-Solomon Coding," University of Tennessee, Tech. Rep.
CS-03-504, 2003.
