
Volume 9, Issue 4, April – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165 https://doi.org/10.38124/ijisrt/IJISRT24APR1394

Deduplication on Encrypted Data in Cloud Computing
Aditya Tryambak Sambare1; Prathamesh Hanmant Shingate2; Amol Kishor Shelke3; Mansi Ranjit Thakur4; G Nazia Sulthana5 (Professor)
Department of Computer Engineering, Mahatma Gandhi Mission's College of Engineering and Technology, Navi Mumbai, Maharashtra

Abstract:- Cloud storage is a crucial component of cloud computing, allowing users to expand their storage without upgrading their equipment and to overcome resource constraints. Cloud users' data is usually encrypted before being outsourced to ensure its security and privacy. However, storing encrypted data may waste a significant amount of cloud resources and complicates data sharing among authorized users. Encrypted data storage with deduplication therefore remains a challenge. Traditional deduplication strategies are designed for specific application settings in which data owners or cloud servers have full control over the process. They cannot meet data owners' varying demands based on data sensitivity. This study proposes a flexible data storage management method that combines deduplication and access control across multiple Cloud Service Providers (CSPs). We assess its performance through security analysis, comparison, and implementation. The results demonstrate its security, effectiveness, and efficiency for practical applications.

Keywords:- Data Deduplication, Cloud Computing, Access Control, Storage Management.

I. INTRODUCTION

Cloud computing provides centralized data storage and online access to computer services or resources. This new approach to IT services reorganizes resources and tailors them to meet user needs. Cloud computing offers numerous benefits, including scalability, elasticity, fault tolerance, and pay-per-use pricing. Cloud storage allows users to store large amounts of data without upgrading their devices and to access it anytime, anywhere. However, cloud data storage provided by Cloud Service Providers (CSPs) is not without issues.

Data stored in the cloud may require varying levels of protection based on its sensitivity. The cloud stores sensitive personal information, publicly shared data, and group-shared data. Important data should be stored securely in the cloud to prevent unauthorized access, whereas unimportant data may not be subject to such requirements. Outsourced data may contain sensitive information, so data owners may prefer to control it themselves or, if they are unavailable or unsure how to do so, delegate control to a third party. Adapting cloud data access control to varied scenarios and user needs is a practical issue. Access control for encrypted data has been extensively researched in the literature. However, few cloud data protection solutions can meet diverse needs uniformly, particularly when it comes to economical deduplication. Flexible cloud data deduplication with access control remains an open subject.

Duplicated data can be encrypted and stored in the cloud by multiple users across different CSPs. Data deduplication and access control should be compatible: the same data, whether encrypted or not, is stored only once in the cloud and can be accessed by multiple users based on the policies of data owners or holders. Duplicate data in cloud storage wastes network resources, consumes energy, increases costs, and complicates data management. Economical storage benefits both CSPs and cloud consumers by lowering operating expenses and service prices. Cloud data deduplication is therefore crucial for storing and managing large amounts of data. However, there is little research on flexible cloud data deduplication across several CSPs. Existing solutions lack flexibility and uniformity in supporting both deduplication and access control in the cloud.

This work proposes a heterogeneous data storage management method to address the issues mentioned above. The proposed approach is compatible with the access control scheme proposed previously. It allows for flexible cloud storage management, including data deduplication and access control, which can be managed by the data owner, a trusted third party, or neither. The suggested technique addresses data security concerns while also saving storage space through deduplication across many CSPs. Thus, it can be used in a variety of data storage applications.

Our scheme is distinct from prior work. This study proposes using encryption and deduplication to conserve cloud storage across several CSPs while maintaining data security and privacy in different scenarios. Our proposed heterogeneous data management scheme supports deduplication and access control based on data owners' needs, adapting to various application scenarios. Our method allows for flexible data exchange among eligible users, governed by data owners, trusted parties, or both. The suggested scheme's performance is validated by security analysis, comparison to existing work, and implementation-based evaluation. The results demonstrate its security, benefits, efficiency, and potential for practical use.

II. EXISTING SYSTEM

Yang et al. presented the Provable Ownership of the File (POF) approach, which enables users to prove their ownership of a file without uploading the complete file to the server. Proof of data ownership is an important part of data deduplication, particularly for encrypted data. However, this technique does not provide flexible deduplication control across many CSPs.
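The idea of proving ownership without uploading the whole file can be illustrated with a simple challenge-response exchange. The sketch below is not the POF construction of Yang et al.; it is a simplified stand-in that assumes the server can read the bytes it already stores. The server samples a few block indices plus a fresh nonce, and the claimant answers with a hash over only those blocks. All names (`split_blocks`, `make_challenge`, `respond`, `verify`) and the block size are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # assumption: tiny blocks so the demo file has several of them


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_challenge(num_blocks: int, sample: int):
    """Server side: pick a fresh nonce and a random sample of block indices."""
    rng = secrets.SystemRandom()
    indices = sorted(rng.sample(range(num_blocks), min(sample, num_blocks)))
    return secrets.token_bytes(16), indices


def respond(data: bytes, nonce: bytes, indices: list) -> bytes:
    """Claimant side: hash the nonce together with the challenged blocks only."""
    blocks = split_blocks(data)
    digest = hashlib.sha256(nonce)
    for i in indices:
        digest.update(i.to_bytes(4, "big"))
        digest.update(blocks[i])
    return digest.digest()


def verify(stored: bytes, nonce: bytes, indices: list, answer: bytes) -> bool:
    """Server side: recompute the expected answer from its own stored copy."""
    return hmac.compare_digest(respond(stored, nonce, indices), answer)


stored_file = b"the quick brown fox jumps over the lazy dog"
nonce, idx = make_challenge(len(split_blocks(stored_file)), sample=3)

# A claimant who really holds the file passes the check.
honest = verify(stored_file, nonce, idx, respond(stored_file, nonce, idx))
print("honest claimant accepted:", honest)
```

Only the sampled blocks are hashed, so the answer is a fixed 32 bytes regardless of file size. A fresh nonce per challenge keeps an old answer from being replayed.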

IJISRT24APR1394 www.ijisrt.com 3370



Yan et al. presented a PRE-based deduplication strategy that relied solely on authorized parties to govern data deduplication. It cannot adapt to many scenarios, particularly those in which data access is regulated by the data owners.

• Disadvantage-
The current methods have the following disadvantages: there is little research on flexible cloud data deduplication across several CSPs, and existing solutions lack flexibility and uniformity in supporting deduplication and access control in the cloud.

III. LITERATURE SURVEY

The authors of [1] presented the SRRS system. It uses a role re-encryption algorithm to achieve authorized data deduplication and a convergent algorithm to maintain data confidentiality. A management center is introduced to manage keys and user roles. Adding the management center to the system reduces computational cost and overhead on the client side. By performing data deduplication, the SRRS system decreases bandwidth usage and storage space requirements.

The authors of [2] proposed an attribute-based storage system that supports secure and efficient deduplication. They also discussed a flaw in the common attribute-based encryption method: its inability to provide secure deduplication. The system operates in a hybrid cloud setting, with the public cloud handling storage and the private cloud handling the identification of identical copies. The system has two main benefits:

• Data is shared while maintaining data confidentiality by defining an access policy.
• Data security is achieved under a high standard notion of security, which other schemes were unable to meet.

The author of [3] described Attribute-Based Encryption (ABE), which is used to share data effectively and minimize storage space. In this method, a user can decrypt the encrypted data only if their attributes match.

The authors of [4] devised convergent encryption, a mechanism that secures data throughout the deduplication process. Outsourced data is converted into ciphertext before deduplication. The authors also give users different access rights.

The authors of [5] presented message-locked encryption (MLE), which offers secure deduplication. This method works best with large files because it relies on server-side schema maintenance, which large files need most. It supports both file-level and block-level deduplication.

The authors of [6] presented updatable block-level deduplication, which allows simple data updating and deduplication on encrypted data. It resolves the inefficient data updating of file-level deduplication. MLE handles certain obstacles, while the UBLDE protocol efficiently handles others. The challenge of dynamic ownership management is also met.

The authors of [7] propose a method to lower the cost of data updates. With the existing MLE solution, a user cannot update encrypted data efficiently or securely, and updating a single piece of data is expensive. The authors therefore presented message-locked encryption that is updatable at the block level, a method that seeks to reduce the computing cost to the logarithm of the file size. It also requires proof of ownership before users can access files.

To enable authorized data deduplication, the author of [8] presents a strategy that uses a symmetric encryption algorithm, a hashing technique, a convergent encryption algorithm, and a token generation scheme. The security and confidentiality of user data are upheld, and the data is protected against both passive and active attacks.

To facilitate dynamic ownership management, the authors of [9] presented Proof-of-Ownership (PoW) with data deduplication. The system supports data deduplication at the file, user, and block levels. The scheme protects data confidentiality and uniformity and performs secure deduplication. It also lessens the need for storage space and key management.

The author of [10] reviewed numerous approaches and technological advancements for implementing data deduplication and compared the different technologies. The study illustrates how performing data deduplication compromises data confidentiality to varying degrees.

To facilitate dynamic ownership management, the authors of [11] presented Proof-of-Ownership (PoW) with data deduplication. The system supports block-level, cross-user, and file-level data deduplication. The scheme maintains data secrecy and regularity and performs secure deduplication. It also lessens the workload for storage and key management.

IV. PROBLEM STATEMENT

Fig 1 System and Security


• System and Security Model-
Fig 1 depicts the system in which the proposed strategy can be applied. It includes four types of entities:

• A trusted Key Generation Center (KGC) is responsible for generating system parameters and issuing certificates.
• The Cloud Service Provider (CSP) provides data storage services. There can be multiple CSPs in the system. Cloud users can select one of them to manage their uploaded data and access advanced features. CSPs might collaborate under a business agreement to save storage space through deduplication.
• Data holders store their data at CSPs. Different CSPs may serve the data holders. Multiple data holders or a single cloud user can store encrypted or plain data at one or multiple CSPs.
• The Authorized Party (AP) manages data access as a delegate for data owners and supports deduplication.

In this arrangement, all entities trust the AP. CSPs cannot be completely trusted: they are curious about cloud users' raw data but strictly follow the system design and protocols. We assume that the AP will never collude with CSPs because of their differing financial incentives and objectives; collusion could harm the reputation of CSPs, ultimately resulting in loss of business.

• Notations-
• $PK_u$: $u$'s public key for ABE, along with the unique user ID; the key used to verify user attributes and generate a personalized secret attribute key for $u$.
• $SK_u$: the secret key for ABE decryption.
• $PK'_u$: the public key of the Public-Key Cryptosystem (PKC), used for encryption and signature verification.
• $SK'_u$: the secret key for PKC decryption and signature generation.
• $DEK_u$: the symmetric key of $u$, used to encrypt user data.
• $DEK_{1,u}$: partial key 1 of $DEK_u$.
• $DEK_{2,u}$: partial key 2 of $DEK_u$.
• $pk_{ID,u}$: the public key of $u$ for attribute $ID$, used to encrypt $DEK_{2,u}$.

V. PROPOSED SYSTEM

This study proposes a holistic and heterogeneous data storage management scheme to address the issues mentioned above. Our goal is to save cloud storage across many CSPs while maintaining data security and privacy, using encrypted storage with deduplication in diverse scenarios. Our proposed heterogeneous data management approach allows for deduplication and access control based on data owners' needs, adapting to various application scenarios. Our method allows for flexible data exchange among eligible users, managed by data owners, trusted parties, or both. Our scheme is distinct from prior work. It is a general approach for realizing encrypted cloud data deduplication with access control, and it facilitates cooperation across multiple CSPs.

• Advantages-
Advantages of the proposed system include compatibility with access control schemes and flexible cloud storage management with data deduplication and access control, which can be controlled by the data owner, a trusted third party, or neither.

The suggested technique addresses data security concerns while also saving storage space through deduplication across many CSPs. Thus, it can be used in a variety of data storage applications.

We justify the proposed scheme's performance through a security analysis, a comparison with prior work, and an evaluation of its implementation. The findings demonstrate its security, benefits, efficiency, and potential for practical use.

VI. ALGORITHM

A. Algorithm-
This sub-section introduces the key algorithms of the proposed scheme.

• System Setup

• Initiate Systems-
This algorithm is conducted at the KGC. It generates basic system settings for ABE and PRE, including generators and universal properties. Cloud user $u$ generates key pairs based on the system parameters, including the ABE master key pair $PK_u$ and $SK_u$ for encryption and user decryption, and the PKC key pair $PK'_u$ and $SK'_u$ for PKC key issuance.

• Setup Node (u)-
With the node identity $u$ and public keys as input, this algorithm, executed at the KGC, outputs a number of user credentials, $Cert(PK_u)$, $Cert(PK'_u)$ and $Cert(pk_u)$, which can be verified by CSPs and their users. The AP initializes itself by generating $pk_{AP}$ and $sk_{AP}$. $pk_{AP}$ is transmitted to the users of CSPs.

• ABE Key Generation

• $CreateIDPK(ID, SK_u)$. This algorithm examines the policies for $ID$ and returns $pk_{ID,u}$ for user $u$ to regulate data deduplication and access.
• $IssueIDSK(ID, SK_u, PK_{u'})$. This algorithm is conducted by $u$ to issue $sk_{ID,u,u'}$ to $u'$ if the eligibility check for $u'$ is positive. Otherwise, it returns $NULL$. User $u$ verifies the properties of $u'$. If the policy is met, $u$ provides a secret key to $u'$ for sharing duplicated data storage and future access. Otherwise, it refuses the request.

To simplify the presentation, we use user identity as an example attribute rather than complex attributes. Access control based on user identity is relevant in practice, as most data access in the cloud relies on user identity.

• Data Encryption and Decryption-
$Encrypt(DEK_u, M)$ encrypts $M$ with $DEK_u$ and outputs ciphertext $CT_u$ to secure $M$ stored at the CSP. $Decrypt(DEK_u, CT_u)$ decrypts $CT_u$ with $DEK_u$ and returns $M$.
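The symmetric Encrypt/Decrypt pair can be illustrated with a small round trip. The paper does not name a cipher, so the sketch below substitutes a toy stream cipher (a SHA-256 keystream in counter mode, XORed with the message) purely to stay dependency-free; it is an assumption for illustration and should not be used to protect real data. The names `keystream`, `encrypt`, and `decrypt` are hypothetical. The example also shows why deduplication on encrypted data is hard: the same message under two different keys yields unrelated ciphertexts.

```python
import hashlib
import secrets


def keystream(dek: bytes, nonce: bytes, length: int) -> bytes:
    """Expand (dek, nonce) into `length` bytes with SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(dek + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(dek: bytes, message: bytes) -> bytes:
    """Toy Encrypt(DEK_u, M): returns nonce || (M XOR keystream)."""
    nonce = secrets.token_bytes(16)
    pad = keystream(dek, nonce, len(message))
    return nonce + bytes(m ^ p for m, p in zip(message, pad))


def decrypt(dek: bytes, ciphertext: bytes) -> bytes:
    """Toy Decrypt(DEK_u, CT_u): strips the nonce and removes the keystream."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    pad = keystream(dek, nonce, len(body))
    return bytes(c ^ p for c, p in zip(body, pad))


M = b"quarterly report"
dek_u = secrets.token_bytes(32)  # data encryption key of holder u
dek_v = secrets.token_bytes(32)  # data encryption key of another holder v

ct_u = encrypt(dek_u, M)
ct_v = encrypt(dek_v, M)

print("u recovers M:", decrypt(dek_u, ct_u) == M)
print("ciphertexts of the same M differ:", ct_u != ct_v)
```

Because the two ciphertexts differ, a CSP cannot detect the duplicate by comparing stored bytes. This is the gap that key management and access control on top of deduplication are meant to close.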


The decryption process allows data holders to retrieve the plain content of $CT_u$ stored at the CSP.

• Symmetric Key Management-
This algorithm generates partial keys (e.g., $DEK_{1,u}$ and $DEK_{2,u}$) from the input $DEK_u$ by random separation. If necessary, $DEK_u$ can be separated into more than two pieces.

$CombineKey(DEK_{1,u}, DEK_{2,u})$. This algorithm combines the partial keys of $DEK_u$, such as $DEK_{1,u}$ and $DEK_{2,u}$, to produce the full key $DEK_u$.

• Partial Key Control based on ABE Operated by the Data Owner-
$EncryptKey(DEK_{2,u}, \lambda, pk_{ID,u})$ encrypts $DEK_{2,u}$ with policy $\lambda$ and outputs cipher-key $CK_{2,u}$. This algorithm is executed at $u$.

$DecryptKey(CK_{2,u}, \lambda, SK_{u'}, sk_{ID,u,u'})$ decrypts cipher-key $CK_{2,u}$ and outputs $DEK_{2,u}$. This algorithm is executed at $u'$.

• Partial Key Control with PRE Operated by AP-
We use PRE to enable the AP to re-encrypt $CK_1$. During ciphertext re-encryption, the CSP learns nothing about $DEK_{1,u}$. The PRE algorithms are as follows:

• $E(pk_{AP}, DEK_{1,u})$ takes $pk_{AP}$ and $DEK_{1,u}$ as input and outputs $CK_1 = E(pk_{AP}, DEK_{1,u})$.
• $RG(pk_{AP}, sk_{AP}, pk_{u'})$ takes $pk_{AP}$, $sk_{AP}$, and $pk_{u'}$ as input and outputs the re-encryption key $rk_{AP \to u'}$ for the proxy CSP.
• $R(rk_{AP \to u'}, CK_1)$ takes $rk_{AP \to u'}$ and $CK_1$ as input and outputs $R(rk_{AP \to u'}, CK_1) = E(pk_{u'}, DEK_{1,u}) = CK'_1$, which can be decrypted with $sk_{u'}$.
• $D(sk_{u'}, CK'_1)$ takes $sk_{u'}$ and $CK'_1$ as input and outputs $DEK_{1,u}$.

VII. CONCLUSION

Data deduplication plays a crucial role in cloud storage, particularly for big data management. This work proposes a heterogeneous data storage management method with customizable cloud data deduplication and access control. Our scheme provides cost-effective big data storage across numerous CSPs, adapting to different application scenarios and demands. It supports data deduplication and access control with varying security needs. Our security analysis, comparison to prior work, and performance evaluation demonstrated that our scheme is secure, sophisticated, and efficient. Our approach protects user privacy by storing encrypted data in the cloud. Using pseudonyms can help protect identity privacy: the Key Generation Center (KGC) verifies and certifies the relationship between a genuine identity and a pseudonym. Our future work is to strengthen user privacy and improve our system for practical deployment. We will also analyze the proposed method using game theory to ensure its security and rationality.

REFERENCES

[1]. R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina, "Controlling data in the cloud: outsourcing computation without outsourcing control," in Proc. 2009 ACM Workshop Cloud Comput. Secur., pp. 85-90, 2009.
[2]. S. Kamara and K. Lauter, "Cryptographic cloud storage," Financ. Crypto. Data Secur., pp. 136-149, Springer, 2010.
[3]. Q. Liu, C. C. Tan, J. Wu, and G. Wang, "Efficient information retrieval for ranked queries in cost-effective cloud environments," in Proc. 2012 IEEE INFOCOM, pp. 2581-2585, 2012.
[4]. M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, "Plutus: scalable secure file sharing on untrusted storage," in Proc. USENIX Conf. File Storage Technol., pp. 29-42, 2003.
[5]. E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, "SiRiUS: securing remote untrusted storage," in Proc. Netw. Distrib. Syst. Secur. Symp., pp. 131-145, 2003.
[6]. J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," in Proc. of IEEE Symp. Secur. Privacy (SP'07), pp. 321-334, 2007.
[7]. V. Goyal, O. Pandey, A. Sahai, and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in Proc. of 13th ACM Comput. Commun. Secur., pp. 89-98, 2006.
[8]. S. Muller, S. Katzenbeisser, and C. Eckert, "Distributed attribute-based encryption," in Proc. of 11th Annual Int. Conf. Inf. Secur. Crypto., pp. 20-36, 2008.
[9]. A. Sahai and B. Waters, "Fuzzy identity-based encryption," in Proc. of 24th Int. Conf. Theory App. Cryptographic Tech., pp. 457-473, 2005.
[10]. S. C. Yu, C. Wang, K. Ren, and W. J. Lou, "Achieving secure, scalable, and fine-grained data access control in cloud computing," in Proc. of IEEE INFOCOM, pp. 534-542, 2010.
[11]. G. J. Wang, Q. Liu, J. Wu, and M. Y. Guo, "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers," Comput. Secur., vol. 30, no. 5, pp. 320-331, 2011.
