Deduplication On Encrypted Data in Cloud Computing
Aditya Tryambak Sambare1; Prathamesh Hanmant Shingate2; Amol Kishor Shelke3; Mansi Ranjit Thakur4; G Nazia Sulthana5 (Professor)
Department of Computer Engineering, Mahatma Gandhi Mission's College of Engineering and Technology, Navi Mumbai, Maharashtra
Abstract:- Cloud storage is a crucial component of cloud computing, allowing users to expand their storage without upgrading their equipment and to overcome resource constraints. Cloud users' data is always encrypted before being outsourced to ensure its security and privacy. However, encrypted data may waste a significant amount of cloud storage and complicate data sharing among authorized users. Encrypted data storage and deduplication therefore remain challenging. Traditional deduplication strategies are designed for specific application settings where data owners or cloud servers have full control over the process. They cannot meet data owners' varying requests based on data sensitivity. This study proposes a flexible data storage management method that combines deduplication and access control across multiple Cloud Service Providers (CSPs). We assess its performance through security analysis, comparison, and implementation. The results demonstrate security, effectiveness, and efficiency for practical applications.

Keywords:- Data Deduplication, Cloud Computing, Access Control, Storage Management.

I. INTRODUCTION

Cloud computing provides centralized data storage and online access to computer services or resources. This new approach to IT services reorganizes resources and tailors them to meet user needs. Cloud computing offers numerous benefits, including scalability, elasticity, fault tolerance, and pay-per-use pricing. Cloud storage allows users to store large amounts of data without upgrading their devices and to access it anytime, anywhere. However, cloud data storage provided by Cloud Service Providers (CSPs) is not without issues.

Data stored in the cloud may require varying levels of protection based on its sensitivity. The cloud stores sensitive personal information, publicly shared data, and group-shared data. Important data should be stored securely in the cloud to prevent unauthorized access, whereas unimportant data may not be subject to such requirements. Because outsourced data may contain sensitive information, data owners may prefer to control it themselves, or to delegate control to a third party if they are unavailable or unsure how to do so. Adapting cloud data access control to varied scenarios and user needs is therefore a practical issue. Access control for encrypted data has been extensively researched in the literature. However, few cloud data protection solutions can meet diverse needs uniformly, particularly when economical deduplication is also required. Flexible cloud data deduplication with access control remains an open subject.

Duplicated data can be encrypted and stored in the cloud by multiple users across different CSPs. Data deduplication and access control should be compatible: the same data, whether encrypted or not, is stored only once in the cloud and can be accessed by multiple users according to the policies of the data owners or holders. Duplicate data in cloud storage wastes network resources, consumes energy, raises costs, and complicates data management. Economical storage benefits both CSPs and cloud consumers by lowering operating expenses and service prices. Cloud data deduplication is thus crucial for storing and managing large amounts of data. However, there is little research on flexible cloud data deduplication across several CSPs. Existing solutions lack flexibility and uniformity in supporting both deduplication and access control in the cloud.

This work proposes a heterogeneous data storage management method to address the issues mentioned above. The proposed approach is compatible with the access control scheme proposed previously. It allows for flexible cloud storage management, including data deduplication and access control, which can be managed by the data owner, a trusted third party, or neither. The proposed technique addresses data security concerns while also saving storage space through deduplication across multiple CSPs. Thus, it can be used in a variety of data storage applications.

Our scheme is distinct from prior work. This study proposes using encryption and deduplication to conserve cloud storage across several CSPs while maintaining data security and privacy in different scenarios. Our proposed heterogeneous data management scheme supports deduplication and access control based on data owners' needs, adapting to various application scenarios. Our method allows for flexible data sharing among eligible users, governed by data owners, trusted parties, or both. The scheme's performance is validated by security analysis, comparison with existing work, and implementation-based evaluation. The results demonstrate its security, advantages, efficiency, and potential for practical use.

II. EXISTING SYSTEM

Yang et al. presented the Provable Ownership of the File (POF) approach, which enables users to prove their ownership of a file without uploading the complete file to the server. Proof of data ownership is an important part of data deduplication, particularly for encrypted data. However, this technique does not provide flexible deduplication control across multiple CSPs.
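The ownership-proof idea described for the POF approach can be illustrated with a generic challenge-response spot check. The sketch below is not Yang et al.'s construction. It only assumes that the server already stores the file and asks the claimant to hash a fresh nonce together with a few randomly chosen blocks. All names (`make_challenge`, `respond`, `verify`, `BLOCK_SIZE`) are hypothetical.

```python
import hashlib
import secrets
from typing import List, Tuple

BLOCK_SIZE = 4  # assumption: tiny blocks keep the demo readable


def split_blocks(data: bytes) -> List[bytes]:
    """Cut the file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def make_challenge(num_blocks: int, sample_size: int) -> Tuple[bytes, List[int]]:
    """Server side: pick a fresh nonce and random block indices to check."""
    nonce = secrets.token_bytes(16)
    rng = secrets.SystemRandom()
    count = min(sample_size, num_blocks)
    return nonce, sorted(rng.sample(range(num_blocks), count))


def respond(data: bytes, nonce: bytes, indices: List[int]) -> str:
    """Claimant side: hash the nonce with the challenged blocks."""
    blocks = split_blocks(data)
    digest = hashlib.sha256(nonce)
    for i in indices:
        digest.update(i.to_bytes(8, "big"))
        digest.update(blocks[i])
    return digest.hexdigest()


def verify(stored: bytes, nonce: bytes, indices: List[int], response: str) -> bool:
    """Server side: recompute the answer from its own copy and compare."""
    expected = respond(stored, nonce, indices)
    return secrets.compare_digest(expected, response)


# Usage: only a claimant holding the same bytes answers correctly.
stored_file = b"the same file content held by the server"
nonce, indices = make_challenge(len(split_blocks(stored_file)), 3)
print(verify(stored_file, nonce, indices, respond(stored_file, nonce, indices)))
```

The claimant sends one short digest instead of the whole file. A spot check of a few blocks only shows possession of those blocks, so practical proof-of-ownership schemes use stronger constructions.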
Yan et al. presented a deduplication strategy based on proxy re-encryption (PRE) that relies solely on authorized parties to govern data deduplication. It cannot adapt to many conditions, particularly data access regulated by the data owners.

Disadvantage-
Little research exists on flexible cloud data deduplication across several CSPs, and existing solutions lack flexibility and uniformity in supporting deduplication and access control in the cloud.

III. LITERATURE SURVEY

The authors of [1] presented the SRRS system. It uses a role re-encryption algorithm to achieve authorized data deduplication efficiently and a convergent encryption algorithm to maintain data confidentiality. A management center is introduced to manage keys and user roles. Adding the management center reduces computational cost and overhead on the client side. By performing data deduplication, the SRRS system decreases bandwidth usage and storage space requirements.

The authors of [2] proposed an attribute-based storage system that supports secure and efficient deduplication. They also discuss a flaw of the common attribute-based encryption method: its inability to provide secure deduplication. The system operates in a hybrid cloud setting, with the public cloud handling storage and the private cloud detecting duplicate copies. The system has two main benefits. First, data can be shared while confidentiality is maintained by defining an access policy. Second, it achieves data security under a standard security notion, which other systems were unable to do.

The author of [3] described Attribute-Based Encryption (ABE), which is used to share data efficiently and minimize storage space. In this method, a user can decrypt the encrypted data only if their attributes match.

handles certain obstacles, while the UBLDE protocol efficiently handles others. The challenge of dynamic ownership management is addressed here.

The authors of [7] propose a method to lower the cost of data updates. With existing message-locked encryption (MLE) solutions, a user cannot update encrypted data efficiently or securely, and even a small update is expensive. The authors therefore present message-locked encryption that is updatable at the block level, a method that aims to reduce the computational cost of an update to logarithmic in the file size. It also requires proof of ownership before users can access files.

To enable authorized data deduplication, the author of [8] presents a strategy that uses a symmetric encryption algorithm, a hashing technique, a convergent encryption algorithm, and a token generation scheme. The security and confidentiality of user data are upheld, and the data is protected against both passive and active attacks.

To facilitate dynamic ownership management, the authors of [9] presented Proof-of-Ownership (PoW) with data deduplication. The system supports data deduplication at the file, user, and block levels. The scheme protects data confidentiality and consistency while performing secure deduplication. It also reduces the storage space and key management overhead.

The author of [10] reviewed numerous approaches and technological advancements for implementing data deduplication and included a comparison of the different technologies. The study illustrates how performing data deduplication compromises data confidentiality to varying degrees.

To facilitate dynamic ownership management, the authors of [11] also presented PoW with data deduplication. The system supports block-level, cross-user, and file-level data deduplication. The scheme maintains data confidentiality and consistency while performing secure deduplication. It also reduces the workload for storage and key management.

IV. PROBLEM STATEMENT
process allows data holders to retrieve the plain content of $CT_u$ stored at CSP.

Symmetric Key Management-
This algorithm takes $DEK_u$ as input and generates partial keys (e.g., $DEK_{1,u}$ and $DEK_{2,u}$) through random separation. If necessary, $DEK_u$ can be separated into more than two pieces.

$\mathrm{CombineKey}(DEK_{1,u}, DEK_{2,u})$. This algorithm combines the partial keys of $DEK_u$, such as $DEK_{1,u}$ and $DEK_{2,u}$, to produce the full key $DEK_u$.
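The text does not specify how the random separation is performed. A minimal sketch, assuming an XOR-based split (one share is uniformly random, the other is the key XORed with it), shows how two partial keys can be generated and recombined. The name `separate_key` and the XOR construction are assumptions for illustration; only `CombineKey` is named in the text.

```python
import secrets
from typing import Tuple


def separate_key(dek: bytes) -> Tuple[bytes, bytes]:
    """Split DEK_u into two partial keys (assumed XOR-based random separation)."""
    dek1 = secrets.token_bytes(len(dek))             # uniformly random share
    dek2 = bytes(a ^ b for a, b in zip(dek, dek1))   # share that completes the key
    return dek1, dek2


def combine_key(dek1: bytes, dek2: bytes) -> bytes:
    """CombineKey: recover the full key from both partial keys."""
    if len(dek1) != len(dek2):
        raise ValueError("partial keys must have the same length")
    return bytes(a ^ b for a, b in zip(dek1, dek2))


# Usage: both partial keys are required to rebuild the full key.
dek_u = secrets.token_bytes(32)
dek1_u, dek2_u = separate_key(dek_u)
assert combine_key(dek1_u, dek2_u) == dek_u
```

Under this assumption, each share on its own is statistically independent of the key, which suits a design where one partial key is controlled by the AP and the other by the data owner.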
Partial Key Control based on ABE Operated by the Data Owner-
$\mathrm{EncryptKey}(DEK_{2,u}, \lambda, pk_{ID,u})$ encrypts $DEK_{2,u}$ with policy $\lambda$ and outputs cipher-key $CK_{2,u}$. This algorithm is executed at $u$.

$\mathrm{DecryptKey}(CK_{2,u}, \lambda, SK_{u'}, sk_{ID,u,u'})$ decrypts cipher-key $CK_{2,u}$ and outputs $DEK_{2,u}$. This algorithm is executed at $u'$.

Partial Key Control with PRE Operated by AP-
We use PRE to enable AP to re-encrypt $CK_1$. During ciphertext re-encryption, CSP does not learn anything about $DEK_{1,u}$. The PRE algorithms are as follows:

$E(pk_{AP}, DEK_{1,u})$ takes $pk_{AP}$ and $DEK_{1,u}$ as input and outputs $CK_1 = E(pk_{AP}, DEK_{1,u})$.

$RG(pk_{AP}, sk_{AP}, pk_{u'})$ takes $pk_{AP}$, $sk_{AP}$, and $pk_{u'}$ as input and outputs the re-encryption key $rk_{AP \to u'}$ for the proxy CSP.

$R(rk_{AP \to u'}, CK_1)$ takes $rk_{AP \to u'}$ and $CK_1$ as input and outputs $R(rk_{AP \to u'}, CK_1) = E(pk_{u'}, DEK_{1,u}) = CK'_1$, which can be decrypted with $sk_{u'}$.

$D(sk_{u'}, CK'_1)$ takes $sk_{u'}$ and $CK'_1$ as input and outputs $DEK_{1,u}$.
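The flow of E, R, and D above can be traced with a toy proxy re-encryption scheme. The sketch below swaps in the ElGamal-based scheme of Blaze, Bleumer, and Strauss (BBS98) because it needs only modular arithmetic. It is not the PRE scheme used in this work. In particular, its re-encryption key is derived from both secret keys, whereas $RG$ above takes $pk_{u'}$. The group parameters are assumptions chosen for readability and are far too small to be secure, and all function names are hypothetical. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import secrets

# Toy parameters (ASSUMPTION, insecure): P = 2*Q + 1 is a safe prime and
# G generates the subgroup of prime order Q.
P, Q, G = 2039, 1019, 4


def keygen():
    """Return (sk, pk) with pk = G^sk mod P."""
    sk = secrets.randbelow(Q - 1) + 1
    return sk, pow(G, sk, P)


def encrypt(pk: int, m: int):
    """E(pk, m): ciphertext (m * G^r, pk^r) for a fresh random r."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def rekey(sk_from: int, sk_to: int) -> int:
    """Re-encryption key sk_to / sk_from mod Q (BBS98 needs both secrets)."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(rk: int, ct):
    """R(rk, CK): the proxy maps pk_from^r to pk_to^r without seeing m."""
    c1, c2 = ct
    return c1, pow(c2, rk, P)


def decrypt(sk: int, ct) -> int:
    """D(sk, CK): recover G^r from pk^r, then divide it out of c1."""
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)
    return (c1 * pow(g_r, -1, P)) % P


# Usage: AP's ciphertext of a small stand-in for DEK_1 is re-encrypted for u'.
sk_ap, pk_ap = keygen()
sk_u2, pk_u2 = keygen()
dek1 = 1234                      # stand-in value, must be below P
ck1 = encrypt(pk_ap, dek1)
ck1_prime = reencrypt(rekey(sk_ap, sk_u2), ck1)
assert decrypt(sk_u2, ck1_prime) == dek1
```

The proxy only exponentiates the second ciphertext component, so it never handles the plaintext key share, which mirrors the statement that CSP learns nothing about $DEK_{1,u}$ during re-encryption.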
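VII. CONCLUSION

Data deduplication plays a crucial role in cloud storage, particularly for big data management. This work proposes a heterogeneous data storage management method with customizable cloud data deduplication and access control. Our scheme provides cost-effective big data storage across multiple CSPs, adapting to different application scenarios and demands. It supports data deduplication and access control with varying security needs. Our security analysis, comparison to prior work, and performance evaluation demonstrate that our scheme is secure, advanced, and efficient. Our approach protects user privacy by storing encrypted data in the cloud. Using pseudonyms can help protect identity privacy: the Key Generation Center (KGC) verifies and certifies the relationship between a real identity and a pseudonym. In future work, we aim to strengthen user privacy and improve our system for practical deployment. We will also analyze the proposed method using game theory to verify its security and rationality.

REFERENCES

[1]. R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina, "Controlling data in the cloud: outsourcing computation without outsourcing control," in Proc. 2009 ACM Workshop Cloud Comput. Secur., pp. 85-90, 2009.
[2]. S. Kamara and K. Lauter, "Cryptographic cloud storage," Financ. Crypto. Data Secur., pp. 136-149, Springer, 2010.
[3]. Q. Liu, C. C. Tan, J. Wu, and G. Wang, "Efficient information retrieval for ranked queries in cost-effective cloud environments," in Proc. 2012 IEEE INFOCOM, pp. 2581-2585, 2012.
[4]. M. Kallahalla, E. Riedel, R. Swaminathan, Q. Wang, and K. Fu, "Plutus: scalable secure file sharing on untrusted storage," in Proc. USENIX Conf. File Storage Technol., pp. 29-42, 2003.
[5]. E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, "SiRiUS: securing remote untrusted storage," in Proc. Netw. Distrib. Syst. Secur. Symp., pp. 131-145, 2003.
[6]. J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," in Proc. of IEEE Symp. Secur. Privacy (SP'07), pp. 321-334, 2007.
[7]. V. Goyal, O. Pandey, A. Sahai, and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in Proc. of 13th ACM Comput. Commun. Secur., pp. 89-98, 2006.
[8]. S. Muller, S. Katzenbeisser, and C. Eckert, "Distributed attribute-based encryption," in Proc. of 11th Annual Int. Conf. Inf. Secur. Crypto., pp. 20-36, 2008.
[9]. A. Sahai and B. Waters, "Fuzzy identity-based encryption," in Proc. of 24th Int. Conf. Theory App. Cryptographic Tech., pp. 457-473, 2005.
[10]. S. C. Yu, C. Wang, K. Ren, and W. J. Lou, "Achieving secure, scalable, and fine-grained data access control in cloud computing," in Proc. of IEEE INFOCOM, pp. 534-542, 2010.
[11]. G. J. Wang, Q. Liu, J. Wu, and M. Y. Guo, "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers," Comput. Secur., vol. 30, no. 5, pp. 320-331, 2011.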