Big Data De-duplication using modified SHA algorithm in cloud servers for optimal capacity utilization and reduced transmission bandwidth

Authors

  • Rajendran Bhojan, Department of Mathematics and Computer Science, The Papua New Guinea University of Technology, India
  • Manikandan Rajagopal, Lean Operations and Systems, School of Business and Management, CHRIST (Deemed to be University), Bangalore, India
  • Ramesh R, Department of Computer Science, KPR College of Arts Science and Research, Tamilnadu, India

DOI:

https://doi.org/10.56294/dm2024245

Keywords:

Preprocessing, De-duplication, SHA, Cloud Servers, Target

Abstract

Data de-duplication in cloud storage is crucial for optimizing resource utilization and reducing transmission overhead. By eliminating redundant copies of data, it improves storage efficiency, lowers costs, and reduces network bandwidth requirements, thereby improving the overall performance and scalability of cloud-based systems. This research investigates the intersection of data de-duplication (DD) and privacy concerns within cloud storage services. DD is a widely employed technique in these services that aims to enhance capacity utilization and reduce transmission bandwidth. However, it poses challenges to information privacy, which are typically addressed through encoding mechanisms. One significant approach to mitigating this conflict is hierarchical approved de-duplication, which lets cloud users conduct privilege-based duplicate checks before uploading data. This hierarchical structure allows cloud servers to profile users based on their privileges, enabling finer-grained control over data management. In this research, we introduce a modified SHA method for de-duplication within cloud servers, supplemented by a secure pre-processing assessment. The proposed method accommodates dynamic privilege modifications, so it can adapt as user needs and access levels change. Theoretical analysis and simulated investigations validate the efficacy and security of the proposed system. By combining the SHA algorithm with pre-processing, our approach improves the efficiency of data de-duplication while addressing the privacy concerns inherent in cloud storage environments. This research contributes to the understanding and implementation of efficient and secure data management practices within cloud infrastructures, with implications for a wide range of applications and industries.
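The general idea of a hash-based, privilege-scoped duplicate check before upload can be illustrated with a minimal sketch. This is not the authors' modified SHA algorithm, which the abstract does not specify; it uses the standard SHA-256 and HMAC functions from Python's standard library, and the names `DedupStore`, `duplicate_check_tag`, `client_upload`, and the per-privilege keys are hypothetical, introduced only for illustration.

```python
import hashlib
import hmac


class DedupStore:
    """Toy in-memory server that keeps one copy of each distinct payload."""

    def __init__(self):
        self.blobs = {}          # SHA-256 fingerprint -> payload bytes
        self.tags = {}           # privilege-scoped tag -> fingerprint
        self.bytes_received = 0  # total payload bytes sent to the server

    def has_duplicate(self, tag):
        # Duplicate check: the server sees only the tag, not the data.
        return tag in self.tags

    def upload(self, tag, data):
        fingerprint = hashlib.sha256(data).hexdigest()
        self.bytes_received += len(data)
        # Store the payload only if this content is not already present.
        self.blobs.setdefault(fingerprint, data)
        self.tags[tag] = fingerprint


def duplicate_check_tag(data, privilege_key):
    """Assumed tag construction: HMAC of the content digest under a privilege key."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(privilege_key, digest, hashlib.sha256).hexdigest()


def client_upload(store, data, privilege_key):
    """Return True if the payload was transmitted, False if it was skipped."""
    tag = duplicate_check_tag(data, privilege_key)
    if store.has_duplicate(tag):
        return False  # duplicate found for this privilege: nothing is sent
    store.upload(tag, data)
    return True
```

In this sketch, a second upload of the same data under the same privilege key is skipped entirely, which is where the bandwidth saving comes from. A user holding a different privilege key cannot detect the existing copy and so transmits the data again, but the server still stores the content only once.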

Published

2024-03-30

Section

Original

How to Cite

1.
Bhojan R, Rajagopal M, Ramesh R. Big Data De-duplication using modified SHA algorithm in cloud servers for optimal capacity utilization and reduced transmission bandwidth. Data and Metadata [Internet]. 2024 Mar. 30 [cited 2024 Sep. 16];3:245. Available from: https://dm.ageditor.ar/index.php/dm/article/view/250