Phishing Website Detection: A Dataset-Centric Approach for Enhanced Security
DOI:
https://doi.org/10.56294/dm2024.223
Keywords:
Machine learning, Phishing attacks, Cybersecurity
Abstract
Introduction: Phishing involves cybercriminals creating fake websites that imitate legitimate ones in order to obtain personal information. As phishing websites grow more sophisticated, machine learning offers a practical way to detect and counter such attacks.
Objective: This study applies machine learning algorithms to the Phishing_Legitimate_full.csv dataset, which consists of labeled phishing and legitimate websites.
Method: The paper seeks to identify the most effective feature selection method for predicting phishing websites.
Result: The findings highlight the potential of machine learning to enhance cybersecurity by automating threat detection and intelligence. Phishing attacks rely on social engineering to present deceptive links as trustworthy sources, deceiving individuals into sharing confidential data.
Conclusion: This study explores the use of curated datasets and machine learning algorithms to develop adaptive and efficient phishing detection mechanisms, providing a robust defense against such malicious activities.
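Illustrative sketch: the abstract does not name the feature selection methods that were evaluated, so the following is only a minimal example of one common filter-style approach, ranking features by the absolute Pearson correlation between each feature and the class label. It is not the authors' method. The data are synthetic, and the column names (num_dots, url_length, has_ip_address, label) are hypothetical stand-ins rather than the actual columns of Phishing_Legitimate_full.csv.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, feature_names, label_name):
    """Return (name, |correlation with label|) pairs, strongest first."""
    labels = [row[label_name] for row in rows]
    scores = []
    for name in feature_names:
        values = [row[name] for row in rows]
        scores.append((name, abs(pearson(values, labels))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def make_synthetic_rows(n=400, seed=0):
    """Build toy rows; ASSUMPTION: these columns are invented for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = phishing, 0 = legitimate
        rows.append({
            # Informative: phishing rows tend to have more dots.
            "num_dots": label * 2 + rng.gauss(2, 0.5),
            # Uninformative: drawn independently of the label.
            "url_length": rng.gauss(60, 15),
            # Informative: agrees with the label about 80% of the time.
            "has_ip_address": label if rng.random() < 0.8 else 1 - label,
            "label": label,
        })
    return rows


rows = make_synthetic_rows()
ranking = rank_features(
    rows, ["num_dots", "url_length", "has_ip_address"], "label"
)
top_two = [name for name, _ in ranking[:2]]  # keep the two strongest features
```

With real data, the rows could instead be read from the CSV file (for example with csv.DictReader, converting each value to a number), and the retained features would then be passed to whichever classifier is being evaluated. A correlation filter is only one option; wrapper and embedded methods may rank features differently.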
License
Copyright (c) 2025 Sultan Ahmad, Md Alimul Haque, Hikmat A. M. Abdeljaber, M. U. Bokhari, Jabeen Nazeer, B. K. Mishra (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.