Implication of Different Data Split Ratio on the Performance of Model in Price Prediction of Used Vehicles Using Regression Analysis

Authors

  • Alimul Haque, Department of Computer Science, Veer Kunwar Singh University. Ara, 802301, India. https://orcid.org/0000-0002-0744-0784
  • Shams Raza, Academic Counselor, IGNOU International Division. India.
  • Sultan Ahmad, Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University. Alkharj, 11942, Saudi Arabia. https://orcid.org/0000-0002-3198-7974
  • Alamgir Hossain, Department of Computer Science and Engineering, Prime University. Dhaka 1216, Bangladesh. https://orcid.org/0000-0001-5120-2911
  • Hikmat A. M. Abdeljaber, Department of Computer Science, Faculty of Information Technology, Applied Science Private University. Amman, Jordan. https://orcid.org/0000-0001-9557-3933
  • A. E. M. Eljialy, Department of Information Systems, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University. Alkharj, 11942, Saudi Arabia. https://orcid.org/0000-0002-7705-9030
  • Sultan Alanazi, Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University. Alkharj, 11942, Saudi Arabia. https://orcid.org/0000-0002-9627-1715
  • Jabeen Nazeer, Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University. Alkharj, 11942, Saudi Arabia. https://orcid.org/0000-0002-9242-6230

DOI:

https://doi.org/10.56294/dm2024425

Keywords:

Artificial Intelligence, Linear Models, Machine Learning, Commerce, Dataset

Abstract

Introduction: artificial intelligence (AI) and machine learning have recently become buzzwords due to technological changes and data quality testing, especially in shape and finish analysis. Much research has applied linear regression algorithms to price prediction in different sectors, such as stocks, rental properties, and used cars. This study identifies a suitable data split ratio for cost estimation with a linear regression model. Demand for car ownership among middle-class families is increasing, which has given the motor vehicle business an opportunity to offer a wide range of used vehicles for resale, especially companies such as Maruti Suzuki, Tata Motors, and Mahindra in the Indian motor vehicle industry. It is therefore important to know the current value of a car before spending hard-earned money on it.
Objective: the objective of this paper is to estimate the appropriate value of cars in metropolitan cities and state capitals. Features such as model, mileage, air conditioning (AC), seating capacity, fuel type, and automatic transmission are taken into account. The estimate is intended to help customers find options that suit their needs.
Method: we have used a linear regression model to estimate the value of the respective car. 
Results: using linear regression for price prediction, we sought the best model accuracy by varying the split ratio between the training and test data sets. We conclude that an 80/20 ratio gives the best model accuracy for business-domain analysis with a labelled data set.
Conclusions: the findings underscore the importance of careful consideration when selecting a data split ratio for price prediction models in the used vehicle market. The insights gleaned from this study can inform future research and contribute to the development of more accurate and reliable regression models in similar domains.
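
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data. It assumes a single hypothetical feature (vehicle age) instead of the full feature set, synthetic prices generated with made-up coefficients, and R^2 on the held-out set as the accuracy measure, which the abstract does not specify. All function names are illustrative.

```python
import random


def make_synthetic_cars(n, seed=0):
    """Generate (age_years, price) pairs. Purely synthetic, not the paper's data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(0, 15)
        # Hypothetical pricing rule: linear depreciation plus Gaussian noise.
        price = 900_000 - 45_000 * age + rng.gauss(0, 40_000)
        rows.append((age, price))
    return rows


def fit_line(rows):
    """Ordinary least squares for price = intercept + slope * age."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(rows, slope, intercept):
    """Coefficient of determination of the fitted line on the given rows."""
    mean_y = sum(y for _, y in rows) / len(rows)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for _, y in rows)
    return 1.0 - ss_res / ss_tot


def compare_split_ratios(rows, train_fractions=(0.6, 0.7, 0.8, 0.9), seed=42):
    """Fit on the first `frac` of a shuffled copy and score on the remainder."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    scores = {}
    for frac in train_fractions:
        cut = int(len(shuffled) * frac)
        train, test = shuffled[:cut], shuffled[cut:]
        slope, intercept = fit_line(train)
        scores[frac] = r_squared(test, slope, intercept)
    return scores


if __name__ == "__main__":
    for frac, score in compare_split_ratios(make_synthetic_cars(500)).items():
        print(f"train {frac:.0%} / test {1 - frac:.0%}: R^2 = {score:.3f}")
```

On synthetic data like this the scores for different ratios tend to be close, and which ratio ranks highest depends on the shuffle seed and sample size. Averaging over several seeds would give a steadier comparison than a single split.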

Published

2024-01-01

Section

Original

How to Cite

Haque A, Raza S, Ahmad S, Hossain A, Abdeljaber HAM, Eljialy AEM, et al. Implication of Different Data Split Ratio on the Performance of Model in Price Prediction of Used Vehicles Using Regression Analysis. Data and Metadata [Internet]. 2024 Jan. 1 [cited 2024 Dec. 21];3:425. Available from: https://dm.ageditor.ar/index.php/dm/article/view/264