Heterogeneous Computing for High-Complexity Problems: A Performance Study with DPC++ on HPC Platforms

Katari Tituaña; MacArthur Ortega-Bustamante; Pedro Granda; Marco Pusdá-Chulde

doi:10.56294/dm2025835

Authors

Katari Tituaña Universidad Técnica del Norte, Facultad de Ingeniería en Ciencias Aplicadas, Ibarra, Ecuador Author https://orcid.org/0009-0009-8987-0689
MacArthur Ortega-Bustamante Universidad Técnica del Norte, Facultad de Ingeniería en Ciencias Aplicadas, Ibarra, Ecuador Author https://orcid.org/0000-0003-3061-9595
Pedro Granda Universidad Técnica del Norte, Facultad de Ingeniería en Ciencias Aplicadas, Ibarra, Ecuador Author https://orcid.org/0000-0001-5638-5673
Marco Pusdá-Chulde Universidad Técnica del Norte, Facultad de Ingeniería en Ciencias Aplicadas, Ibarra, Ecuador Author https://orcid.org/0000-0003-4265-999X

DOI:

https://doi.org/10.56294/dm2025835

Keywords:

Heterogeneous Programming, All Pairs Shortest Paths, HPC, Intel oneAPI, DPC++, Parallelism

Abstract

The All-Pairs Shortest Paths (APSP) algorithm is crucial for a wide range of computing applications that necessitate the efficient calculation of minimum distances between all pairs of nodes in a graph. High-performance computing (HPC) has made significant progress in response to the demand to solve complex computing problems that process large amounts of data. The cubic complexity of the Floyd-Warshall algorithm makes its execution a computational challenge with large graphs, limiting its applicability on conventional platforms. This research addresses this problem by evaluating optimized and parallelized implementations in HPC architectures. The study aims to evaluate the performance of the APSP algorithm by parallel implementation using Intel DPC++ on heterogeneous high-performance architectures, including CPU, GPU, and FPGA. The Intel OneAPI platform was used for the implementation and execution of the algorithm on Intel Xeon Gold 6128 CPU-3.40 GHz processors, Intel Data Center GPU Max 1100, and FPGA Emulation Device processors. The solution was evaluated considering quantitative and statistical metrics, obtaining a significant improvement in the performance of the algorithm for large graphs, with 700x speedup and 70% efficiency in GPU architectures with 65536 nodes and 1024 parallelism subblocks. This study confirms the feasibility of implementing efficient and portable HPC solutions using DPC++ for massive data processing in heterogeneous architectures, offering a scalable and effective solution for modern computational challenges.

References

1. Huang, S., Wu, K., Chalamalasetti, S.R., El Hajj, I., Xu, C., Faraboschi, P., Chen, D.: A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems : (Invited Paper). Proceedings of PEHC 2021: Workshop on Programming Environments for Heterogeneous Computing, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis. 20–26 (2021). https://doi.org/10.1109/PEHC54839.2021.00008. DOI: https://doi.org/10.1109/PEHC54839.2021.00008

2. Ramírez Patiño, L.M., Guerrero Hernández, L.E.: Arquitecturas Híbridas o Heterogéneas Paralelo entre NVIDIA CUDA y Parallel Studio XE para Intel® Xeon PhiTM. (2017).

3. Gizopoulos, D., Papadimitriou, G., Chatzidimitriou, A., Reddi, V.J., Salami, B., Unsal, O.S., Kestelman, A.C., Leng, J.: Modern Hardware Margins: CPUs, GPUs, FPGAs Recent System-Level Studies. In: 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS). pp. 129–134 (2019). https://doi.org/10.1109/IOLTS.2019.8854386. DOI: https://doi.org/10.1109/IOLTS.2019.8854386

4. Madiajagan, M., Raj, S.S.: Parallel Computing, Graphics Processing Unit (GPU) and New Hardware for Deep Learning in Computational Intelligence Research. Deep Learning and Parallel Computing Environment for Bioengineering Systems. 1–15 (2019). https://doi.org/10.1016/B978-0-12-816718-2.00008-7. DOI: https://doi.org/10.1016/B978-0-12-816718-2.00008-7

5. Barney, B., Frederick, D., Livermore, C.: Introduction to Parallel Computing Tutorial, https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial, last accessed 2023/09/26.

6. Soto, R.T.: Programación paralela sobre arquitecturas heterogéneas, https://repositorio.unal.edu.co/ handle/unal/57830, (2016).

7. Tituaña, K.: Optimización del procesamiento en paralelo utilizando programación heterogénea para mejorar el rendimiento de algoritmos de alto coste computacional, https://repositorio.utn.edu.ec/handle/123456789/16350, (2024).

8. Malagon, E., Rojas, A.: Analysis and simulation of graphs applied to learning with parallel programming in HPC. 2017 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies, CHILECON 2017 - Proceedings. 2017-January, 1–7 (2017). https://doi.org/10.1109/CHILECON.2017.8229646. DOI: https://doi.org/10.1109/CHILECON.2017.8229646

9. Kuzmiakova, A.: Concurrent, Parallel and Distributed Computing. Arcler Press (2023).

10. Reinders, J., Ashbaugh, B., Brodman, J., Kinsner, M., Pennycook, J., Tian, X.: Data Parallel C++. Apress (2021). https://doi.org/10.1007/978-1-4842-5574-2. DOI: https://doi.org/10.1007/978-1-4842-5574-2

11. Chirila, M., DrAlberto, P., Ting, H.-Y., Veidenbaum, A., Nicolau, A.: A Heterogeneous Solution to the All-pairs Shortest Path Problem using FPGAs. In: 2022 23rd International Symposium on Quality Electronic Design (ISQED). pp. 108–113. IEEE (2022). https://doi.org/10.1109/ISQED54688.2022.9806279. DOI: https://doi.org/10.1109/ISQED54688.2022.9806279

12. Alghamdi, M.H., He, L., Ren, S., Maray, M.: Efficient Parallel Processing of All-Pairs Shortest Paths on Multicore and GPU Systems. IEEE Transactions on Consumer Electronics. 70, 2896–2908 (2024). https://doi.org/10.1109/TCE.2023.3327328. DOI: https://doi.org/10.1109/TCE.2023.3327328

13. Intel Corporation: Intel DevCloud for oneAPI, https://devcloud.intel.com/oneapi/get_started/, last accessed 2023/08/15.

14. Intel Corporation: Intel oneAPI DPC++/C++ Compiler, https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html, last accessed 2025/07/26.

15. Li, X., Sun, L., Ling, M., Peng, Y.: A survey of graph neural network based recommendation in social networks. Neurocomputing. 549, 126441 (2023). https://doi.org/10.1016/j.neucom.2023.126441. DOI: https://doi.org/10.1016/j.neucom.2023.126441

16. Ju, W., Fang, Z., Gu, Y., Liu, Z., Long, Q., Qiao, Z., Qin, Y., Shen, J., Sun, F., Xiao, Z., Yang, J., Yuan, J., Zhao, Y., Wang, Y., Luo, X., Zhang, M.: A Comprehensive Survey on Deep Graph Representation Learning. Neural Networks. 173, 106207 (2024). https://doi.org/10.1016/j.neunet.2024.106207. DOI: https://doi.org/10.1016/j.neunet.2024.106207

17. Jiao, L., Chen, J., Liu, F., Yang, S., You, C., Liu, X., Li, L., Hou, B.: Graph Representation Learning Meets Computer Vision: A Survey. IEEE Transactions on Artificial Intelligence. 4, 2–22 (2023). https://doi.org/10.1109/TAI.2022.3194869. DOI: https://doi.org/10.1109/TAI.2022.3194869

18. Ji, H., Wang, X., Shi, C., Wang, B., Yu, P.: Heterogeneous Graph Propagation Network. IEEE Trans Knowl Data Eng. 1–1 (2021). https://doi.org/10.1109/TKDE.2021.3079239. DOI: https://doi.org/10.1109/TKDE.2021.3079239

19. Li, C., Xia, L., Ren, X., Ye, Y., Xu, Y., Huang, C.: Graph Transformer for Recommendation. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 1680–1689. ACM, New York, NY, USA (2023). https://doi.org/10.1145/3539618.3591723. DOI: https://doi.org/10.1145/3539618.3591723

20. Intel Corporation: Find the Shortest Path with a Floyd Warshall Algorithm SYCL* Implementation on GPU, https://www.intel.com/content/www/us/en/developer/articles/technical/shortest-path-sycl-based-floyd-warshall-on-gpu.html, last accessed 2024/04/08.

21. Sao, P., Lu, H., Kannan, R., Thakkar, V., Vuduc, R., Potok, T.: Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters. In: Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing. pp. 121–131. ACM, New York, NY, USA (2021). https://doi.org/10.1145/3431379.3460651. DOI: https://doi.org/10.1145/3431379.3460651

22. Huerta, E.A., Khan, A., Davis, E., Bushell, C., Gropp, W.D., Katz, D.S., Kindratenko, V., Koric, S., Kramer, W.T.C., McGinty, B., McHenry, K., Saxton, A.: Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure. J Big Data. 7, 88 (2020). https://doi.org/10.1186/s40537-020-00361-2. DOI: https://doi.org/10.1186/s40537-020-00361-2

23. Naiouf, M., De Giusti, A.E., De Giusti, L.C., Chichizola, F., Sanz, V.M., Pousa, A., Rucci, E., Basgall, M.J., Sánchez, M., Costanzo, M., Gallo, S.L., Montes de Oca, E.S., Frati, F.E., Gaudiani, A.A.: Algoritmos paralelos y evaluación de rendimiento en plataformas de HPC. XXIII Workshop de Investigadores en Ciencias de la Computación (WICC 2021, Chilecito, La Rioja). 674–679 (2021).

24. Pusdá-Chulde, M.R., Salazar-Fierro, F.A., Sandoval-Pillajo, L., Herrera-Granda, E.P., García-Santillán, I.D., De Giusti, A.: Image Analysis Based on Heterogeneous Architectures for Precision Agriculture: A Systematic Literature Review. 51–70 (2020). https://doi.org/10.1007/978-3-030-33614-1_4. DOI: https://doi.org/10.1007/978-3-030-33614-1_4

25. Vinueza, K., Sandoval-Pillajo, L., Giret-Boggino, A., Trejo-España, D., Pusdá-Chulde, M., García-Santillán, I.: Automatic weed quantification in potato crops based on a modified convolutional neural network using drone images. Data and Metadata. 4, 194 (2025). https://doi.org/10.56294/dm2025194. DOI: https://doi.org/10.56294/dm2025194

26. Sandoval-Pillajo, L., García-Santillán, I., Pusdá-Chulde, M., Giret, A.: Weed detection based on deep learning from UAV imagery: A review. Smart Agricultural Technology. 12, 101147 (2025). https://doi.org/10.1016/j.atech.2025.101147. DOI: https://doi.org/10.1016/j.atech.2025.101147

27. Moreria, R., Pusdá-Chulde, M., Granda, P., García-Santillán, I.: Early Detection of Missing Plants in Maize Crops Through UAV Imaging. Presented at the (2024). https://doi.org/10.1007/978-3-031-70760-5_40. DOI: https://doi.org/10.1007/978-3-031-70760-5_40

28. Chacua, B., Garcia, I., Rosero, P., Suarez, L., Ramirez, I., Simbana, Z., Pusda, M.: People Identification through Facial Recognition using Deep Learning. In: 2019 IEEE Latin American Conference on Computational Intelligence (LA-CCI). pp. 1–6. IEEE (2019). https://doi.org/10.1109/LA-CCI47412.2019.9037043. DOI: https://doi.org/10.1109/LA-CCI47412.2019.9037043

29. Montenegro, S., Pusdá-Chulde, M., Caranqui-Sánchez, V., Herrera-Tapia, J., Ortega-Bustamante, C., García-Santillán, I.: Android Mobile Application for Cattle Body Condition Score Using Convolutional Neural Networks. Presented at the (2023). https://doi.org/10.1007/978-3-031-32213-6_7. DOI: https://doi.org/10.1007/978-3-031-32213-6_7

30. Ulloa, F., Sandoval-Pillajo, L., Landeta-López, P., Granda-Peñafiel, N., Pusdá-Chulde, M., García-Santillán, I.: Identification of Diabetic Retinopathy from Retinography Images Using a Convolutional Neural Network. Presented at the (2025). https://doi.org/10.1007/978-3-031-75702-0_10. DOI: https://doi.org/10.1007/978-3-031-75702-0_10

31. Salazar-Fierro, F., Cumbal, C., Trejo-España, D., León-Fernández, C., Pusdá-Chulde, M., García-Santillán, I.: Detection of Scoliosis in X-Ray Images Using a Convolutional Neural Network. Presented at the (2025). https://doi.org/10.1007/978-3-031-75702-0_13. DOI: https://doi.org/10.1007/978-3-031-75702-0_13

32. Guaichico, E., Pusdá-Chulde, M., Ortega-Bustamante, M., Granda, P., García-Santillán, I.: Mobile app for real-time academic attendance registration based on MobileFaceNet Convolutional neural network. Data and Metadata. 4, 193 (2025). https://doi.org/10.56294/dm2025193. DOI: https://doi.org/10.56294/dm2025193

33. Voloshko, A., Ivutin, A., Novikov, A.S.: Heuristics for Program Code Optimization in Heterogeneous Systems. In: 2021 31st International Conference Radioelektronika (RADIOELEKTRONIKA). pp. 1–6. IEEE (2021). https://doi.org/10.1109/RADIOELEKTRONIKA52220.2021.9420213. DOI: https://doi.org/10.1109/RADIOELEKTRONIKA52220.2021.9420213

34. Amaral, V., Norberto, B., Goulão, M., Aldinucci, M., Benkner, S., Bracciali, A., Carreira, P., Celms, E., Correia, L., Grelck, C., Karatza, H., Kessler, C., Kilpatrick, P., Martiniano, H., Mavridis, I., Pllana, S., Respício, A., Simão, J., Veiga, L., Visa, A.: Programming languages for data-Intensive HPC applications: A systematic mapping study. Parallel Comput. 91, 102584 (2020). https://doi.org/https://doi.org/10.1016/j.parco.2019.102584. DOI: https://doi.org/10.1016/j.parco.2019.102584

35. Abdullah, E.A., Ahmed Saleh, I., Al Saif, O.I.: Performance Evaluation of Parallel Particle Swarm Optimization for Multicore Environment. ICOASE 2018 - International Conference on Advanced Science and Engineering. 81–86 (2018). https://doi.org/10.1109/ICOASE.2018.8548816. DOI: https://doi.org/10.1109/ICOASE.2018.8548816

36. Rangavajhala, A., Tadepalli, A., Mitra, K.: Accelerating the Multi-Objective Optimization of Crystallization Process using High Fidelity Population Balance Model Under NUMBA CUDA Environment. 2024 Tenth Indian Control Conference (ICC). 428–433 (2024). https://doi.org/10.1109/ICC64753.2024.10883754. DOI: https://doi.org/10.1109/ICC64753.2024.10883754

37. Alrawais, A.: Parallel Programming Models and Paradigms: OpenMP Analysis. Proceedings - 5th International Conference on Computing Methodologies and Communication, ICCMC 2021. 1022–1029 (2021). https://doi.org/10.1109/ICCMC51019.2021.9418401. DOI: https://doi.org/10.1109/ICCMC51019.2021.9418401

38. Moreno, K.: Complejidad de un algoritmo (notación Big-O), https://guias.makeitreal.camp/docs/algoritmos/complejidad#complejidad-espacial, last accessed 2023/09/30.

39. Rossainz López, M.: Programación Concurrente y Paralela. (2020).

Heterogeneous Computing for High-Complexity Problems: A Performance Study with DPC++ on HPC Platforms

Authors

DOI:

Keywords:

Abstract

References

Downloads

Published

Issue

Section

License

How to Cite

compendex