Unveiling insights from unstructured wealth: a comparative analysis of clustering techniques on blockchain cryptocurrency data

Ramzi A. Haraty, Salma Sobeh

Abstract


In today's era of the Fourth Industrial Revolution, individuals encounter an immense volume of information daily. The digital world is rich in data from sources such as the Internet of Things (IoT), social media, healthcare, business, cryptocurrencies, and cybersecurity. These vast amounts of data require significant storage capacity, and tasks such as analysis, processing, and retrieval become time-consuming and arduous. Artificial intelligence, particularly machine learning and deep learning, offers a practical way to analyze and utilize this data effectively. Clustering, an unsupervised learning technique, aims to identify a specific number of clusters that categorize the data by grouping similar items together. Clustering is therefore relevant to many fields and is used in various applications that deal with large datasets. This survey examines seven widely recognized clustering techniques, namely k-means, G-means, DBSCAN, agglomerative hierarchical clustering, a two-stage density algorithm (DBSCAN and k-means), a two-level clustering algorithm (DBSCAN and hierarchical), and a two-stage MeanShift and k-means algorithm. It compares them on a real dataset, the Blockchain dataset, which includes prominent cryptocurrencies such as Binance, Bitcoin, Doge, and Ethereum, using several metrics: the silhouette coefficient, the Calinski-Harabasz index, the Davies-Bouldin index, time complexity, and entropy.
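The comparison above rests on three internal validity indices. As a minimal sketch, assuming their standard textbook definitions (the paper's own implementation is not described here, and in practice a library would normally be used), the indices can be computed in plain Python for any labeling an algorithm produces. Treating label -1 as noise and excluding it from scoring is an assumption borrowed from DBSCAN's labeling convention; the helper names are hypothetical.

```python
import math  # math.dist requires Python 3.8+
from collections import defaultdict


def _groups(X, labels):
    """Group points by cluster label, skipping noise (label -1, an assumed DBSCAN-style convention)."""
    groups = defaultdict(list)
    for point, label in zip(X, labels):
        if label != -1:
            groups[label].append(point)
    return dict(groups)


def _centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def silhouette(X, labels):
    """Mean of s = (b - a) / max(a, b) over non-noise points; ranges from -1 to 1, higher is better."""
    groups = _groups(X, labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for point, label in zip(X, labels):
        if label == -1:
            continue
        own = groups[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster (self adds 0)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(math.dist(point, q) for q in g) / len(g)
                for k, g in groups.items() if k != label)
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def davies_bouldin(X, labels):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j); lower is better."""
    groups = _groups(X, labels)
    if len(groups) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    cents = {k: _centroid(g) for k, g in groups.items()}
    # S_k: average distance from the members of cluster k to its centroid
    spread = {k: sum(math.dist(p, cents[k]) for p in g) / len(g) for k, g in groups.items()}
    worst = [max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                 for j in groups if j != i)
             for i in groups]
    return sum(worst) / len(worst)


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom; higher is better."""
    groups = _groups(X, labels)
    k = len(groups)
    points = [p for g in groups.values() for p in g]
    n = len(points)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    overall = _centroid(points)
    between = within = 0.0
    for g in groups.values():
        c = _centroid(g)
        between += len(g) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in g)
    if within == 0:
        return 1.0  # degenerate case: every point sits on its centroid
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Toy data: two well-separated groups, scored under a correct and a scrambled labeling.
    X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for name, labels in [("good", [0, 0, 0, 1, 1, 1]), ("scrambled", [0, 1, 0, 1, 0, 1])]:
        print(name, silhouette(X, labels), davies_bouldin(X, labels), calinski_harabasz(X, labels))
```

On the toy data, the correct labeling should score a higher silhouette and Calinski-Harabasz value and a lower Davies-Bouldin value than the scrambled one. The silhouette loop is O(n²), so this sketch suits small samples only.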

 

Received: 20 July 2023

Accepted: 28 November 2023

Published: 28 January 2024


Keywords


Clustering


DOI: http://dx.doi.org/10.21622/ACE.2024.04.1.698



Copyright (c) 2024 Ramzi A. Haraty, Salma Sobeh


Advances in Computing and Engineering
E-ISSN: 2735-5985
P-ISSN: 2735-5977

Published by:

Academy Publishing Center (APC)
Arab Academy for Science, Technology and Maritime Transport (AASTMT)
Alexandria, Egypt
ace@aast.edu