A hybrid data clustering algorithm based on improved krill herd algorithm and KHM clustering
【Abstract】 K-means clustering is sensitive to the initial cluster centroids and prone to becoming trapped in local optima. To address these problems, a hybrid data clustering algorithm based on an improved krill herd (KH) algorithm and K-harmonic means (KHM) clustering is proposed. First, an improved KH algorithm with Lévy flight and a crossover operator is proposed to remedy the tendency of the standard KH algorithm to become trapped in local extrema and its low search efficiency. After each standard krill herd position update, a new position update method is applied to search further and improve the search ability of the population. In addition, Lévy flight and the crossover operator are used alternately to carry out a greedy search around the current herd positions, which enhances the global search ability of the algorithm. Experimental results on 20 benchmark functions show that the improved algorithm is less likely to become trapped in local optima and can find the global optimum in fewer iterations while remaining stable. Then, the improved KH algorithm is combined with K-harmonic means clustering: after each iteration, the worst individual is replaced either by the best individual or by a new individual obtained from one iteration of K-harmonic means. Test results on five real data sets from UCI show that the hybrid clustering algorithm overcomes the sensitivity of K-means to the initial cluster centroids and has strong global convergence.
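To make the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows the standard K-harmonic means cost, a Lévy-distributed step, a greedy move that is kept only when the cost decreases, and the "replace the worst individual with the best" branch of the hybrid step. Several points are assumptions that the abstract does not state: the function names are hypothetical, Lévy steps are drawn with Mantegna's method, and the values `p=3.5`, `beta=1.5`, and `scale` are illustrative. The krill herd motion itself, the additional position update, the crossover operator, and the K-harmonic means center update are omitted.

```python
import math
import random


def khm_objective(points, centers, p=3.5, eps=1e-12):
    """K-harmonic means cost: sum over points of k / sum_j (1 / d_ij**p)."""
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            d = max(math.dist(x, c), eps)  # guard against a zero distance
            inv_sum += 1.0 / d ** p
        total += k / inv_sum
    return total


def levy_step(beta=1.5, rng=random):
    """One Levy-distributed step length (Mantegna's method, an assumption here)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def greedy_levy_move(centers, points, scale=0.01, rng=random):
    """Perturb every coordinate with a Levy step; keep the move only if the cost drops."""
    candidate = [[c + scale * levy_step(rng=rng) for c in center]
                 for center in centers]
    if khm_objective(points, candidate) < khm_objective(points, centers):
        return candidate
    return centers


def replace_worst_with_best(population, points):
    """Overwrite the highest-cost individual with a copy of the lowest-cost one."""
    costs = [khm_objective(points, ind) for ind in population]
    best = costs.index(min(costs))
    worst = costs.index(max(costs))
    population[worst] = [list(c) for c in population[best]]
    return population
```

Here each individual is a list of candidate cluster centers, so the same cost function ranks individuals in the population and decides whether a Lévy move is accepted. Because a move is accepted only when the cost strictly decreases, an individual's cost never increases under `greedy_levy_move`.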
【Keywords】 krill herd algorithm; Lévy flight; crossover operator; K-harmonic means clustering; hybrid clustering
【Funds】 National Natural Science Foundation of China (61772416)