Mean Shift Clustering in Python: An Example

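Mean Shift is a centroid-based, mode-seeking clustering algorithm: it moves candidate centers toward the densest regions of the data. Unlike supervised machine learning algorithms, clustering tries to group data without first being trained on labeled data. Clustering is used in a wide range of applications, such as search engines, academic rankings, and medicine. In contrast to K-Means, Mean Shift does not require you to know the number of classes (clusters) in advance; the number found depends on the bandwidth instead. Its drawback is computational cost: a straightforward implementation takes on the order of O(n²) operations per iteration.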

How it works

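1. Define a window (the bandwidth of the kernel) and place it on a data point

2. Compute the mean of all points inside the window

3. Move the center of the window to the location of the mean

4. Repeat steps 2 and 3 until convergence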
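The four steps above can be sketched directly in NumPy. This is a minimal illustration, not how scikit-learn implements the algorithm: it assumes a flat (uniform) kernel, starts one window on every data point, and merges windows that end up closer than half a bandwidth. The function name mean_shift, the parameters max_iter and tol, and the merge threshold are all choices made for this sketch.

```python
import numpy as np

def mean_shift(points, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift sketch. Returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    # Step 1: place one window on every data point.
    windows = points.copy()
    for _ in range(max_iter):
        # Distance from every window center to every data point.
        dists = np.linalg.norm(windows[:, None, :] - points[None, :, :], axis=2)
        inside = dists <= bandwidth
        # Step 2: mean of the points that fall inside each window.
        counts = inside.sum(axis=1, keepdims=True)
        new_windows = inside.astype(float) @ points / counts
        # Step 3: move each window to its mean.
        shift = np.linalg.norm(new_windows - windows, axis=1).max()
        windows = new_windows
        # Step 4: stop once no window moves noticeably.
        if shift < tol:
            break
    # Windows that converged to (almost) the same place form one cluster.
    # The bandwidth / 2 threshold is an assumption of this sketch.
    centers = []
    labels = np.empty(len(points), dtype=int)
    for i, w in enumerate(windows):
        for k, c in enumerate(centers):
            if np.linalg.norm(w - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(w)
            labels[i] = len(centers) - 1
    return np.array(centers), labels

# Two small, well-separated groups of 2D points.
demo_points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
               [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]]
demo_centers, demo_labels = mean_shift(demo_points, bandwidth=1.0)
print(demo_labels)  # [0 0 0 1 1 1]
```

Computing the distance from every window to every point is what makes each iteration quadratic in the number of points, which is why the algorithm becomes expensive on large data sets.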

An example in Python

Let's look at how to label data with the Mean Shift algorithm in Python.

from sklearn.cluster import MeanShift
# make_blobs lives in sklearn.datasets; the old samples_generator module was removed
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
# Registers the 3D projection on older matplotlib versions
from mpl_toolkits.mplot3d import Axes3D

We use the make_blobs function to generate our own data.

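# Three blob centers in 3D space
clusters = [[1, 1, 1], [5, 5, 5], [3, 10, 10]]
# 150 samples scattered around those centers; the true labels are discarded
X, _ = make_blobs(n_samples=150, centers=clusters, cluster_std=0.60)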

After fitting the model, we store the coordinates of the cluster centers.

# With no bandwidth given, scikit-learn estimates one from the data
ms = MeanShift()
ms.fit(X)
cluster_centers = ms.cluster_centers_
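Since the bandwidth determines how many clusters are found, it can help to set it explicitly and inspect the resulting labels. The sketch below is one possible way to do that with scikit-learn's estimate_bandwidth helper. The quantile value of 0.2 and the random_state seeds are arbitrary choices for illustration, and the data is regenerated here so that the snippet runs on its own.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Same kind of data as in the example; random_state is fixed only for repeatability.
blob_centers = [[1, 1, 1], [5, 5, 5], [3, 10, 10]]
X_demo, _ = make_blobs(n_samples=150, centers=blob_centers,
                       cluster_std=0.60, random_state=0)

# Estimate a bandwidth from pairwise distances; a larger quantile
# gives a larger window and therefore tends to give fewer clusters.
bandwidth = estimate_bandwidth(X_demo, quantile=0.2, random_state=0)

ms_demo = MeanShift(bandwidth=bandwidth)
ms_demo.fit(X_demo)

# One integer label per sample; the number of clusters is discovered, not specified.
labels = ms_demo.labels_
n_clusters = len(np.unique(labels))
print("estimated bandwidth:", bandwidth)
print("number of clusters found:", n_clusters)
```

Varying quantile and rerunning is a quick way to see how sensitive the number of clusters is to the window size.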

Finally, we plot the data points and the centroids in a 3D graph.

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Data points
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
# Cluster centers drawn as large red crosses on top
ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1], cluster_centers[:, 2], marker='x', color='red', s=300, linewidth=5, zorder=10)
plt.show()