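Problem description and background
I'm using Jupyter to work through a New York taxi trip analysis and modeling exercise. At the clustering step, running the code below raises a MemoryError.
Relevant code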
kmeans = KMeans(n_clusters=15, random_state=2, n_init = 10).fit(loc_df)
loc_df['label'] = kmeans.labels_
loc_df = loc_df.sample(200000)
plt.figure(figsize = (10,10))
for label in loc_df.label.unique():
    plt.plot(loc_df.longitude[loc_df.label == label], loc_df.latitude[loc_df.label == label], '.', alpha=0.3, markersize=0.3)
plt.title('NewYork Clusters')
plt.show()
Output and error message
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-16-06a5f4870f57> in <module>()
----> 1 kmeans = KMeans(n_clusters=15, random_state=2, n_init = 10).fit(loc_df)
2 loc_df['label'] = kmeans.labels_
3
4 loc_df = loc_df.sample(200000)
5 plt.figure(figsize = (10,10))
D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in fit(self, X, y)
894 tol=self.tol, random_state=random_state, copy_x=self.copy_x,
895 n_jobs=self.n_jobs, algorithm=self.algorithm,
--> 896 return_n_iter=True)
897 return self
898
D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter)
344 X, n_clusters, max_iter=max_iter, init=init, verbose=verbose,
345 precompute_distances=precompute_distances, tol=tol,
--> 346 x_squared_norms=x_squared_norms, random_state=random_state)
347 # determine if these results are the best so far
348 if best_inertia is None or inertia < best_inertia:
D:\Anaconda3\lib\site-packages\sklearn\cluster\k_means_.py in _kmeans_single_elkan(X, n_clusters, max_iter, init, verbose, x_squared_norms, random_state, tol, precompute_distances)
398 print('Initialization complete')
399 centers, labels, n_iter = k_means_elkan(X, n_clusters, centers, tol=tol,
--> 400 max_iter=max_iter, verbose=verbose)
401 inertia = np.sum((X - centers[labels]) ** 2, dtype=np.float64)
402 return labels, inertia, centers, n_iter
sklearn\cluster\_k_means_elkan.pyx in sklearn.cluster._k_means_elkan.k_means_elkan()
MemoryError:
How can I fix this?
The file is read with:
df = pd.read_csv('yellow_tripdata_2012-01.csv')
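One direction that might be worth trying, offered as a sketch rather than a confirmed fix: the traceback ends inside the Elkan variant of k-means, which keeps distance bounds for every sample and every cluster in memory, so its footprint grows with the number of rows times the number of clusters. scikit-learn's MiniBatchKMeans fits on small batches instead and usually needs far less memory. The example below uses synthetic longitude/latitude points as a stand-in for loc_df (the real columns are not shown in the post), and the batch_size and n_init values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for loc_df: random longitude/latitude pairs near New York.
# (Assumption: the real data has one longitude and one latitude column.)
rng = np.random.RandomState(2)
points = np.column_stack([
    rng.uniform(-74.05, -73.75, size=50_000),  # longitude
    rng.uniform(40.60, 40.90, size=50_000),    # latitude
]).astype(np.float32)  # float32 halves the memory used by the input array

# Fit on mini-batches instead of the full data set at once.
# batch_size and n_init are illustrative values, not tuned settings.
mbk = MiniBatchKMeans(n_clusters=15, random_state=2, batch_size=10_000, n_init=3)
labels = mbk.fit_predict(points)

print(labels.shape)                # one cluster label per input row
print(mbk.cluster_centers_.shape)  # one (longitude, latitude) centre per cluster
```

With the real data, the same idea would be to pass only the two coordinate columns to the clusterer. Fitting on a random sample first and then calling predict on the remaining rows is another option along the same lines.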