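Symptoms and background
I get an error when running an LDA model. It appears to be raised at the vectorization step, in the fit_transform call on the CountVectorizer (which builds term counts rather than tf-idf weights).
Relevant code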
n_features = 1000  # keep the 1000 most frequent terms as features
tf_vectorizer = CountVectorizer(strip_accents='unicode',
                                max_features=n_features,
                                stop_words='english',
                                max_df=0.5,
                                min_df=10)
tf = tf_vectorizer.fit_transform(data.content_cutted)
Error message
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-ee1a3704afca> in <module>
5 max_df = 0.5,
6 min_df = 10)
----> 7 tf = tf_vectorizer.fit_transform(data.content_cutted)
D:\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
1216 else min_df * n_doc)
1217 if max_doc_count < min_doc_count:
-> 1218 raise ValueError(
1219 "max_df corresponds to < documents than min_df")
1220 if max_features is not None:
ValueError: max_df corresponds to < documents than min_df
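One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: a float `max_df` is treated as a proportion of the documents and multiplied by the document count, while an integer `min_df` is used as an absolute count. With `max_df = 0.5` and `min_df = 10`, the upper bound falls below the lower bound whenever the corpus has fewer than 20 documents. The pure-Python snippet below mirrors only the comparison visible in the traceback; `doc_count_bounds` and `thresholds_conflict` are hypothetical helper names, not part of scikit-learn.

```python
import numbers


def doc_count_bounds(n_docs, max_df=0.5, min_df=10):
    """Return (max_doc_count, min_doc_count) as absolute document counts.

    Assumption: integers are taken as absolute counts and floats as
    proportions of n_docs, matching the `min_df * n_doc` expression
    shown in the traceback.
    """
    max_doc_count = max_df if isinstance(max_df, numbers.Integral) else max_df * n_docs
    min_doc_count = min_df if isinstance(min_df, numbers.Integral) else min_df * n_docs
    return max_doc_count, min_doc_count


def thresholds_conflict(n_docs, max_df=0.5, min_df=10):
    """True when the settings would trigger the ValueError from the traceback."""
    max_doc_count, min_doc_count = doc_count_bounds(n_docs, max_df, min_df)
    return max_doc_count < min_doc_count


if __name__ == "__main__":
    # Try a few corpus sizes with the settings from the question.
    for n in (12, 19, 20, 500):
        print(n, doc_count_bounds(n), thresholds_conflict(n))
```

If this is the cause, it would be worth checking `len(data.content_cutted)`: with a small corpus, lowering `min_df` (for example to 1 or 2) or writing it as a proportion such as `min_df = 0.05` should keep it at or below the `max_df` bound. Whether those values suit the LDA model is a separate judgment call.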