ValueError: max_df corresponds to < documents than min_df

Symptoms and background

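I get this error when running an LDA model. It appears to be raised during text vectorization, in the CountVectorizer.fit_transform call.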

Relevant code


n_features = 1000  # extract 1000 feature words
tf_vectorizer = CountVectorizer(strip_accents = 'unicode',
                                max_features=n_features,
                                stop_words='english',
                                max_df = 0.5,
                                min_df = 10)
tf = tf_vectorizer.fit_transform(data.content_cutted)

Error message

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-ee1a3704afca> in <module>
      5                                 max_df = 0.5,
      6                                 min_df = 10)
----> 7 tf = tf_vectorizer.fit_transform(data.content_cutted)

D:\anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1216                              else min_df * n_doc)
   1217             if max_doc_count < min_doc_count:
-> 1218                 raise ValueError(
   1219                     "max_df corresponds to < documents than min_df")
   1220             if max_features is not None:

ValueError: max_df corresponds to < documents than min_df
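The check shown in the traceback compares two document counts. As a rough sketch (the helper name doc_count_thresholds is made up for illustration and is not part of scikit-learn, and the corpus size of 15 is an assumption), an integer value is treated as an absolute number of documents and a float as a fraction of the corpus:

```python
from numbers import Integral


def doc_count_thresholds(n_docs, max_df, min_df):
    """Return (max_doc_count, min_doc_count) as the check in the traceback derives them.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # An integer is an absolute document count; a float is a fraction of n_docs.
    max_doc_count = max_df if isinstance(max_df, Integral) else max_df * n_docs
    min_doc_count = min_df if isinstance(min_df, Integral) else min_df * n_docs
    return max_doc_count, min_doc_count


# Assumed corpus size of 15 documents, with the settings from the question.
max_count, min_count = doc_count_thresholds(15, max_df=0.5, min_df=10)
print(max_count, min_count)   # 7.5 10
print(max_count < min_count)  # True: the condition that raises the ValueError
```

With max_df = 0.5 and min_df = 10, the condition holds for any corpus of fewer than 20 documents.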

4 answers

Chartte 2022-06-02 12:57
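My guess is that the corpus is too small. max_df = 0.5 is a fraction of the documents, while min_df = 10 is an absolute document count, so with fewer than 20 documents the maximum ends up below the minimum. After I changed min_df to 2, the code ran.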
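One way to avoid the mismatch on small corpora might be to cap min_df at whatever max_df allows. This is only a sketch: safe_min_df is a hypothetical helper, and it assumes max_df is given as a fraction and min_df as an absolute count of at least 1, as in the question.

```python
import math


def safe_min_df(n_docs, max_df=0.5, min_df=10):
    """Cap an absolute min_df so it never exceeds max_df * n_docs.

    Hypothetical helper; assumes max_df is a fraction in (0, 1]
    and min_df is an integer >= 1.
    """
    max_doc_count = math.floor(max_df * n_docs)
    if max_doc_count < 1:
        # No absolute min_df can satisfy the check for a corpus this small.
        raise ValueError("corpus too small for this max_df")
    return min(min_df, max_doc_count)


print(safe_min_df(15))   # 7
print(safe_min_df(200))  # 10
```

The result could then be passed as min_df when building the CountVectorizer, for example min_df=safe_min_df(len(data)). Alternatively, giving min_df as a float smaller than max_df keeps both thresholds on the same scale regardless of corpus size.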
