ZSQ2333 2022-04-19 20:04 Acceptance rate: 0%
Views: 186

sklearn: how to fix the error "The truth value of a DataFrame is ambiguous."

Relevant code (please do not paste screenshots)
import os
import jieba
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

output_dir = r'output'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
# Load and inspect the training data
train_data = pd.read_csv('data/classify_train.csv', encoding='gbk')
print(train_data.head())
# Load the stop words
stopwords = pd.read_csv("data/stopwords.txt", index_col=False, sep="\t", quoting=3, names=['stopword'], encoding='utf-8')
# stopwords=set()
# with open('data/stopwords.txt','r') as infile:
#     for line in infile:
#         line = line.rstrip('\n')
#         if line:
#             stopwords.add(line.lower())
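# min_df drops terms with a low document frequency (usually highly specialized or rare words, i.e. noise);
# max_df drops terms with a very high document frequency (common words that are not worth keeping)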
tfidf = TfidfVectorizer(tokenizer=jieba.lcut, stop_words=stopwords, min_df=50, max_df=0.3)
# Encode the x variable
x=tfidf.fit_transform(train_data[u'内容'])

train_data[u'内容']:

[image: contents of train_data[u'内容']]

Output and error message

The following error is raised:

Traceback (most recent call last):
  File "D:/PyCharm/flaskProject/BOW.py", line 36, in <module>
    x=tfidf.fit_transform(train_data[u'内容'])
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 2077, in fit_transform
    X = super().fit_transform(raw_documents)
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1330, in fit_transform
    vocabulary, X = self._count_vocab(raw_documents, self.fixed_vocabulary_)
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1193, in _count_vocab
    analyze = self.build_analyzer()
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 446, in build_analyzer
    stop_words = self.get_stop_words()
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 368, in get_stop_words
    return _check_stop_list(self.stop_words)
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 185, in _check_stop_list
    if stop == "english":
  File "D:\PyCharm\flaskProject\venv\lib\site-packages\pandas\core\generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
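
The last frames of the traceback point at `_check_stop_list`, where `stop == "english"` is evaluated on whatever was passed as `stop_words`. The snippet below is a minimal sketch of that comparison in isolation, not the sklearn source itself; the small `stopwords` DataFrame is a hypothetical stand-in for the file loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in for the stop-word file loaded with pd.read_csv
stopwords = pd.DataFrame({"stopword": ["的", "了", "是"]})

# Comparing a DataFrame with a string is element-wise,
# so the result is another DataFrame rather than a single bool
comparison = stopwords == "english"
print(type(comparison).__name__)

# Using that DataFrame as an `if` condition is what raises the ValueError
error_message = None
try:
    if comparison:
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

This suggests the error is triggered by the type of `stop_words`, not by the contents of `train_data[u'内容']`.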

My approach and what I have tried

I checked whether the data contained null values, but the error persists after removing them.

1 answer

  • 不会长胖的斜杠 2022-04-19 20:08

    Try appending .item() at the end.
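
Another option, based on the traceback, is to pass `stop_words` as a plain list of strings instead of a DataFrame, since the failing check in `_check_stop_list` runs on the `stop_words` argument. The sketch below assumes the stop words live in a column named `stopword`, as in the question. The sample DataFrame, the pre-segmented `docs`, and the `whitespace_tokenize` helper (standing in for `jieba.lcut`) are hypothetical and only there to keep the example self-contained.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for pd.read_csv("data/stopwords.txt", ...)
stopwords_df = pd.DataFrame({"stopword": ["的", "了", "是"]})

# Convert the single column to a plain list of strings
stopword_list = stopwords_df["stopword"].dropna().astype(str).tolist()

# Hypothetical documents that are already segmented with spaces
docs = [
    "我 喜欢 机器 学习",
    "机器 学习 是 有趣 的",
    "我 学习 了 机器 学习",
]


def whitespace_tokenize(text):
    """Hypothetical tokenizer used here in place of jieba.lcut."""
    return text.split()


# stop_words is now a list, so the `stop == "english"` check yields a plain bool
tfidf = TfidfVectorizer(tokenizer=whitespace_tokenize, stop_words=stopword_list)
x = tfidf.fit_transform(docs)

vocabulary = set(tfidf.vocabulary_)
print(sorted(vocabulary))
print(x.shape)
```

In the original script this would amount to passing `stopwords['stopword'].tolist()` to `TfidfVectorizer` while keeping `tokenizer=jieba.lcut`. A list is used rather than a set because recent scikit-learn releases validate the type of `stop_words`.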
