Drink VC 2021-04-21 14:53 Acceptance rate: 0%

Python exercise: word frequency count

def getText():
    # Read the file
    text = open("Walden.txt","r").read()
    text = text.lower()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~' :
        text = text.replace(ch," ")
    return  text

txt = getText()
words = txt.split()
counts = {}               # Start with an empty dictionary
for word in words:
    counts[word] = counts.get(word,0) + 1

items = list(counts.items())
# Sort the list by the values in counts, from largest to smallest
items.sort(key=lambda x:x[1],reverse=True)

# Print the result: the top twenty most frequent words
for i in range(10):
    word,count = items[i]
    print("#{0:<10}{1:>5}".format(word,counts))

The word frequency code is shown above, but it raises an error. What is causing it?

4 answers

  • CSDN专家-HGJ 2021-04-21 14:58
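    Add an encoding argument to the statement that opens the file, changing it to text = open("Walden.txt","r",encoding='utf-8').read(), and see whether that helps. There is also a small mistake in the print statement at the end of the code: it should be count, not counts.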
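
    Putting both suggestions together, a minimal sketch of the corrected script could look like the one below. It assumes Walden.txt is saved as UTF-8, which the question does not confirm. The helper names clean_text, get_text, top_words and format_line are made up for this illustration and are not from the original post. Passing the whole counts dict to a "{1:>5}" field raises a TypeError, because dict does not accept that format spec, so the sketch passes the per-word count instead.

    ```python
    def clean_text(text):
        # Lowercase the text and replace punctuation with spaces
        # (same character list as in the question).
        text = text.lower()
        for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_‘{|}~':
            text = text.replace(ch, " ")
        return text

    def get_text(path, encoding="utf-8"):
        # Assumption: the file is UTF-8 encoded; change `encoding` if it is not.
        with open(path, "r", encoding=encoding) as f:
            return clean_text(f.read())

    def top_words(text, n=10):
        # Count each word, then sort by count, largest first.
        counts = {}
        for word in text.split():
            counts[word] = counts.get(word, 0) + 1
        items = list(counts.items())
        items.sort(key=lambda x: x[1], reverse=True)
        # Slicing avoids an IndexError when there are fewer than n distinct words.
        return items[:n]

    def format_line(word, count):
        # Use the per-word count here, not the counts dict.
        return "#{0:<10}{1:>5}".format(word, count)

    # Small in-memory demo so the sketch runs without Walden.txt.
    sample = clean_text("The pond, the woods; the Pond!")
    for word, count in top_words(sample):
        print(format_line(word, count))
    ```

    To run it on the real file, replace the demo input with get_text("Walden.txt"). If that still raises a UnicodeDecodeError, the file is probably stored in a different encoding, and the encoding argument would need to match it.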