Following the hamlet.txt example, use jieba word segmentation to count word frequencies in threekingdoms.txt and output the 20 most frequent words with their counts

As shown in the screenshot, the program reads the attached file with:
with open('threekingdoms.txt', 'r', encoding='utf-8') as f:
    print(f.read())
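The snippet in the question only prints the file. A relative path such as 'threekingdoms.txt' is resolved against the current working directory, which is not always the folder that holds the script. A minimal sketch of loading the text for counting, assuming the file is UTF-8 encoded and stored next to the script; the helper name `load_text` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def load_text(filename: str, base_dir: Optional[Path] = None) -> str:
    """Read a UTF-8 text file from base_dir (default: the folder containing this script)."""
    if base_dir is None:
        # Assumption: the data file is stored next to the running script
        base_dir = Path(__file__).resolve().parent
    path = Path(base_dir) / filename
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Usage (assumes threekingdoms.txt sits next to this script):
# txt = load_text("threekingdoms.txt")
# words = jieba.lcut(txt)
```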


沐沐不是沫 2022-05-13 16:08


Note: I implemented the code with my own data; to use it, assign the path of your data file to the variable dic_path.
Please accept this answer, thanks!
(1) Code:

import jieba

dic_path = './10.txt'  # path to the data file
with open(dic_path, 'r', encoding='utf8') as f:
    txt = f.read()
print(txt)

words = jieba.lcut(txt)
counts = {}

for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1  # count word frequency

items = list(counts.items())  # build the list once, after counting
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending
for word, count in items[:20]:  # print the top 20 words (fewer if the text is short)
    print("{0:<10}{1:>5}".format(word, count))

Result: [screenshot]
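The loop in (1) builds the dictionary by hand and then sorts all items. For comparison, a sketch of the same filtering and counting with the standard library's `collections.Counter`, whose `most_common(n)` returns the n highest-count pairs already sorted and simply returns fewer pairs when the text has fewer than n distinct words. `tokens` stands in for the list returned by `jieba.lcut(txt)`; the function name `top_words` and the sample tokens are hypothetical.

```python
from collections import Counter


def top_words(tokens, n=20, min_len=2):
    """Return the n most frequent tokens that are at least min_len characters long."""
    counts = Counter(t for t in tokens if len(t) >= min_len)
    return counts.most_common(n)  # list of (word, count), highest count first


# Hand-made tokens standing in for jieba.lcut(txt)
tokens = ["曹操", "曰", "孔明", "曹操", "玄德", "孔明", "曹操", "，"]
for word, count in top_words(tokens, n=3):
    print("{0:<10}{1:>5}".format(word, count))
```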

(2) Code:

import jieba

dic_path = './10.txt'
with open(dic_path, 'r', encoding='utf8') as f:
    txt = f.read()
print(txt)

words = jieba.lcut(txt)
counts = {}

# fu holds punctuation marks and text holds whitespace characters; if other unwanted
# symbols still show up in the counts, add them here so they are filtered out
fu = '·‘’!"#$%&\'()＃！（）*+,-./:;<=>?％^@，：￥★、—_…．＞【】［］《》？“”[\\]`{|}~。'
text = ['\u3000', '\n']

for word in words:
    if len(word) != 1:  # keep only single-character tokens
        continue
    elif word in fu:  # drop punctuation
        continue
    elif word in text:  # drop whitespace characters
        continue
    counts[word] = counts.get(word, 0) + 1  # count frequency

items = list(counts.items())  # build the list once, after counting
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending
# print(items)
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))

Result: [screenshot]
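Variant (2) lists the symbols to filter by hand in `fu`, so any symbol missing from that string (a half-width space, for example) is still counted. An alternative, swapped in here and not part of the original answer, is to classify each character with the standard library's `unicodedata.category` and drop punctuation (P*), symbols (S*), separators such as spaces (Z*) and control characters such as `\n` (C*). The helper names `is_countable` and `top_chars` and the sample tokens are hypothetical.

```python
import unicodedata
from collections import Counter


def is_countable(ch: str) -> bool:
    """True unless ch is punctuation, a symbol, a separator/space, or a control char."""
    return unicodedata.category(ch)[0] not in "PSZC"


def top_chars(tokens, n=20):
    """Count single-character tokens, skipping punctuation, symbols and whitespace."""
    counts = Counter(t for t in tokens if len(t) == 1 and is_countable(t))
    return counts.most_common(n)


# Hand-made tokens standing in for jieba.lcut(txt)
tokens = ["曹", "，", "操", "\u3000", "曹", "。", "\n", "★", "玄德"]
print(top_chars(tokens, n=5))
```

Digits (category N*) are kept by this filter; add "N" to the excluded prefixes if they should be dropped too.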

(3) Code:

import jieba

dic_path = './10.txt'
with open(dic_path, 'r', encoding='utf8') as f:
    txt = f.read()
print(txt)

words = jieba.lcut(txt)
counts = {}

for word in words:
    if len(word) != 2:  # keep only two-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1  # count word frequency

items = list(counts.items())  # build the list once, after counting
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))

Result: [screenshot]
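Variant (3) counts every two-character word, so frequent non-name words such as 却说 or 不可 are likely to take places in the top 20, and one person can be split across several names. The hamlet.txt exercise the question refers to usually removes common words with an exclusion set; below is a sketch of that idea applied to jieba tokens, with an added alias map that merges names. Both tables, the function name `top_names` and the sample tokens are hypothetical examples that would need tuning against the real output.

```python
from collections import Counter

# Hypothetical examples: tune both tables against your own output
EXCLUDES = {"却说", "二人", "不可", "不能", "如此"}
ALIASES = {"孔明曰": "孔明", "诸葛亮": "孔明", "玄德曰": "刘备", "玄德": "刘备", "关公": "关羽"}


def top_names(tokens, n=20, excludes=EXCLUDES, aliases=ALIASES):
    """Count multi-character tokens after merging aliases and dropping excluded words."""
    counts = Counter()
    for tok in tokens:
        if len(tok) < 2:
            continue  # skip single characters, as in variant (1)
        word = aliases.get(tok, tok)  # map an alias to its canonical name
        if word in excludes:
            continue
        counts[word] += 1
    return counts.most_common(n)


# Hand-made tokens standing in for jieba.lcut(txt)
tokens = ["却说", "玄德", "孔明曰", "诸葛亮", "曰", "刘备", "二人", "孔明"]
print(top_names(tokens, n=3))
```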

This answer was accepted by the asker as the best answer.
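All three variants print each row with `"{0:<10}{1:>5}"`, which pads the word with ASCII spaces. Chinese characters usually render about twice as wide as ASCII characters in a terminal, so the count column may drift. A common workaround, sketched here as an addition that is not part of the answer, pads the word column with the full-width space U+3000 (`chr(12288)`) supplied through a nested format field; the helper name `format_row` and the sample counts are made up.

```python
def format_row(word: str, count: int, width: int = 10) -> str:
    """Left-align word in a column of `width` characters, padded with full-width spaces."""
    # {2} supplies the fill character (U+3000) and {3} the column width
    return "{0:{2}<{3}}{1:>5}".format(word, count, chr(12288), width)


# Made-up counts, for illustration only
for word, count in [("曹操", 3), ("孔明", 2), ("诸葛亮", 1)]:
    print(format_row(word, count))
```

This lines up well only when every word consists of full-width characters; words that mix ASCII and Chinese will still be slightly off.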


Question events:
Question closed by the system: June 14
Answer accepted: June 6
Question created: May 12
