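As the title says: how can I use Python to count how many times each word occurs in a local HTML file? I tried with open() together with read(), but the results are poor: the counts come out lower than the number of times the words actually appear.
This is how I counted: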
import re

freq_count = {}  # store the count of each word
with open(file_path, 'r', encoding='utf-8') as document:
    # split on whitespace only, so HTML tags stay attached to the tokens
    tokens = document.read().split()
    for token in tokens:
        # strip every non-word character, then normalize case
        token = re.sub(r'\W+', '', token)
        token = token.lower()
        # update dict
        if token not in freq_count:
            freq_count[token] = 1
        else:
            freq_count[token] += 1
The file whose words I want to count is similar to: https://en.wikipedia.org/wiki/Social_intelligence
It has been saved as a local HTML file.
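
A probable cause of the undercount: split() only breaks on whitespace, and re.sub(r'\W+', '', token) then deletes punctuation and angle brackets without separating what was on either side of them. A token like "social-intelligence" therefore becomes the single word "socialintelligence", and markup attached to a word gets glued onto it, so the real words are counted less often than they occur. Below is a minimal sketch of one alternative, using only the standard library: extract the text with html.parser first, then pull out words with a regular expression. The names TextExtractor and count_words are hypothetical, and the pattern used to define a "word" (ASCII letters with an optional apostrophe part) is an assumption that may need adjusting.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html_text):
    """Return a Counter mapping each lowercased word to its frequency."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Joining with a space keeps words in adjacent elements apart; the
    # trade-off is that a word split by an inline tag is counted as two.
    text = " ".join(parser.chunks).lower()
    # Assumed word definition: ASCII letters, optionally with an
    # apostrophe part such as "don't". Digits and non-ASCII are ignored.
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text))


sample = "<p>Social intelligence is social.</p><script>var social = 1;</script>"
print(count_words(sample).most_common(3))
```

For the saved page, the same function could be called on the file contents, for example count_words(document.read()) inside the existing with open(...) block. Note that this still counts navigation menus, footers, and other page chrome, since it does not try to isolate the article body.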