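Hi everyone, I'm currently working on a stop-word removal task. The code below runs, but I'd like to know how to turn it into a loop so that it removes stop words from every file in a folder, instead of handling one file at a time. I think the "file1 = open(...)" statement is what needs to change, but I don't know how to change it. Thanks in advance!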
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Needs the NLTK 'stopwords' corpus and the Punkt tokenizer data (see nltk.download).
stop_words = set(stopwords.words('english'))

path = r"D:\1.1 SEC EDGAR年报源文件 (10Q_10KA_10QA)\2001\QTR1\20010102_10-K-A_edgar_data_1024302_0001092388-00-500453.txt"

# Read the whole filing; 'with' closes the file even if an error occurs.
with open(path) as file1:
    text = file1.read()

words = word_tokenize(text)
# Drop stop words outright; replacing them with "" left double spaces after the join.
words_without_stop_words = [word for word in words if word not in stop_words]
new_words = " ".join(words_without_stop_words).strip()

# Mode 'w' overwrites the original file with the filtered text (it does not append).
with open(path, 'w') as out_file:
    out_file.write(new_words)
Thanks, everyone!
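One possible way to turn the single-file script into a loop, as a minimal sketch rather than a definitive answer: move the read, filter and write steps into a function and call it for each path found under the folder. The names remove_stop_words and clean_folder are hypothetical helpers made up for this illustration. The "*.txt" pattern, the UTF-8 encoding, and the choice to recurse into subfolders with rglob are assumptions that may need adjusting for the actual EDGAR files.

```python
from pathlib import Path


def remove_stop_words(text, stop_words, tokenize):
    """Return text with every token that appears in stop_words dropped."""
    kept = [token for token in tokenize(text) if token not in stop_words]
    return " ".join(kept)


def clean_folder(folder, stop_words, tokenize, pattern="*.txt", encoding="utf-8"):
    """Rewrite each matching file under folder in place; return the paths processed.

    pattern and encoding are assumptions; change them to match the real files.
    """
    processed = []
    # rglob also descends into subfolders (e.g. 2001/QTR1); use glob() to stay
    # in the top-level folder only. sorted() collects all paths before any
    # file is rewritten.
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding=encoding)
        path.write_text(remove_stop_words(text, stop_words, tokenize), encoding=encoding)
        processed.append(path)
    return processed


# Example call with NLTK (left as comments so the sketch runs without NLTK data):
# from nltk.corpus import stopwords
# from nltk.tokenize import word_tokenize
# stop_words = set(stopwords.words('english'))
# clean_folder(r"D:\1.1 SEC EDGAR年报源文件 (10Q_10KA_10QA)\2001\QTR1", stop_words, word_tokenize)
```

Because the files are overwritten in place, it is probably worth trying this on a copy of the folder first. Passing the tokenizer and the stop-word set in as arguments keeps the loop itself independent of NLTK, so it can be checked on a small test folder with str.split before running it on the full data.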