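You can handle this with a for loop that builds a nested list. The example below produces the tokenized result for each line with stop words removed, and the result can be written directly to a CSV or Excel file: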
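from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import pandas as pd

# Requires the NLTK 'stopwords' corpus and tokenizer data, e.g. nltk.download('stopwords')
# and nltk.download('punkt') ('punkt_tab' on newer NLTK releases).
example_sent = """This is a sample sentence,showing off the stop words filtration.\n Hello guys!"""
stop_words = set(stopwords.words('english'))
# Tokenize each line of the text separately
word_tokens = [word_tokenize(x) for x in example_sent.split('\n')]
filtered_sentence = []
for wd in word_tokens:
    cent = []  # tokens kept for the current line
    for w in wd:
        # Case-sensitive check: the stop word list is lowercase, so "This" is kept
        if w not in stop_words:
            cent.append(w)
    filtered_sentence.append(cent)
print(word_tokens)
print(filtered_sentence)
# One row per line; shorter rows are padded with None
df = pd.DataFrame(filtered_sentence)
print(df)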
Output (the DataFrame printed last):
       0       1         2     3        4     5      6           7     8
0   This  sample  sentence     ,  showing  stop  words  filtration     .
1  Hello    guys         !  None     None  None   None        None  None
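As a rough sketch of the "write to CSV" step, the snippet below redoes the filtering as a nested list comprehension and writes one row per line with Python's standard csv module. To keep it runnable without NLTK data, stop_words here is a small hypothetical stand-in for set(stopwords.words('english')), and word_tokens is hard-coded to the tokens implied by the output above; both are assumptions for illustration. It also lowercases each token before the check, which differs from the code above and drops "This" as well.

```python
import csv
import io

# Hypothetical stand-in for set(stopwords.words('english')), so this runs without NLTK data
stop_words = {"this", "is", "a", "off", "the"}

# Assumed token lists for the two input lines (hard-coded for illustration)
word_tokens = [
    ["This", "is", "a", "sample", "sentence", ",", "showing", "off",
     "the", "stop", "words", "filtration", "."],
    ["Hello", "guys", "!"],
]

# Nested list comprehension equivalent to the two for loops;
# lower() makes the check case-insensitive, so "This" is removed too
filtered_sentence = [
    [w for w in wd if w.lower() not in stop_words]
    for wd in word_tokens
]

# Write one CSV row per input line; rows may have different lengths.
# An in-memory buffer is used here; a real file opened with
# open("filtered.csv", "w", newline="", encoding="utf-8") works the same way
# ("filtered.csv" is a made-up name).
buffer = io.StringIO()
csv.writer(buffer).writerows(filtered_sentence)
csv_text = buffer.getvalue()
print(csv_text)
```

With the pandas DataFrame from the answer, calling df.to_csv(...) on it would serve the same purpose.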
If this was helpful, please accept the answer.