zhangswufe
2020-08-17 12:10, 235 views

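Using Python to search all files in a given directory for a keyword

I am trying to use Python to search all files in a given directory for a keyword. The code is as follows: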

#!/usr/bin/python
# encoding: UTF-8

import os
import docx


# Check whether each file contains the keyword; if so, print its path.
def is_file_contain_word(file_list, query_word):
    for _file in file_list:
        if query_word in open(_file).read():
            print(_file)
    print("Finish searching.")


# Return all files in the given directory (including files in subdirectories).
def get_all_file(folder_path):
    file_list = []
    if folder_path is None:
        raise Exception("folder_path is None")
    for dirpath, dirnames, filenames in os.walk(folder_path):
        for name in filenames:
            file_list.append(os.path.join(dirpath, name))
    return file_list


query_word = input("Please input the key word that you want to search:")
basedir = input("Please input the directory:")
is_file_contain_word(get_all_file(basedir), query_word)
input("Press Enter to quit.")

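The test directory is D:\test. It contains one Word document and one subfolder, and the subfolder contains one Word document.

After entering the keyword and the directory, I got the following output: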
Please input the key word that you want to search:'Shengaiwei'
Please input the directory:D:\test
Traceback (most recent call last):
  File "C:\Users\c*\AppData\Local\Programs\Python\Python38\kword\kword7.py", line 29, in <module>
    is_file_contain_word(get_all_file(basedir), query_word)
  File "C:\Users\c*\AppData\Local\Programs\Python\Python38\kword\kword7.py", line 11, in is_file_contain_word
    if query_word in open(_file).read():
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa2 in position 50: illegal multibyte sequence

I would appreciate any help or guidance. Thank you!


2 answers

  • Accepted
    jingluan666 2020-08-17 15:21

    Use docx to parse the Word documents:

    def is_file_contain_word(file_list, query_word):
        for _file in file_list:
            extension = os.path.splitext(_file)[1].lower()

            if extension == '.docx':
                # A .docx file is not plain text, so parse it with python-docx.
                doc = docx.Document(_file)
                for paragraph in doc.paragraphs:
                    if query_word in paragraph.text:
                        print(_file)
                        break
            else:
                # Read any other file as text using the default encoding.
                with open(_file) as f:
                    content = f.read()
                if query_word in content:
                    print(_file)
        print("Finish searching.")
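Some background on why the original call failed: a .docx file is a ZIP archive of XML parts, not plain text, so open(_file).read() tries to decode compressed bytes with the locale's default codec (gbk here) and raises UnicodeDecodeError. As a rough sketch, a file can be recognised as .docx by its contents rather than by its extension. The helper name looks_like_docx is hypothetical and not part of the answer above; it uses only the standard-library zipfile module.

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive that holds word/document.xml.

    Hypothetical helper for illustration: word/document.xml is the part
    that stores the main document body in a .docx package.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return "word/document.xml" in archive.namelist()
```

A check like this could replace the extension test if some files in the directory are misnamed, at the cost of opening each file once more.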
    
  • Return_Li い未亡程序猿! 2020-08-17 14:04

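    The directory you search must not contain desktop shortcut files, archives, or other files that cannot be parsed as text; otherwise an error is raised. The code does not guard against this.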
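One possible way to guard against such files is to catch the decoding error and skip the file instead of letting the whole search abort. The sketch below is an assumption-laden illustration rather than code from this thread: the function name find_keyword_in_text_files and the default encoding of utf-8 are both hypothetical choices, and it handles plain-text files only (.docx files would still need docx as in the accepted answer).

```python
def find_keyword_in_text_files(file_list, query_word, encoding="utf-8"):
    """Return (matched, skipped) lists of paths.

    Files that cannot be opened or decoded with `encoding` are collected
    in `skipped` instead of stopping the search.
    """
    matched, skipped = [], []
    for path in file_list:
        try:
            with open(path, encoding=encoding) as handle:
                content = handle.read()
        except (UnicodeDecodeError, OSError):
            # Binary files, shortcuts, archives, or unreadable files end up here.
            skipped.append(path)
            continue
        if query_word in content:
            matched.append(path)
    return matched, skipped
```

Returning the skipped paths makes it visible which files were not searched. Note that some binary files may still decode without error under a permissive encoding, so a skip list of this kind is a best-effort filter.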
