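Problem symptoms and background
The script walks through a set of folders, finds the PDF files whose text contains "媒体质疑" ("questioned by the media"), and prints their names.
Partway through, a few files produce errors and are skipped; near the end a large number of errors appear and the script stops.
Relevant code (please do not paste screenshots)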
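import os
import openpyxl
import fitz  # PyMuPDF
import re

path = r'D:\下载\数据3 - 副本'
os.chdir(path)
my = openpyxl.Workbook()
mywb = my["Sheet"]
row = 0
count = 0
# Each entry in the working directory is one company folder
for d1 in os.listdir():
    check1 = d1 + r'\信息披露\注册稿'
    check2 = d1 + r'\问询与回复'
    tmpcount = 0
    for file in os.listdir(check1):
        # Open with a context manager so the document is closed and its memory released
        with fitz.open(check1 + '\\' + file) as doc:
            for page in doc:
                text = page.get_text()
                if text.find('媒体质疑') != -1:
                    tmpcount = tmpcount + 1
                    print(check1 + '\\' + file)
                    break  # one matching page is enough; print each file once
    for file in os.listdir(check2):
        with fitz.open(check2 + '\\' + file) as doc:
            for page in doc:
                text = page.get_text()
                if text.find('媒体质疑') != -1:
                    tmpcount = tmpcount + 1
                    print(check2 + '\\' + file)
                    break
    # One row per company: column 1 is the folder name, column 2 is 1 if any file matched, else 0
    row = row + 1
    mywb.cell(row, 1, d1)
    if tmpcount >= 1:
        count = count + 1
        mywb.cell(row, 2, 1)
    else:
        mywb.cell(row, 2, 0)
# Prints the total number of companies with at least one matching file
print("被质疑的公司一共有", count, "家")
my.save(r'C:\Users\huang\Desktop' + "\\媒体质疑.xlsx")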
Output and error messages
mupdf: xref generation number missing
mupdf: expected object number
mupdf: cannot find startxref
mupdf: object out of range (1358 0 R); xref size 1353
mupdf: object is not a stream
mupdf: invalid ICC colorspace
mupdf: realloc (257379 bytes) failed
mupdf: malloc of 332758 bytes failed
RuntimeError: malloc of 115172 bytes failed
My approach and what I have tried
I have no idea what is causing this.
Desired result
Every file is read successfully.
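One direction that might help, sketched under assumptions rather than as a confirmed fix: the `malloc ... failed` lines mean a memory allocation failed, and one possible contributor is that opened documents are never closed. The sketch below closes every document after use and records files that raise an error instead of letting one failure stop the whole run. The helper names `open_pdf`, `file_contains`, and `scan_folder` are hypothetical and not part of PyMuPDF; only `fitz.open`, `page.get_text()`, and `doc.close()` are library calls.

```python
import os


def open_pdf(path):
    """Default opener. PyMuPDF is imported lazily so the helpers below
    can also be exercised with a stand-in opener."""
    import fitz  # PyMuPDF
    return fitz.open(path)


def file_contains(path, keyword, opener=open_pdf):
    """Return True if any page of the document contains keyword.
    The document is always closed, even if reading a page fails."""
    doc = opener(path)
    try:
        for page in doc:
            if keyword in page.get_text():
                return True  # stop at the first matching page
        return False
    finally:
        doc.close()


def scan_folder(folder, keyword, opener=open_pdf):
    """Scan every file in folder (hypothetical helper).
    Returns (hits, failures): paths that contain keyword, and
    (path, error message) pairs for files that could not be read."""
    hits, failures = [], []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        try:
            if file_contains(full, keyword, opener):
                hits.append(full)
        except Exception as exc:  # damaged PDF, out of memory, etc.
            failures.append((full, str(exc)))
    return hits, failures
```

In the script above, this could be called once per subfolder, for example `hits, failures = scan_folder(check1, '媒体质疑')`, and the `failures` list printed at the end to see which files need manual inspection. If memory errors persist even with documents closed, the cause lies elsewhere (for instance a single very large or damaged file), which this sketch only reports and does not repair.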