solr导入大量的抓取的来的PDF,PPT,doc,docx文件报错

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
... 6 more
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@803fe60
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
... 10 more
Caused by: java.io.IOException: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:292)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:153)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:454)
at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:917)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:874)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:794)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:754)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:185)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1160)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1133)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 13 more
Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
at com.sun.crypto.provider.CipherCore.prepareInputBuffer(CipherCore.java:1005)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:848)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
at javax.crypto.Cipher.doFinal(Cipher.java:2047)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:276)

1个回答

dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
是不是有文件损坏,格式不正常,无法读取内容

bwang_single
bwang_single 文件比较多,排查不了也无法确定是哪个有问题,所以导入文件的时候有没有办法一个出错继续下一个呢?
8 个月之前 回复
Csdn user default icon
上传中...
上传图片
插入图片
抄袭、复制答案,以达到刷声望分或其他目的的行为,在CSDN问答是严格禁止的,一经发现立刻封号。是时候展现真正的技术了!
立即提问
相关内容推荐