bwang_single 2019-10-16 14:28 采纳率: 0%
浏览 563

solr导入大量的抓取的来的PDF,PPT,doc,docx文件报错

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
... 4 more
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
... 6 more
Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@803fe60
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
... 10 more
Caused by: java.io.IOException: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:292)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:153)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:454)
at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:917)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:874)
at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:794)
at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:754)
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:185)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1160)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1133)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:154)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 13 more
Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
at com.sun.crypto.provider.CipherCore.prepareInputBuffer(CipherCore.java:1005)
at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:848)
at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
at javax.crypto.Cipher.doFinal(Cipher.java:2047)
at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptDataAESother(SecurityHandler.java:276)

  • 写回答

1条回答 默认 最新

  • threenewbee 2019-10-16 20:09
    关注

    dataimport.DataImportHandlerException: Unable to read content Processing Document # 305
    是不是有文件损坏,格式不正常,无法读取内容

    评论

报告相同问题?

悬赏问题

  • ¥15 c程序不知道为什么得不到结果
  • ¥40 复杂的限制性的商函数处理
  • ¥15 程序不包含适用于入口点的静态Main方法
  • ¥15 素材场景中光线烘焙后灯光失效
  • ¥15 请教一下各位,为什么我这个没有实现模拟点击
  • ¥15 执行 virtuoso 命令后,界面没有,cadence 启动不起来
  • ¥50 comfyui下连接animatediff节点生成视频质量非常差的原因
  • ¥20 有关区间dp的问题求解
  • ¥15 多电路系统共用电源的串扰问题
  • ¥15 slam rangenet++配置