duanpu1111 2013-05-15 12:23
Accepted

Determining in PHP whether a PDF file contains searchable text

We have hundreds of PDF files on a server. Some of them contain searchable text and others do not.

I was asked to find out which are searchable and which are not.

Does anybody know of a way to read in a batch of PDFs and determine, for each one, whether it contains searchable/selectable text or only non-selectable, non-searchable text that needs to be OCR'd?

I don't need to read the text itself; I just need to detect, possibly by tags or keywords in the raw data, something that suggests the file contains fonts or similar.

Are there tags in a searchable PDF that make it easy to detect?

Thanks


1 answer

  • dpowhyh70416 2013-05-15 14:55

    I believe you could modify this code (pdf2text) to suit your purposes. This answer might also point you in the right direction.
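
One way to approach the detection the question describes is sketched below. It assumes that a PDF with selectable text declares at least one `/Font` resource somewhere in its raw bytes. The function names (`pdfBytesMentionFont`, `pdfLooksSearchable`, `classifyPdfs`) are hypothetical and are not part of pdf2text. The check is only a heuristic: fonts stored inside compressed object streams (PDF 1.5 and later) will not be seen, so a negative result means "probably needs OCR", not proof.

```php
<?php
// Heuristic sketch, not a full PDF parser. All function names are hypothetical.

// Returns true if the raw PDF bytes contain a "/Font" name token.
// This also matches related names such as /FontDescriptor, which
// likewise imply that the file declares a font.
function pdfBytesMentionFont(string $bytes): bool
{
    return strpos($bytes, '/Font') !== false;
}

// Reads a file from disk and applies the byte-level check above.
function pdfLooksSearchable(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return pdfBytesMentionFont($bytes);
}

// Classifies every *.pdf file in a directory.
// Returns an array of path => bool (true means a font reference was found).
function classifyPdfs(string $dir): array
{
    $result = [];
    $files = glob(rtrim($dir, '/') . '/*.pdf') ?: [];
    foreach ($files as $file) {
        $result[$file] = pdfLooksSearchable($file);
    }
    return $result;
}
```

Calling `classifyPdfs('/path/to/pdfs')` (the path is a placeholder) returns a map that can be filtered for `false` entries to get the OCR candidates. Files flagged `false` are worth spot-checking with a real text extractor before sending them all to OCR.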

    Selected by the asker as the best answer.
