doumi5223 2016-11-19 06:01 Acceptance rate: 100%

Extracting text from PDF in PHP does not work for all PDF files

I am extracting text from PDF files. This is the code:

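<?php

// Load the PdfToText class (file name as used in my project).
require 'PdfToText.php';

$file = 'SamplePF';
$pdf  = new PdfToText("$file.pdf");

// The Text property holds the text extracted from the document.
echo $pdf->Text;

?>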

This class works fine for some PDF files. The problems with it are:

  1. For some PDF files, it returns text from pages and lines in a random order rather than in page sequence.
  2. For some PDF files, it returns no result at all.
  3. For some PDF files, it extracts only one or two lines.

Please suggest a solution. Thank you!


1 answer

  • dotibrb048760 2016-12-02 06:20

    I am not sure this is the exact cause of your problem, but I ran into something similar when extracting data from PDFs. Some PDF files are locked with an owner password, which places restrictions on the document (no modification, no content copying or extraction, and so on) to protect the author's copyright. Check this link for more info on owner passwords.
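
One rough way to check whether a given file falls into this category is to look for an `/Encrypt` entry, which an encrypted PDF references from its trailer or cross-reference stream dictionary. The sketch below is only a heuristic under that assumption: the helper names `pdfLooksEncrypted` and `pdfFileLooksEncrypted` are made up for illustration and are not part of PdfToText, and a file that happens to contain the literal text `/Encrypt` elsewhere would give a false positive.

```php
<?php
// Heuristic: encrypted PDFs carry an /Encrypt key in the trailer or in the
// cross-reference stream dictionary, both of which are stored uncompressed.
// pdfLooksEncrypted() is a hypothetical helper, not a PdfToText method.
function pdfLooksEncrypted(string $pdfBytes): bool
{
    return strpos($pdfBytes, '/Encrypt') !== false;
}

// Hypothetical convenience wrapper that reads the file before checking it.
function pdfFileLooksEncrypted(string $path): bool
{
    $bytes = @file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return pdfLooksEncrypted($bytes);
}
```

Calling `pdfFileLooksEncrypted('SamplePF.pdf')` before constructing `PdfToText` would at least tell you whether encryption is a plausible reason for empty or partial output.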

    So first try removing the owner password, and then extract the text from those PDFs. A number of tools for removing owner passwords are available online; choose whichever suits you best.
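
As one possible example, assuming the `qpdf` command-line tool is installed (it is my pick for illustration, not a tool named above), its `--decrypt` option writes an unencrypted copy of a PDF that has only an owner password. The function names `buildDecryptCommand` and `removeOwnerPassword` below are hypothetical, and a file that also has a user password would additionally need qpdf's `--password=` option.

```php
<?php
// Builds the qpdf command line that writes a decrypted copy of $input to $output.
// Assumes the qpdf CLI is available on the PATH.
function buildDecryptCommand(string $input, string $output): string
{
    // escapeshellarg() quotes each path so spaces and shell metacharacters are safe.
    return 'qpdf --decrypt ' . escapeshellarg($input) . ' ' . escapeshellarg($output);
}

// Runs qpdf and reports whether it produced the output file successfully.
function removeOwnerPassword(string $input, string $output): bool
{
    exec(buildDecryptCommand($input, $output), $lines, $exitCode);
    // qpdf exits with 0 on success and 3 when it succeeded with warnings.
    return $exitCode === 0 || $exitCode === 3;
}
```

If `removeOwnerPassword('SamplePF.pdf', 'SamplePF-unlocked.pdf')` returns true, the unlocked copy can then be passed to `new PdfToText(...)` as in the question.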
