Extracting text from PDF in PHP does not work for all PDF files

I am extracting text from PDF files. This is the code:

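<?php

// Load the PdfToText class (the file must be reachable via the include path).
require("PdfToText.php");

// Base name of the PDF to read; ".pdf" is appended below.
$file = 'SamplePF';
$pdf  = new PdfToText("$file.pdf");

// Print the extracted text.
echo $pdf->Text;

?>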

This class works fine for some PDF files. The problems with it are:

For some PDF files it takes text from random pages or lines, not in page order.
For some PDF files it does not show any result.
For some PDF files it extracts only one or two lines.

Please suggest a solution. Thank you!


1 answer

dotibrb048760 2016-12-02 06:20
I am not sure this is the exact reason your extraction fails, but I ran into something similar when extracting data from PDFs. Some PDF files are locked with an owner password, which places restrictions on the document and disallows changing it, copying its content, extracting text and so on, in order to protect its copyright. Check this link for more info on owner passwords.

So you can first try removing the owner password and then extract the text from such PDFs. A number of tools for removing owner passwords are available online; choose whichever suits you best.
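
To tell whether a given file is affected before running the extraction, one option is to look for an /Encrypt entry, which an encrypted PDF declares in its trailer (or cross-reference stream) dictionary. The following is only a rough sketch under that assumption. The function names pdfLooksEncrypted and buildQpdfDecryptCommand are made up for illustration and are not part of PdfToText. qpdf is just one example of the tools mentioned above, and it has to be installed separately. The check is a heuristic and can give a false positive if the text "/Encrypt" happens to occur elsewhere in the file.

```php
<?php
// Heuristic sketch, not part of the PdfToText class.

/**
 * Returns true if the file appears to declare encryption.
 * Reads the whole file into memory, so it is meant for reasonably small PDFs.
 */
function pdfLooksEncrypted(string $path): bool
{
    $data = @file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    // Match the name /Encrypt itself, not longer names such as /EncryptMetadata.
    return preg_match('~/Encrypt(?![A-Za-z0-9])~', $data) === 1;
}

/**
 * Builds (but does not run) a qpdf command that writes a decrypted copy.
 * Assumes the qpdf command-line tool is available on the machine.
 */
function buildQpdfDecryptCommand(string $in, string $out): string
{
    return 'qpdf --decrypt ' . escapeshellarg($in) . ' ' . escapeshellarg($out);
}
```

If pdfLooksEncrypted('SamplePF.pdf') returns true, the string from buildQpdfDecryptCommand('SamplePF.pdf', 'SamplePF-decrypted.pdf') could be run in a shell, and the decrypted copy passed to PdfToText instead. This only works without a password when the file has an owner password and an empty user password.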
