dongxun3424 2011-09-16 20:16

How can I "manually" edit annotations in a PDF without corrupting the file?

I need to insert a hyperlink into a few thousand existing PDFs. I'm working with zend_pdf, which apparently cannot set an invisible border. The only way I found to make the link borders invisible (I found it elsewhere on this site) is to search for each link annotation in the PDF and add a /Border entry to it, like this:

echo str_replace('/Annot /Subtype /Link', '/Annot /Subtype /Link /Border[0 0 0]', $pdf->render());

Since the files I need to work on reside on my filesystem, I'm using the sed command for the search-and-replace operation.
At first sight this works: the documents are displayed correctly by Acrobat 8, OS X 10.6's viewer and Ubuntu's document viewer. However, tools such as pdftk (1.41) and pdfinfo (0.12.1) report that the structure is corrupted. This is annoying, because pdftk refuses to work on a file that contains errors, so no further manipulation of the PDF with pdftk is possible. I looked into the files with a binary editor and found that the file gets corrupted as soon as I add more than two bytes after "/Link". This confuses me a lot: according to the PDF specification (I'm using 1.4) there is no checksum except for streams, which should mean that one can add as many bytes as one wants, as long as they are not inside a stream and the inserted bytes are valid PDF syntax. What am I missing here?

Here is an example:
the original pdf
the processed pdf


1 answer

  • douzhi1813 2011-09-27 08:35

    Adding the extra /Border entry to the file corrupts the PDF's xref (cross-reference) table. The xref table locates every object by its position, measured in bytes from the beginning of the file. Inserting the extra entry shifts the offset of everything that follows it by a few bytes, so the offsets recorded in the table no longer match.
    After the edit, pdftk from PDF Labs (http://www.pdftk.com) can be used to fix the xref table:

    $ pdftk corrupted_file.pdf output fixed_file.pdf
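
The offset shift can be reproduced on a toy file. The following Python sketch is an illustration only and not part of the original workflow: `build_pdf` and `check_xref` are hypothetical helpers, the four-object document is made up, and the checker assumes a single classic (uncompressed) xref table with 20-byte entries, which real files may or may not use.

```python
import re


def build_pdf(objects):
    """Assemble a tiny PDF with a classic xref table from object bodies."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def check_xref(pdf):
    """Return (stale object numbers, whether startxref still hits 'xref')."""
    # Find the table by keyword, because startxref itself may be stale.
    m = re.search(rb"\nxref\n(\d+) (\d+)\n", pdf)
    first, count = int(m.group(1)), int(m.group(2))
    table_start = m.start() + 1  # offset of the "xref" keyword
    declared = int(re.search(rb"startxref\n(\d+)", pdf).group(1))
    stale = []
    for i in range(count):
        entry = pdf[m.end() + 20 * i: m.end() + 20 * i + 20]
        offset, gen, kind = int(entry[:10]), int(entry[11:16]), entry[17:18]
        if kind == b"n":  # in-use entry: offset must point at "N G obj"
            if not pdf.startswith(b"%d %d obj" % (first + i, gen), offset):
                stale.append(first + i)
    return stale, declared == table_start


# Made-up document: the link annotation (object 3) precedes the page (object 4).
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
    b"<< /Type /Annot /Subtype /Link /Rect [10 10 100 30] >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Annots [3 0 R] >>",
]
original = build_pdf(objects)

# The same textual replacement as in the question, applied to raw bytes.
patched = original.replace(
    b"/Annot /Subtype /Link", b"/Annot /Subtype /Link /Border[0 0 0]"
)

print(len(patched) - len(original))  # 15 bytes inserted
print(check_xref(original))          # ([], True)
print(check_xref(patched))           # ([4], False)
```

On the toy file, the 15 inserted bytes leave objects 1 to 3 where the table expects them, but object 4 and the `startxref` pointer both sit 15 bytes later than recorded. That is the mismatch described above: the inserted bytes are valid PDF syntax, yet the byte offsets stored elsewhere in the file are no longer true.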

    As a matter of fact, I was not able to find a comprehensive PDF solution for PHP, so I had to combine several different tools (zend_pdf, pdftk, sed) to implement my workflow.
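
Since the question involves a few thousand files, the repair step would have to be run in a loop. A minimal Python sketch of that step, under stated assumptions: `repair_command` and `repair_all` are hypothetical names, the only pdftk invocation used is the `pdftk <in> output <out>` form shown above, and the sketch assumes that pdftk reports failure through a non-zero exit status.

```python
import subprocess
from pathlib import Path


def repair_command(src, dst):
    """Build the call from the answer: pdftk <src> output <dst>."""
    return ["pdftk", str(src), "output", str(dst)]


def repair_all(src_dir, dst_dir, run=subprocess.run):
    """Run the repair for every *.pdf in src_dir; return the paths that failed.

    `run` is injectable so the loop can be exercised without pdftk installed.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.glob("*.pdf")):
        # Write to a separate directory so the edited input is never overwritten.
        result = run(repair_command(src, dst_dir / src.name))
        if result.returncode != 0:  # assumption: non-zero means pdftk gave up
            failed.append(src)
    return failed
```

Called as `repair_all("edited", "fixed")`, it would leave the sed-edited files untouched and collect the ones that need a closer look.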

    This answer was accepted by the asker as the best answer.