douqin2108 2019-04-11 17:10

Editing PDF metadata with PHP for ADA compliance

I have several PDFs to which I need to add the Primary Language (for us always English, so `/Lang (en-us)` as an entry in the document's catalog dictionary) and the Title field, so that these PDFs pass ADA checks.
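For reference: `/Lang` is a text string in the document catalog (for example `/Lang (en-US)`), while `/Title` is a text string in the document information dictionary that the trailer's `/Info` entry points to; PDF/UA additionally expects the title in the XMP metadata as `dc:title`. The helpers below are a hypothetical sketch, not part of any library: one escapes an ASCII value as a PDF literal string, the other adds a key to a dictionary only if it is missing. They assume the dictionary has already been extracted as text (so they cannot reach objects stored inside compressed object streams), and non-ASCII titles would need UTF-16BE with a byte order mark instead.

```php
<?php
// Hypothetical helpers (not from any library) for building dictionary text.

// Encode an ASCII value as a PDF literal string; backslashes and parentheses
// must be escaped inside "( ... )".
function pdf_literal_string(string $value): string
{
    return '(' . strtr($value, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']) . ')';
}

// Append "/Key value" before the closing ">>" unless /Key already occurs.
// Naive: the presence check also sees keys inside nested dictionaries.
function add_dict_entry(string $dict, string $key, string $value): string
{
    $dict = rtrim($dict);
    if (substr($dict, -2) !== '>>') {
        throw new InvalidArgumentException('Expected text ending in ">>"');
    }
    // A PDF name ends at whitespace or a delimiter character.
    $pattern = '#/' . preg_quote($key, '#') . '(?=[\s()<>\[\]{}/%]|$)#';
    if (preg_match($pattern, $dict) === 1) {
        return $dict; // already present: leave the existing value alone
    }
    return rtrim(substr($dict, 0, -2)) . ' /' . $key . ' ' . $value . ' >>';
}

// Example: adds /Lang to a catalog dictionary that lacks it.
echo add_dict_entry('<< /Type /Catalog /Pages 2 0 R >>', 'Lang', pdf_literal_string('en-US')), PHP_EOL;
// prints: << /Type /Catalog /Pages 2 0 R /Lang (en-US) >>
```

These only produce text; writing it back into the file safely is a separate problem, because the file's byte offsets must stay valid.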

I've had some luck with PDF version 1.4 by doing string replacements on the whole document (via file_get_contents()) and rewriting the file so I don't lose what's in it, but in PDF 1.5 and 1.6 the file internals seem to be sensitive even to spaces and tabs.
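One likely reason the 1.4 files behave differently: PDF 1.5 introduced object streams and cross-reference streams, and files that use them may store the catalog inside a compressed object stream, where a plain-text search for `/Type /Catalog` cannot find it. A rough, hypothetical check for whether a file uses these features (regex heuristics, not a PDF parser; the function names are made up for illustration):

```php
<?php
// Hypothetical heuristics, not a PDF parser.

// Version from the "%PDF-x.y" header (searched in the first 1024 bytes), or null.
// A /Version entry in the catalog can override the header; that is ignored here.
function pdf_header_version(string $pdf): ?string
{
    return preg_match('/%PDF-(\d\.\d)/', substr($pdf, 0, 1024), $m) === 1 ? $m[1] : null;
}

// True if the file seems to use PDF 1.5 object streams or cross-reference streams.
// Stream objects cannot live inside object streams, so these dictionaries are plain text.
function uses_compressed_structures(string $pdf): bool
{
    return preg_match('#/Type\s*/(ObjStm|XRef)(?![A-Za-z0-9])#', $pdf) === 1;
}

// Report on every PDF in the current directory.
foreach (glob(getcwd() . '/*.pdf') ?: [] as $path) {
    $pdf = file_get_contents($path);
    printf(
        "%s: PDF %s%s\n",
        basename($path),
        pdf_header_version($pdf) ?? 'unknown',
        uses_compressed_structures($pdf) ? ' (object/xref streams)' : ''
    );
}
```

Files flagged here are the ones where text replacement cannot work even in principle, since the catalog may not exist as readable text.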

I've attempted to use exiftool via shell_exec(), but this only seems to work on PDF version 1.4. On other versions the values do get set inside the PDF, but the files still fail our scans because of entries like /Type/Catalog/ViewerPreferences<</DisplayDocTitle true>>, which in 1.6 files seem to appear at random places in the document.
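If you stay with exiftool, building the command with PHP's `escapeshellarg()` keeps filenames and titles containing spaces or quotes from breaking the `shell_exec()` call. The helper below is a hypothetical sketch; `-PDF:Title=` and `-overwrite_original` are exiftool options, but this only sets the Title and does not touch the catalog's `/Lang` entry or the `DisplayDocTitle` viewer preference.

```php
<?php
// Hypothetical helper: build (not run) an exiftool command that sets the PDF Title.
function exiftool_title_command(string $path, string $title): string
{
    return implode(' ', [
        'exiftool',
        '-overwrite_original',                   // don't keep a "_original" backup copy
        escapeshellarg('-PDF:Title=' . $title),  // quoted so spaces/quotes are safe
        escapeshellarg($path),
    ]);
}

// Usage (commented out so nothing runs by accident):
// $output = shell_exec(exiftool_title_command('/path/to/report.pdf', 'Annual Report'));
```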

Has anyone tackled this before on the web side? I was hoping to build something that would cut down on having to open every single one of these in Adobe and re-save them.

I've searched for an Adobe API or library I could plug in to make these minor edits. All the frameworks I've seen create new PDFs, which means all the tagging and alt text we've added would be lost, so I don't want to go the route of Zend or anything else that won't edit just the metadata.

<?php

$dir = getcwd();

// Only look at files ending in .pdf (strpos() also matched names like "a.pdf.bak")
foreach (glob($dir . '/*.pdf') as $path)
{
    $file = basename($path);
    $pdf = file_get_contents($path);
    // This seems to work for 1.4, but not anything else
    if (strpos($pdf, '/Lang') === false)
    {
        echo "Changing Lang on " . $file . PHP_EOL;
        // Match "/Type /Catalog" as well as "/Type/Catalog"
        $pdf_str = preg_replace('#/Type\s*/Catalog#', "/Type /Catalog\n/Lang (en-us)", $pdf);
        file_put_contents($path, $pdf_str);
    } else {
        echo "Lang passed on " . $file . PHP_EOL;
    }
}

?>

1 answer

  • drxdai15012937753 2019-04-15 08:03

    You should never replace strings in PDF files: doing so breaks the internal structure of the file (the cross-reference data records the byte offset of every object), and a reader application then has to repair the file when opening it.
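Because the cross-reference section stores byte offsets, inserting even a few bytes before it shifts every object after the insertion point. The PDF format's own way to change an object without rewriting the file is an incremental update: append a new version of the changed object, a cross-reference section covering just that object, and a new trailer whose `/Prev` points at the previous cross-reference section. The function below is a minimal, hypothetical sketch of that mechanism, not a drop-in tool. It assumes you already know the catalog's object number and generation and have its new dictionary text, that the file is not encrypted, and that the file's last cross-reference section is a classic `xref` table; files ending in a cross-reference stream (common in 1.5+) would need a stream-based update, which this does not write. The new trailer must repeat the previous trailer's entries (such as `/Size`, `/Root`, `/Info`, `/ID`) except `/Prev`; the caller passes those in as `$trailerEntries`.

```php
<?php
// Minimal, hypothetical sketch of a PDF incremental update.
// Assumes: the last cross-reference section is a classic "xref" table, the file is
// not encrypted, and $trailerEntries holds the previous trailer's entries minus /Prev.
function append_incremental_update(
    string $pdf,
    int $objNum,
    int $gen,
    string $newObjectBody,
    string $trailerEntries
): string {
    // The last "startxref" value is the offset of the newest cross-reference section.
    if (!preg_match_all('/startxref\s+(\d+)/', $pdf, $m)) {
        throw new RuntimeException('No startxref found');
    }
    $prevXref = (int) end($m[1]);

    // Everything is appended after the original bytes, which stay untouched.
    $out = $pdf;
    if (substr($out, -1) !== "\n") {
        $out .= "\n";
    }

    // New version of the object, with the same number and generation as before.
    $objOffset = strlen($out);
    $out .= sprintf("%d %d obj\n%s\nendobj\n", $objNum, $gen, $newObjectBody);

    // Cross-reference subsection with one 20-byte entry for the replaced object.
    $xrefOffset = strlen($out);
    $out .= sprintf("xref\n%d 1\n%010d %05d n \n", $objNum, $objOffset, $gen);

    // New trailer chaining back to the previous section via /Prev.
    $out .= sprintf("trailer\n<< %s /Prev %d >>\n", $trailerEntries, $prevXref);
    $out .= sprintf("startxref\n%d\n%%%%EOF\n", $xrefOffset);
    return $out;
}
```

A call would look like `append_incremental_update($pdf, 1, 0, '<< /Type /Catalog /Pages 2 0 R /Lang (en-US) >>', '/Size 12 /Root 1 0 R /Info 11 0 R')`, where the object numbers and trailer entries are placeholders that must come from the actual file. Since only bytes are appended, the existing structure tree, tags, and alt text remain in the file.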

    We offer commercial tools for editing PDFs in PHP. Your task can be done with the SetaPDF-Core component:

    require_once('library/SetaPDF/Autoload.php');
    
    // Write the modified document to result.pdf
    $writer = new SetaPDF_Core_Writer_File('result.pdf');
    $document = SetaPDF_Core_Document::loadByFilename('example.pdf', $writer);
    
    // Set /Lang in the document catalog dictionary
    $catalog = $document->getCatalog();
    $dict = $catalog->getDictionary();
    $dict['Lang'] = new SetaPDF_Core_Type_String('en-us');
    
    $document->save()->finish();
    
    This answer was accepted by the asker as the best answer.
