doushan1850 2011-02-22 21:09
浏览 815

如何在PHP中合并docx文档?

Does anyone know how to merge (concatenate) docx documents with PHP (or Python if it's not possible in PHP)?

To clarify, my server is Linux based. I have 2 existing docx document, I need to put them in a new docx document using PHP or possibly Python.

  • 写回答

2条回答 默认 最新

  • dqf60304 2011-10-31 21:49
    关注

    Merging two different Docx files may be very complicated because Headers, Styles, Charts, Comments, User Modification Traces and other special contents are saved in separate inner XML sub-files in each Docx. Thus, two Docx may have different objects having the same ids. So it would be a very huge job to list all possible objects in the two documents, give them new inner ids, and re-affect them in a single one. Probably only Ms Office can do this currently.

    Nevertheless, if you know that your two documents to be merged have the same styles, and if you know you have no charts, headers and other special objects, then the merging becomes something quite easy to perform.

    In this case, you only have to use a Zip reader, such as TbsZip, to open the first Docx file (which is technically a zip archive containing XML sub-files) ; then read the sub-file "word/document.xml" and extract the part which is between the tags < w:body > and < /w:body >. In the second Docx file, open the "word/content.xml" and insert the previous content just before the tag < /w:body >. Save the result in a new Docx file.

    This can be done using TbsZip, like this :

    <?php
    
    include_once('tbszip.php');
    
    $zip = new clsTbsZip();
    
    // Open the first document
    $zip->Open('doc1.docx');
    $content1 = $zip->FileRead('word/document.xml');
    $zip->Close();
    
    // Extract the content of the first document
    $p = strpos($content1, '<w:body');
    if ($p===false) exit("Tag <w:body> not found in document 1.");
    $p = strpos($content1, '>', $p);
    $content1 = substr($content1, $p+1);
    $p = strpos($content1, '</w:body>');
    if ($p===false) exit("Tag </w:body> not found in document 1.");
    $content1 = substr($content1, 0, $p);
    
    // Insert into the second document
    $zip->Open('doc2.docx');
    $content2 = $zip->FileRead('word/document.xml');
    $p = strpos($content2, '</w:body>');
    if ($p===false) exit("Tag </w:body> not found in document 2.");
    $content2 = substr_replace($content2, $content1, $p, 0);
    $zip->FileReplace('word/document.xml', $content2, TBSZIP_STRING);
    
    // Save the merge into a third file
    $zip->Flush(TBSZIP_FILE, 'merge.docx');
    
    评论

报告相同问题?

悬赏问题

  • ¥100 有人会搭建GPT-J-6B框架吗?有偿
  • ¥15 求差集那个函数有问题,有无佬可以解决
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名