duanfu1942 2018-12-13 13:46 采纳率: 100%
浏览 2018
已采纳

LibreOffice将PDF转换为Word作为文本框而不是普通文档

我想使用LibreOffice 6.1.3.2 10(Build:2)从Ubuntu 18终端将PDF转换为Microsoft Word(doc,docx)(实际上我使用PHP执行LibreOffice)。 但是我装满了文本框文档,而不是普通的Word文档。 首先了解我的问题,我建议在这里下载我的文件: https://nofile.io/f/DKvQYFRdYZg/pdf2word.rar

我有4个文件:

1.original.doc
2.original-to-pdf.pdf
3.pdf-to-word.doc
4.expected.doc

首先我转换 original.pdforiginal-to-pdf.pdf然后我尝试转换回Word使用以下命令:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf

文件创建成功,但所有内容转换为文本框不作为正常的文件。然后我尝试了几个PDF到Word的转换器,如ilovepdf.com和我得到的expected.doc

你可以通过上方的链接下载我的文件来查看不同的内容,也可以查看下面的图片

自定义查询结果:

enter image description here

ilovepdf 输出:

enter image description here

我尝试了几个过滤器,包括pdf到odt,然后odt到word,但所有命令下面没有给我预期的结果

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"Microsoft Word 2007/2010/2013 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc a.pdf
soffice --infilter="writer_pdf_import" --convert-to odf:"writer8" a.pdf
soffice --infilter="writer8" --convert-to doc a.odf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 95" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 97" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"StarOffice XML (Writer)" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2007 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML Template" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML" a.pdf
soffice --infilter="Microsoft Word 2007/2010/2013 XML" --convert-to doc a.pdf

我知道一些高级软件 abbyy cloud 或者 adobe cloud, 但我不认为像ilovepdf这样的网站会使用付费服务来提供免费服务。我的问题是,我是否遗漏了LibreOffice依赖中的一些东西,以便能够将PDF转换为正常的word文档?

  • 写回答

1条回答 默认 最新

  • douying6206 2018-12-15 03:42
    关注

    Your problem lies with the software used to create the PDF; output in the form of textboxes in a PDF is a characteristic of certain low-end PDF-creation software. There is nothing Word can do about that during the import process; you would need to clean it up afterwards.

    A Word macro you could use for the clean-up is:

    Sub EraseTextBoxes()
    Dim RngDoc As Range, RngShp As Range, i As Long
    With ActiveDocument
      For i = .Shapes.Count To 1 Step -1
        With .Shapes(i)
          If .Type = msoTextBox Then
            Set RngShp = .TextFrame.TextRange
            RngShp.End = RngShp.End - 1
            Set RngDoc = .Anchor
            RngDoc.Collapse wdCollapseEnd
            RngDoc.FormattedText = RngShp.FormattedText
            .Delete
          End If
        End With
      Next
    End With
    End Sub
    

    Do note that whether the macro positions the output correctly depends on where the textboxes are anchored; if the anchor positions are unrelated to the textbox locations, you'll end up with a dog's breakfast. You'll probably still also end up with each line as its own paragraph. To clean up such content, see http://www.msofficeforums.com/word/29880-cleaning-up-text-pasted-websites-e-mails.html

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 wpf通过绑定控件自身的值,来实现背景颜色的切换
  • ¥15 CDH6.3 运行hive -e hive -e "show databases;"报错:hive-env.sh:行24: hbase-common.jar: 权限不够
  • ¥15 乌班图ip地址配置及远程SSH
  • ¥15 怎么让点阵屏显示静态爱心,用keiluVision5写出让点阵屏显示静态爱心的代码,越快越好
  • ¥15 PSPICE制作一个加法器
  • ¥15 javaweb项目无法正常跳转
  • ¥15 VMBox虚拟机无法访问
  • ¥15 skd显示找不到头文件
  • ¥15 机器视觉中图片中长度与真实长度的关系
  • ¥15 fastreport table 怎么只让每页的最下面和最顶部有横线