duanfu1942 2018-12-13 13:46

LibreOffice converts PDF to Word as text boxes instead of a normal document

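I want to use LibreOffice 6.1.3.2 10 (Build:2) from the Ubuntu 18 terminal to convert a PDF to Microsoft Word (doc, docx). In practice I run LibreOffice from PHP. However, I get a document full of text boxes instead of a normal Word document. To understand the problem, I suggest downloading my files here: https://nofile.io/f/DKvQYFRdYZg/pdf2word.rar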

I have 4 files:

1.original.doc
2.original-to-pdf.pdf
3.pdf-to-word.doc
4.expected.doc

First I convert original.doc to original-to-pdf.pdf, then I try to convert it back to Word with the following command:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf

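The file is created successfully, but all of the content is converted into text boxes rather than a normal document. I then tried several PDF-to-Word converters, such as ilovepdf.com, and got expected.doc.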

You can download my files from the link above to see the difference, or look at the images below.

soffice output:

[image: screenshot of the soffice output]

ilovepdf output:

[image: screenshot of the ilovepdf output]

I tried several filters, including PDF to ODT followed by ODT to Word, but none of the commands below gave me the expected result:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"Microsoft Word 2007/2010/2013 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc a.pdf
soffice --infilter="writer_pdf_import" --convert-to odf:"writer8" a.pdf
soffice --infilter="writer8" --convert-to doc a.odf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 95" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 97" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"StarOffice XML (Writer)" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2007 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML Template" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML" a.pdf
soffice --infilter="Microsoft Word 2007/2010/2013 XML" --convert-to doc a.pdf

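I know there is premium software such as ABBYY Cloud or Adobe Cloud, but I doubt that a site like ilovepdf uses a paid service to offer a free one. My question is: am I missing something among LibreOffice's dependencies that would let it convert a PDF into a normal Word document?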
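For reference, the invocation used above can be assembled programmatically before it is handed to a process runner. This is a minimal Python sketch, not the PHP code mentioned in the question; the names `build_convert_command` and `convert` are hypothetical, and `--headless` and `--outdir` are added on the assumption that the conversion runs on a server without a GUI. It only reproduces the command line and does not change how the PDF import filter lays out text, so the text boxes would remain.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, target="docx", outdir="."):
    """Return the soffice argument list for a PDF-to-Word conversion."""
    return [
        "soffice",
        "--headless",                      # assumption: no GUI available
        "--infilter=writer_pdf_import",    # same import filter as in the question
        "--convert-to", target,            # e.g. "docx" or "doc"
        "--outdir", str(outdir),
        str(pdf_path),
    ]


def convert(pdf_path, target="docx", outdir="."):
    """Run the conversion; requires LibreOffice (soffice) on PATH."""
    subprocess.run(build_convert_command(pdf_path, target, outdir), check=True)
    # The output extension is the part of the target before any ":filter" suffix.
    extension = target.split(":", 1)[0]
    return Path(outdir) / f"{Path(pdf_path).stem}.{extension}"
```

Passing the arguments as a list avoids shell quoting problems with filter names that contain spaces.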

1 answer

  • douying6206 2018-12-15 03:42

    Your problem lies with the software used to create the PDF; output in the form of textboxes in a PDF is a characteristic of certain low-end PDF-creation software. There is nothing Word can do about that during the import process; you would need to clean it up afterwards.

    A Word macro you could use for the clean-up is:

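    Sub EraseTextBoxes()
    ' Replace each text box with its contents, inserted at the box's anchor.
    Dim RngDoc As Range, RngShp As Range, i As Long
    With ActiveDocument
      ' Loop backwards so that deleting a shape does not shift the remaining indexes.
      For i = .Shapes.Count To 1 Step -1
        With .Shapes(i)
          If .Type = msoTextBox Then
            Set RngShp = .TextFrame.TextRange
            ' Exclude the final paragraph mark of the text box.
            RngShp.End = RngShp.End - 1
            ' Insert the formatted text at the end of the anchor range.
            Set RngDoc = .Anchor
            RngDoc.Collapse wdCollapseEnd
            RngDoc.FormattedText = RngShp.FormattedText
            .Delete
          End If
        End With
      Next
    End With
    End Sub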
    

    Do note that whether the macro positions the output correctly depends on where the textboxes are anchored; if the anchor positions are unrelated to the textbox locations, you'll end up with a dog's breakfast. You'll probably still also end up with each line as its own paragraph. To clean up such content, see http://www.msofficeforums.com/word/29880-cleaning-up-text-pasted-websites-e-mails.html
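The "each line as its own paragraph" clean-up can also be approximated outside Word on plain text exported from the document. The following is a minimal Python sketch of one possible heuristic, not the method from the linked thread: the function name `merge_lines` and the rule that a line ending in `.`, `!`, `?` or `:` closes a paragraph are assumptions, and headings or list items without closing punctuation would be merged incorrectly.

```python
def merge_lines(lines):
    """Join consecutive lines into paragraphs.

    Assumed heuristic: a blank line, or a line ending in '.', '!', '?' or ':',
    closes the current paragraph; any other line continues it.
    """
    paragraphs = []
    current = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        if line.endswith((".", "!", "?", ":")):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Flush trailing text that had no closing punctuation.
        paragraphs.append(" ".join(current))
    return paragraphs
```

For example, `merge_lines(["This is a line", "that continues here.", "Next one."])` yields two paragraphs, the first being the two joined lines.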

    This answer was selected by the asker as the best answer.
