2018-12-13 13:46


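I want to use LibreOffice 10 (Build:2) to convert a PDF to Microsoft Word (doc, docx) from the Ubuntu 18 terminal (in practice I run LibreOffice from PHP). However, I get a document full of text boxes instead of a normal Word document. To understand my problem, I suggest first downloading my files here: https://nofile.io/f/DKvQYFRdYZg/pdf2word.rar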



First I converted original.pdf to original-to-pdf.pdf, then I tried to convert it back to Word with the following command:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf




LibreOffice output:

[screenshot not available]

ilovepdf output:

[screenshot not available]


Commands I have tried:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"Microsoft Word 2007/2010/2013 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc a.pdf
soffice --infilter="writer_pdf_import" --convert-to odf:"writer8" a.pdf
soffice --infilter="writer8" --convert-to doc a.odf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 95" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 97" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"StarOffice XML (Writer)" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2007 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML Template" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML" a.pdf
soffice --infilter="Microsoft Word 2007/2010/2013 XML" --convert-to doc a.pdf

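I know there is premium software such as ABBYY Cloud or Adobe Cloud, but I don't think a site like ilovepdf would use a paid service to provide a free one. My question is: am I missing something among LibreOffice's dependencies that would let me convert a PDF into a normal Word document?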


1 answer

  • douying6206 2018-12-15 03:42

    Your problem lies with the software used to create the PDF; output in the form of textboxes in a PDF is a characteristic of certain low-end PDF-creation software. There is nothing Word can do about that during the import process; you would need to clean it up afterwards.

    A Word macro you could use for the clean-up is:

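    Sub EraseTextBoxes()
    ' Move the text of every text box into the document body at the box's anchor.
    Dim RngDoc As Range, RngShp As Range, i As Long
    With ActiveDocument
      ' Loop backwards so that deleting a shape does not shift the remaining indices.
      For i = .Shapes.Count To 1 Step -1
        With .Shapes(i)
          If .Type = msoTextBox Then
            Set RngShp = .TextFrame.TextRange
            ' Exclude the text box's final paragraph mark.
            RngShp.End = RngShp.End - 1
            Set RngDoc = .Anchor
            RngDoc.Collapse wdCollapseEnd
            RngDoc.FormattedText = RngShp.FormattedText
            ' Remove the text box now that its text is in the body.
            .Delete
          End If
        End With
      Next i
    End With
    End Sub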

    Do note that whether the macro positions the output correctly depends on where the textboxes are anchored; if the anchor positions are unrelated to the textbox locations, you'll end up with a dog's breakfast. You'll probably still also end up with each line as its own paragraph. To clean up such content, see http://www.msofficeforums.com/word/29880-cleaning-up-text-pasted-websites-e-mails.html
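    If the cleaned-up text is exported as plain text, the "each line as its own paragraph" problem can be roughly sketched in shell. This is only an illustration and not the method from the linked page: the function name merge_lines is hypothetical, and the rule that a line ending in ., !, ? or : closes a paragraph is an assumption that will misjudge sentences that happen to end at a line break.

```shell
# merge_lines: read text on stdin, write merged paragraphs to stdout.
# Heuristic (an assumption for illustration): a line ending in ., !, ? or :
# closes a paragraph; a blank line always closes one.
merge_lines() {
  awk '
    # Blank line: flush whatever has been collected so far.
    /^[[:space:]]*$/ { if (buf != "") print buf; buf = ""; next }
    {
      # Append the line to the current paragraph, separated by one space.
      buf = (buf == "" ? $0 : buf " " $0)
      # Sentence-final punctuation at the end of the line closes the paragraph.
      if ($0 ~ /[.!?:][[:space:]]*$/) { print buf; buf = "" }
    }
    # Emit any trailing text that never reached a closing line.
    END { if (buf != "") print buf }
  '
}
```

    Usage would look like `merge_lines < exported.txt > merged.txt`, where both file names are placeholders. Headings and list items without final punctuation get glued to the following line, so the result still needs a manual pass.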
