LibreOffice converts PDF to Word as text boxes instead of a normal document

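I want to use LibreOffice 6.1.3.2 10 (Build:2) to convert a PDF to Microsoft Word (doc, docx) from the Ubuntu 18 terminal (I actually run LibreOffice from PHP). However, I get a document full of text boxes instead of a normal Word document. To understand my problem, I suggest first downloading my files here: https://nofile.io/f/DKvQYFRdYZg/pdf2word.rar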

I have 4 files:

1. original.doc
2. original-to-pdf.pdf
3. pdf-to-word.doc
4. expected.doc

First I converted original.doc to original-to-pdf.pdf, then I tried to convert it back to Word using the following command:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf

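The file was created successfully, but all of its content was converted to text boxes rather than a normal document. I then tried several PDF-to-Word converters, such as ilovepdf.com, and got expected.doc.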
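To confirm the text-box symptom without opening each output file, the converted .docx can be inspected directly. The following is a minimal Python sketch, not part of the original question. It relies on two facts: a .docx file is a ZIP archive, and the content of a text box in word/document.xml is wrapped in a w:txbxContent element. The function name count_text_boxes and the file name a.docx are illustrative assumptions.

```python
import zipfile


def count_text_boxes(docx_file) -> int:
    """Return a rough count of text-box content blocks in a .docx body.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        # The main document body is stored in this archive member.
        xml = archive.read("word/document.xml").decode("utf-8")
    # Exporters may write each text box twice (a DrawingML copy plus a VML
    # fallback), so treat the result as an indicator, not an exact count.
    return xml.count("<w:txbxContent")


if __name__ == "__main__":
    # "a.docx" is a hypothetical output file of the conversion command.
    print(count_text_boxes("a.docx"))
```

A result of 0 suggests the body consists of ordinary paragraphs; a large number suggests the text-box layout described above.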

You can download my files from the link above to see the difference, or look at the images below.

[Image: LibreOffice output]

[Image: ilovepdf output]

I tried several filters, including PDF to ODT and then ODT to Word, but none of the commands below gave me the expected result:

soffice --infilter="writer_pdf_import" --convert-to docx a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"Microsoft Word 2007/2010/2013 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc a.pdf
soffice --infilter="writer_pdf_import" --convert-to odf:"writer8" a.pdf
soffice --infilter="writer8" --convert-to doc a.odf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 95" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 97" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"StarOffice XML (Writer)" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2007 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to doc:"MS Word 2003 XML" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML Template" a.pdf
soffice --infilter="writer_pdf_import" --convert-to docx:"MS Word 2007 XML" a.pdf
soffice --infilter="Microsoft Word 2007/2010/2013 XML" --convert-to doc a.pdf

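I know there is high-end software such as ABBYY Cloud or Adobe cloud, but I doubt that a site like ilovepdf would use paid services to offer a free service. My question is: am I missing something in LibreOffice's dependencies that would let it convert a PDF to a normal Word document?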

1 answer

douying6206 2018-12-15 03:42
Your problem lies with the software used to create the PDF; output in the form of textboxes in a PDF is a characteristic of certain low-end PDF-creation software. There is nothing Word can do about that during the import process; you would need to clean it up afterwards.

A Word macro you could use for the clean-up is:

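Sub EraseTextBoxes()
    Dim RngDoc As Range, RngShp As Range, i As Long
    With ActiveDocument
        ' Loop backwards so that deleting a shape does not shift the indexes.
        For i = .Shapes.Count To 1 Step -1
            With .Shapes(i)
                If .Type = msoTextBox Then
                    ' Take the textbox content, minus its final paragraph mark.
                    Set RngShp = .TextFrame.TextRange
                    RngShp.End = RngShp.End - 1
                    ' Copy the formatted text to the end of the anchor range.
                    Set RngDoc = .Anchor
                    RngDoc.Collapse wdCollapseEnd
                    RngDoc.FormattedText = RngShp.FormattedText
                    .Delete
                End If
            End With
        Next
    End With
End Sub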

Do note that whether the macro positions the output correctly depends on where the textboxes are anchored; if the anchor positions are unrelated to the textbox locations, you'll end up with a dog's breakfast. You'll probably still also end up with each line as its own paragraph. To clean up such content, see http://www.msofficeforums.com/word/29880-cleaning-up-text-pasted-websites-e-mails.html
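The "each line as its own paragraph" problem can be illustrated on plain text. The following is a minimal Python sketch, not the method from the linked thread. It assumes the text has already been extracted from the document as a list of lines and that a line without sentence-final punctuation continues on the next line; both the function name merge_broken_lines and that heuristic are assumptions for illustration.

```python
def merge_broken_lines(lines):
    """Join consecutive one-line paragraphs into longer paragraphs.

    Heuristic (an assumption): a line that does not end in ".", "!" or "?"
    continues on the next line. A blank line always closes the paragraph.
    """
    paragraphs = []
    current = []
    for line in lines:
        text = line.strip()
        if not text:
            # Blank line: flush whatever has been collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(text)
        if text.endswith((".", "!", "?")):
            # Sentence-final punctuation: treat this as a paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

For example, merge_broken_lines(["The quick brown", "fox jumps.", "", "Next para", "continues here"]) returns ["The quick brown fox jumps.", "Next para continues here"]. The heuristic splits too often when a sentence happens to end at a line break, so the result still needs a manual review.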
The asker selected this answer as the best answer.
