Converting an HTML table to text

I'm working on a project that requires converting HTML emails into text. Below is a simplified version of the HTML code:

<table>
    <tr>
        <td width="10%"></td>
        <td width="60%"> test product </td>
        <td width="20%">5</td>
        <td width="10%"> £50.00 </td>
    </tr>
    <tr>
        <td></td>
        <td colspan="3" width="100%"> Project Name: Test Project </td>
    </tr>
    <tr>
        <td width="10%"> </td>
        <td colspan="2" width="80%"> Page 1 : 01 New York 1.jpg </td>
        <td width="10%"> £0.00 </td>
    </tr>
</table>

The expected outcome should look like this in a text file (with columns aligned nicely):

test product                                      5            £50.00
Project Name: Test Project                                                            
Page 1 :  01 New York 1.jpg                                    £0.00

My idea is to parse the HTML content with DOMDocument. Then I will set a default width for the table (e.g. 100 spaces) and convert the width of each column from a percentage to a number of spaces (based on the colspan and width attributes of each <td> tag). Then I will subtract the strlen of the data in each column from that column's width to get the number of spaces I need to pad onto the right of the string, so that everything aligns vertically.
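A minimal PHP sketch of the DOMDocument approach described above. It is not the asker's code, and several details are assumptions: the table is 100 characters wide, column widths are read from the first row (which has no colspan), and a spanning cell gets the sum of the widths of the columns it covers, which also sidesteps the inconsistent width="100%" on the colspan="3" cell. `htmlTableToText` is a hypothetical name. It pads with `str_pad()`, so it shares the byte-counting limitation of `strlen()`, and it neither wraps nor truncates long cells.

```php
<?php
// Minimal sketch: DOMDocument + percentage widths + right padding.
// Assumptions (not from the question): 100-character table width, column widths
// read from the first row (which has no colspan), cell text trimmed, no wrapping.

function htmlTableToText(string $html, int $totalWidth = 100): string
{
    $doc = new DOMDocument();
    // The meta tag tells libxml the input is UTF-8, so "£" survives parsing.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML($meta . $html, LIBXML_NOERROR | LIBXML_NOWARNING);

    // Collect each row's cells with their text, colspan and width percentage.
    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = [
                'text'    => trim($td->textContent),
                'colspan' => max(1, (int) $td->getAttribute('colspan')),
                'percent' => (float) $td->getAttribute('width'), // "60%" becomes 60.0
            ];
        }
        $rows[] = $cells;
    }

    // Convert the first row's percentages into character counts.
    $colWidths = [];
    foreach ($rows[0] ?? [] as $cell) {
        $colWidths[] = (int) round($cell['percent'] / 100 * $totalWidth);
    }

    $lines = [];
    foreach ($rows as $cells) {
        $line = '';
        $col = 0;
        foreach ($cells as $cell) {
            // A spanning cell gets the combined width of the columns it covers.
            $width = array_sum(array_slice($colWidths, $col, $cell['colspan']));
            $col += $cell['colspan'];
            // str_pad() counts bytes (like strlen()), so this is only exact for ASCII.
            $line .= str_pad($cell['text'], $width);
        }
        $lines[] = rtrim($line); // drop the padding after the last column
    }
    return implode("\n", $lines);
}

$html = <<<'HTML'
<table>
  <tr><td width="10%"></td><td width="60%"> test product </td><td width="20%">5</td><td width="10%"> £50.00 </td></tr>
  <tr><td></td><td colspan="3" width="100%"> Project Name: Test Project </td></tr>
  <tr><td width="10%"> </td><td colspan="2" width="80%"> Page 1 : 01 New York 1.jpg </td><td width="10%"> £0.00 </td></tr>
</table>
HTML;

echo htmlTableToText($html), "\n";
```

Unlike the expected output, this keeps the empty 10% spacer column, so every line starts with 10 spaces; the two prices start in the same column, after 90 characters.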

I have been working on it that way but haven't achieved what I want yet. I'm wondering whether this approach is a bad idea, or whether anyone knows a better way. Please help me out.

Also, when it comes to multibyte languages (Japanese, Korean, etc.), I don't think my approach would work, because their characters are wider than one space and the output ends up a mess.
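`str_pad()` and `strlen()` count bytes, not display columns: in UTF-8 "£" is two bytes, and a Japanese or Korean character is three bytes but usually occupies two columns in a monospace font. One way around this, sketched below as a hypothetical helper named `mbPadRight`, is to measure with `mb_strwidth()` from the mbstring extension and pad with `str_repeat()`. It assumes UTF-8 input and a monospace font that renders full-width characters at exactly two columns; in a proportional font nothing will line up.

```php
<?php
// Pad a string to a target *display* width rather than a byte count.
// mb_strwidth() (mbstring extension) counts full-width East Asian characters
// as 2 columns and most other characters, including "£", as 1.
function mbPadRight(string $text, int $width, string $encoding = 'UTF-8'): string
{
    $padding = $width - mb_strwidth($text, $encoding);
    // Text that is already too wide is returned unchanged (no truncation).
    return $padding > 0 ? $text . str_repeat(' ', $padding) : $text;
}

// Each result is exactly 12 display columns wide, so the "|" markers line up
// in a terminal whose font renders full-width characters as two columns.
foreach (['test product', 'テスト商品', '테스트 상품', '£50.00'] as $cell) {
    echo mbPadRight($cell, 12), "|\n";
}
```

Used in place of `str_pad()`, this keeps mixed Latin and CJK cells aligned; cells that are too long for their column could be shortened to a given width with `mb_strimwidth()`.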

Can someone help me out please?

douxian4888 2012-06-19 15:02
Don't reinvent the wheel. Table rendering is difficult; rendering tables using only text is even more difficult. To get an idea of how complex a text-based table renderer that supports all the features of HTML is, take a look at w3m, which is open source: about 3000 lines of its code exist only to display HTML tables.

Transform HTML to Text

There are text-based browsers that can be used from the command line, such as lynx. You could fwrite your HTML table into a file, pass that file to the text-based browser, and capture its output.

Note: text-based browsers are generally used in a shell, which usually displays text in a monospace font. The resulting table only lines up when it is displayed in a monospace font as well.

lynx and w3m are both available on Windows, and you don't need to "install" them: you just need the executables and permission to run them from PHP.

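Code example:

<?php
$table = '<table><tr><td>foo</td><td>bar</td></tr></table>'; // this contains your table
$html = "<html><body>$table</body></html>";

// write the HTML into a temporary file
$tmpfname = tempnam(sys_get_temp_dir(), "tblemail");
$handle = fopen($tmpfname, "w");
fwrite($handle, $html);
fclose($handle);

// the temp file does not end in .html, so -T text/html tells w3m to parse it as HTML;
// escapeshellarg() quotes the path safely for the shell
$myTextTable = shell_exec("w3m.exe -T text/html -dump " . escapeshellarg($tmpfname));
unlink($tmpfname);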

w3m.exe needs to be in your working directory.

(I haven't tried it.)

Render a Text table

If you want a native PHP solution, there is at least one framework for PHP console applications, CLIFramework (https://github.com/c9s/CLIFramework), that includes a table renderer.

It doesn't transform HTML to text, but it helps you build a text-formatted table with support for multiline cells (which seems to be the most complicated part).

With CLIFramework, you would need code like this to render your table:

<?php
require 'vendor/autoload.php';

use CLIFramework\Component\Table\Table;

$table = new Table;
$table->addRow(array("test product", "5", "£50.00"));
$table->addRow(array("Project Name: Test Project", "", ""));
$table->addRow(array("Page 1 : 01 New York 1.jpg", "", "£0.00"));
$myTextTable = $table->render();

However, the CLIFramework table renderer doesn't seem to support anything similar to colspan.

Here's the documentation for the table component: https://github.com/c9s/CLIFramework/wiki/Using-Table-Component
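Because the renderer has no colspan, one workaround is to flatten each spanning cell into its text followed by empty cells, which is what the hand-written rows in the CLIFramework example above do. The sketch below automates that with DOMDocument. `htmlTableToRows` is a hypothetical helper, not part of CLIFramework, and dropping columns that are empty in every row is an assumption chosen to match those hand-written rows.

```php
<?php
// Hypothetical helper (not part of CLIFramework): turn an HTML table into
// equal-length rows of strings for a renderer without colspan support.
// A cell with colspan="N" becomes its text followed by N-1 empty strings;
// columns that are empty in every row (like the leading spacer) are dropped.
function htmlTableToRows(string $html): array
{
    $doc = new DOMDocument();
    // The meta tag tells libxml the input is UTF-8, so "£" survives parsing.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML($meta . $html, LIBXML_NOERROR | LIBXML_NOWARNING);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $row = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $span = max(1, (int) $td->getAttribute('colspan'));
            // The text goes in the first spanned column; the rest stay empty.
            $row = array_merge($row, [trim($td->textContent)], array_fill(0, $span - 1, ''));
        }
        $rows[] = $row;
    }
    if ($rows === []) {
        return [];
    }

    // Find the columns that contain text in at least one row.
    $keep = [];
    $colCount = max(array_map('count', $rows));
    for ($c = 0; $c < $colCount; $c++) {
        foreach ($rows as $row) {
            if (($row[$c] ?? '') !== '') {
                $keep[] = $c;
                break;
            }
        }
    }

    // Rebuild every row with only those columns, filling short rows with ''.
    $result = [];
    foreach ($rows as $row) {
        $cells = [];
        foreach ($keep as $c) {
            $cells[] = $row[$c] ?? '';
        }
        $result[] = $cells;
    }
    return $result;
}

$html = <<<'HTML'
<table>
  <tr><td width="10%"></td><td width="60%"> test product </td><td width="20%">5</td><td width="10%"> £50.00 </td></tr>
  <tr><td></td><td colspan="3" width="100%"> Project Name: Test Project </td></tr>
  <tr><td width="10%"> </td><td colspan="2" width="80%"> Page 1 : 01 New York 1.jpg </td><td width="10%"> £0.00 </td></tr>
</table>
HTML;

foreach (htmlTableToRows($html) as $row) {
    echo implode(' | ', $row), "\n";
}
```

Each returned row could then be passed to `$table->addRow()`; for the sample table this produces the same three arrays as the hand-written example. Dropping all-empty columns would also remove a column that is blank on purpose, so that step may need to be optional.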
This answer was accepted by the asker as the best answer.

