I'm working on a project that requires converting HTML email into plain text. Below is a simplified version of the HTML code:
<table>
  <tr>
    <td width="10%"></td>
    <td width="60%"> test product </td>
    <td width="20%">5</td>
    <td width="10%"> £50.00 </td>
  </tr>
  <tr>
    <td></td>
    <td colspan="3" width="100%"> Project Name: Test Project </td>
  </tr>
  <tr>
    <td width="10%"> </td>
    <td colspan="2" width="80%"> Page 1 : 01 New York 1.jpg </td>
    <td width="10%"> £0.00 </td>
  </tr>
</table>
The expected outcome should look like this in a text file (with columns aligned nicely):
      test product                        5           £50.00
      Project Name: Test Project
      Page 1 : 01 New York 1.jpg                      £0.00
My idea is to parse the HTML content with DOMDocument. I would set a default width for the table (e.g. 100 spaces), then convert the width of each column from a percentage to a number of spaces (based on the width attribute of the <td> tag). Then I would subtract the strlen of the data in each cell from its column width to get the number of spaces I need to right-pad the string with (str_pad with STR_PAD_RIGHT), so that everything aligns vertically.
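To make the idea concrete, here is a minimal sketch of what I have in mind. The function name tableToText is made up for illustration, and the sketch rests on several assumptions: the first <tr> has one <td> per column with a percentage width, later rows only merge columns via colspan, and there are no nested tables. It pads by byte count, exactly as described above.

```php
<?php
// Sketch only, not a finished converter. Assumptions: the first <tr> defines
// every column with a percentage width, later rows only use colspan, and the
// input contains a single, non-nested table.
function tableToText(string $html, int $tableWidth = 100): string
{
    libxml_use_internal_errors(true); // email markup is often sloppy
    $doc = new DOMDocument();
    // The XML prolog hints to libxml that the input is UTF-8 (needed for "£").
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    $colWidths = [];
    $lines = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        // Collect the direct <td> children of this row.
        $cells = [];
        foreach ($tr->childNodes as $node) {
            if ($node instanceof DOMElement && strtolower($node->tagName) === 'td') {
                $cells[] = $node;
            }
        }
        // First row: turn each percentage into a number of spaces.
        if ($colWidths === []) {
            foreach ($cells as $td) {
                $percent = (float) $td->getAttribute('width'); // "60%" -> 60.0
                $colWidths[] = (int) round($tableWidth * $percent / 100);
            }
        }
        $line = '';
        $col = 0;
        foreach ($cells as $td) {
            // A cell with colspan=N gets the combined width of N columns.
            $span = max(1, (int) $td->getAttribute('colspan'));
            $width = (int) array_sum(array_slice($colWidths, $col, $span));
            $col += $span;
            $text = trim(preg_replace('/\s+/u', ' ', $td->textContent) ?? '');
            // str_pad counts bytes, so multibyte text will be under-padded.
            $line .= str_pad($text, $width, ' ', STR_PAD_RIGHT);
        }
        $lines[] = rtrim($line);
    }
    return implode("\n", $lines);
}
```

Calling tableToText($html, 100) on the table above should give three lines with the description starting at column 10 and the price starting at column 90. Text longer than its column is not truncated or wrapped, so it would push the following columns to the right.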
I have been working along these lines but haven't achieved what I want yet. I'm wondering whether this approach is a bad idea; if anyone knows a better way, please help me out.
Also, when it comes to multibyte languages (Japanese, Korean, etc.), I don't think my approach would work, because their characters are wider than one space and the output would end up a mess.
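One thing I have considered for this is padding by display width instead of byte count. The helper below is a hypothetical sketch (the name padToWidth is mine) and assumes the mbstring extension is loaded. It relies on mb_strwidth, which counts East Asian wide characters as two columns and most other characters as one.

```php
<?php
// Hypothetical helper: right-pad by display width rather than by byte count.
// Assumes the mbstring extension is available and the text is UTF-8.
function padToWidth(string $text, int $width): string
{
    // mb_strwidth counts wide CJK characters as 2 columns, others as 1.
    $visible = mb_strwidth($text, 'UTF-8');
    // Never pass a negative count to str_repeat if the text overflows.
    return $text . str_repeat(' ', max(0, $width - $visible));
}
```

This would only line up in a monospace font where a wide character really occupies two cells, so I'm not sure it is enough on its own.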
Can someone help me out please?