I'm working with a text/HTML diffing engine that uses XML at its core, but we're feeding it HTML5 data. I wonder how to handle tags that don't need to be closed in HTML5 but must be closed in XML. For example:
<img alt="" height="239" src="http://example.com/image.png" width="272">
Do I need to convert every such tag manually (just like in this example)?
Is there a tool that would do this for me and save me the headache of manually closing every self-closing HTML tag?
For example, xml_parse() reports an error for the following code, even though the body contains valid HTML; it is just not well-formed XML:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html [<!ENTITY Aacute "Á">]>
<body>
<div>
<figure class="table ">
<figcaption>
<p class="table_number"></p>
<p class="table_title" epub:type="title"></p>
</figcaption>
<table class="code ">
<tr>
<td width="50">
<img alt="" height="239" src="http://example.com/image.png" width="272">
</td>
</tr>
</table>
</figure>
</div>
</body>
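
To make the "convert every tag" idea concrete, here is a minimal sketch of what an automated conversion could look like. It is written in Python using only the standard library; the language choice, the `VoidTagCloser` class, and the `html_to_xml_fragment` helper are illustrative assumptions and not part of the diffing engine or of xml_parse(). It re-serializes an HTML fragment so that void elements such as `img` and `br` come out self-closed. It is not a full HTML5 parser: it does not repair mis-nested tags, and it silently drops comments, doctype declarations, and processing instructions.

```python
from html import escape
from html.parser import HTMLParser

# Void elements listed in the HTML Living Standard: they never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class VoidTagCloser(HTMLParser):
    """Hypothetical helper: re-emits tags and text, self-closing void elements."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &Aacute; into characters,
        # so the output needs no HTML-specific entity declarations.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive unescaped, so escape them again for XML.
        # A valueless attribute (e.g. "disabled") gets its own name as value.
        rendered = "".join(
            ' {}="{}"'.format(name, escape(name if value is None else value, quote=True))
            for name, value in attrs
        )
        closing = " /" if tag in VOID_ELEMENTS else ""
        self.parts.append("<{}{}{}>".format(tag, rendered, closing))

    def handle_endtag(self, tag):
        # Void elements were already self-closed, so skip any stray end tag.
        if tag not in VOID_ELEMENTS:
            self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        # Re-escape &, < and > so the text is valid XML character data.
        self.parts.append(escape(data, quote=False))


def html_to_xml_fragment(html_text):
    """Return html_text re-serialized with void elements self-closed."""
    closer = VoidTagCloser()
    closer.feed(html_text)
    closer.close()  # flush any buffered trailing text
    return "".join(closer.parts)


if __name__ == "__main__":
    cell = ('<td width="50"><img alt="" height="239" '
            'src="http://example.com/image.png" width="272"></td>')
    print(html_to_xml_fragment(cell))
```

Applied to the table cell from the sample above, this turns the `img` start tag into `<img ... />`, after which an XML parser should accept the fragment. A dedicated HTML-to-XHTML tool would still be the safer choice for arbitrary real-world input.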