I found out why.
The DOM extension is built on libxml2, whose HTML parser was written for HTML 4. If you use an HTML5 doctype and a meta element like this
<meta charset="utf-8">
the HTML gets interpreted as ISO-8859-something, and non-ASCII characters get converted into HTML entities.
However, the HTML4-style version works:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
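To check this on your own setup, here is a minimal sketch. The helper name `paragraphText` and the sample string are made up for illustration, and it assumes the PHP file itself is saved as UTF-8. It loads a small HTML5 page with the HTML4-style meta element and reads the paragraph text back.

```php
<?php
// Hypothetical helper: wraps a meta tag in a small HTML5 page, loads it
// with DOMDocument::loadHTML() and returns the text of the first <p>.
function paragraphText(string $metaTag): string
{
    $html = '<!DOCTYPE html><html><head>' . $metaTag . '</head>'
          . '<body><p>Füße café</p></body></html>';

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // DOM strings are always returned as UTF-8.
    return $doc->getElementsByTagName('p')->item(0)->textContent;
}

// HTML4-style declaration: the input is decoded as UTF-8.
$text = paragraphText('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">');
echo $text, PHP_EOL; // Füße café
```

Calling `paragraphText('<meta charset="utf-8">')` the same way lets you see whether your PHP/libxml2 build mis-decodes the HTML5 form as described above.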
Reference: UTF-8 with PHP DOMDocument loadHTML?