Preprocess the XML to drop the data-* attributes before handing it to the validation function. Otherwise, there is no way that I know of to validate it with RelaxNG or other grammar-based schema languages.
As for preprocessing the XML, one way to do that with an existing XML toolchain is to run it through an XSLT transformation that drops the data-* attributes but passes everything else through as-is:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*[starts-with(name(), 'data-')]"/>
</xsl:stylesheet>
The <xsl:template match="@*[starts-with(name(), 'data-')]"/> rule is the important part there. It causes any data-* attribute to simply be dropped on the floor. The rest of that XSL stylesheet is just a basic “identity transform” that passes on everything else from the source XML as-is.
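If your toolchain happens to be Java, applying that stylesheet before validation could look roughly like the following. This is only a minimal sketch using the JDK's built-in javax.xml.transform API; the class name DropDataAttributes and the apply helper are hypothetical names I made up for illustration, and the stylesheet is inlined as a string just to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, for illustration only.
class DropDataAttributes {

    // The same stylesheet as above: identity transform plus an empty
    // template that swallows every attribute whose name starts with "data-".
    static final String STYLESHEET =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output method='xml' indent='no'/>"
        + "<xsl:template match='node() | @*'>"
        + "<xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"@*[starts-with(name(), 'data-')]\"/>"
        + "</xsl:stylesheet>";

    // Returns the input document with all data-* attributes removed.
    static String apply(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // Leave out the XML declaration so the output is easy to compare.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }
}
```

You would then pass the string returned by apply (or a stream of it) to whatever RelaxNG validator you are using, instead of the original source.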
The W3C Nu Html Checker (HTML5 validator) backend does something for data-* attributes that’s functionally the same as that XSLT transformation, but written in Java. If you’re curious, the code for it is within the GitHub repo for the W3C Nu Html Checker sources, here:
https://github.com/validator/validator/tree/master/src/nu/validator/xml/dataattributes
See the filterAttributes code in DataAttributeDroppingContentHandlerWrapper.java.
It’s essentially a SAX filter that operates on the parse events at parse time, before they reach the validation function.
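To give a feel for what a SAX filter of that kind looks like, here is a minimal sketch built on the standard org.xml.sax.helpers.XMLFilterImpl class. To be clear, this is not the checker's actual code: the class name DataAttributeDroppingFilter and the strip helper are hypothetical, and I'm assuming that only no-namespace attributes whose names start with "data-" should be dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter, for illustration; not the Nu Html Checker's own class.
class DataAttributeDroppingFilter extends XMLFilterImpl {

    DataAttributeDroppingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Copy every attribute except no-namespace ones named data-*.
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            boolean isDataAttribute = atts.getURI(i).isEmpty()
                    && atts.getLocalName(i).startsWith("data-");
            if (!isDataAttribute) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i),
                        atts.getQName(i), atts.getType(i), atts.getValue(i));
            }
        }
        // Downstream handlers (e.g. a validator) only see the kept attributes.
        super.startElement(uri, localName, qName, kept);
    }

    // Demo helper: parse through the filter and re-serialize the result.
    static String strip(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader filter =
                new DataAttributeDroppingFilter(factory.newSAXParser().getXMLReader());
        // A Transformer created without a stylesheet acts as an identity copy.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

In a real pipeline you would skip the re-serialization in strip and instead set the validator's ContentHandler directly on the filter, so the data-* attributes are hidden without producing an intermediate document.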
And if you’re even more curious, there is code for other preprocessing filters doing similar things:
Anyway, you get the general idea: If there are any cases of markup constructs in your source that you can’t express validation logic for in RelaxNG or XSD, then you essentially filter (preprocess) the source to hide that markup from the validation function.