I wouldn't be too worried about performance here; I would consider the two approaches "comparable". Benchmarks would need to be run to truly determine this, as the result depends on the size of the document and how the regular expression is written.
Instead, I would be concerned about accuracy. In general, DOMDocument will be much better at parsing XML, since it was built to read and understand the language. However, it does fail on <includes module='footer'> because that is an unclosed tag (it expects a matching </includes>).
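To make the failure concrete, here is a minimal sketch of loading the unclosed tag versus a self-closed one. The <page> wrapper element and the variable names are my own assumptions for illustration, not something from your document.

```php
<?php
// Hypothetical sample documents; the <page> wrapper is assumed for illustration.
$unclosed   = "<page><includes module='footer'></page>";
$selfClosed = "<page><includes module='footer'/></page>";

// Collect libxml errors instead of having PHP emit warnings.
libxml_use_internal_errors(true);

// The unclosed <includes> makes the document malformed, so loadXML() returns false.
$doc = new DOMDocument();
$okUnclosed = $doc->loadXML($unclosed);
$errors = libxml_get_errors();
libxml_clear_errors();

// The self-closed variant is well-formed and loads normally.
$doc2 = new DOMDocument();
$okSelfClosed = $doc2->loadXML($selfClosed);
$module = $doc2->getElementsByTagName('includes')->item(0)->getAttribute('module');

var_dump($okUnclosed, $okSelfClosed, $module);
```

Inspecting $errors shows what libxml objected to, which is usually the quickest way to see why a document was rejected.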
Most common HTML/XML formatting issues can be fixed with PHP's Tidy class. I would check it out, since you should get much more predictable results than you would by parsing with regex. With a regular expression, there could technically be attributes before/after module, elements within the includes element, unexpected characters like <includes module='foo>bar'>, etc.
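A rough sketch of the Tidy-then-DOM idea, assuming the tidy extension is installed. The repairAndLoad() helper and the <page> wrapper are hypothetical names of mine, and the config values are just one plausible choice. Whether Tidy can repair a particular input depends on the input and the configuration, so the sketch checks the result rather than assuming success.

```php
<?php
// Hypothetical helper: repair markup with Tidy, then parse it with DOMDocument.
// Returns null if the tidy extension is missing or the repaired markup still fails to parse.
function repairAndLoad(string $xml): ?DOMDocument
{
    if (!extension_loaded('tidy')) {
        return null;
    }

    // Treat input and output as XML; 'wrap' => 0 disables line wrapping.
    $config = ['input-xml' => true, 'output-xml' => true, 'wrap' => 0];
    $repaired = tidy::repairString($xml, $config, 'utf8');
    if (!is_string($repaired) || $repaired === '') {
        return null;
    }

    // Keep libxml quiet while parsing, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($repaired);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc : null;
}

$doc = repairAndLoad("<page><includes module='footer'></page>");
if ($doc !== null) {
    foreach ($doc->getElementsByTagName('includes') as $node) {
        echo $node->getAttribute('module'), "\n";
    }
}
```

Returning null instead of throwing keeps the calling code simple; you may prefer an exception if a parse failure should stop the request.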
In the end, if your XML comes from a "controlled" environment (i.e. you know what can and can't happen, you know what possible characters module will contain, you know that it will always be a self-closing element containing no children, etc.), then by all means use a regular expression. Just know that it is looking for a very specific set of rules. However, if you expect this to work with "anything you throw at it", please use a DOM parser (after Tidy'ing to avoid the exceptions), regardless of performance (although I bet it will be very comparable in many instances).
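For the controlled case, a sketch of what "a very specific set of rules" looks like in practice. The pattern below is my own assumption about your format: module is the only attribute, it is single-quoted, and the tag is optionally self-closed.

```php
<?php
// Assumed rules: <includes>, whitespace, a single-quoted module attribute, optional "/", then ">".
$pattern = "~<includes\\s+module='([^']*)'\\s*/?>~";

// Matches the exact shape the pattern was written for.
$hit = preg_match($pattern, "<includes module='footer'>", $m);
$module = $hit === 1 ? $m[1] : null;

// An attribute before module breaks the assumed shape, so nothing matches.
$miss = preg_match($pattern, "<includes id='1' module='footer'>");

var_dump($module, $miss);
```

The second call is the point: the moment the markup drifts from the assumed shape, the regex silently finds nothing, whereas a DOM parser would still hand you the element.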
Also, a final note: if you plan to find/replace/manipulate many nodes in a document, you will likely see a large performance increase by going with a DOM parser. A DOM parser takes a document and parses it once. Then you just traverse the data it already has loaded. Compare this to regular expressions, where each individual expression is run across the whole document looking for its set of matches.
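As a sketch of the parse-once, traverse-many idea: the document is loaded a single time, and every includes element is then replaced from the in-memory tree. The sample markup and the replacement div elements are placeholders I made up for illustration.

```php
<?php
// Hypothetical document with several includes; parsed exactly once.
$xml = "<page><includes module='header'/><p>Body</p><includes module='footer'/></page>";
$doc = new DOMDocument();
$doc->loadXML($xml);

// Copy the live node list to an array first, since replacing nodes changes the list.
$nodes = iterator_to_array($doc->getElementsByTagName('includes'));

$modules = [];
foreach ($nodes as $node) {
    $name = $node->getAttribute('module');
    $modules[] = $name;

    // Placeholder replacement; real code would insert the module's actual output.
    $replacement = $doc->createElement('div', "content of $name");
    $replacement->setAttribute('class', $name);
    $node->parentNode->replaceChild($replacement, $node);
}

echo $doc->saveXML($doc->documentElement), "\n";
```

With regular expressions, each additional kind of lookup or replacement would mean another pass over the full string; here the extra work is only tree traversal.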
If you want me to get more specific in any area (i.e. give a Tidy example, or work on a benchmark), let me know.