I need to parse a huge XML file of photo albums. I'm using PHP's SimpleXML, but it fails with errors on some entries because extra angle brackets can appear inside the element text; see the 'description' or 'CameraModel' tags.

How do I clean up the XML before loading it with SimpleXML? If possible, I'd like to replace the extra brackets with an underscore ('_').

Here is my XML:

<exif><CameraModel><Digimax S500 / Kenox S500</CameraModel>
<CameraMake>Samsung Techwin</CameraMake>
<DateTime>2008-07-12 17:37:24</DateTime>
  • douqiangchuai7674 2013-09-25 23:01

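    Use a regex. This one replaces a single stray '<' per element with '_', as long as the opening and closing tags are on the same line: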

    print preg_replace('/(<(\w+)[^>]*>.*)(<)(.*<\/\2>)/', '$1_$4', $xml);
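If more than one stray bracket can occur in a field, or a stray '>' can show up as well, a callback-based variant may be easier to control. The following is only a sketch under some assumptions: the function name `escapeStrayBrackets` is made up for illustration, the list of text-only tags ('description', 'CameraModel') is taken from the question, those tags are assumed never to contain child elements, and a closing `</exif>` has been added to the sample so that it can be parsed on its own.

```php
<?php
// Hypothetical helper (not part of SimpleXML): replaces every '<' and '>'
// found between the opening and closing tag of the listed text-only
// elements with '_'. Assumes those elements never contain child elements.
function escapeStrayBrackets(string $xml, array $textTags): string
{
    // Build an alternation such as "description|CameraModel".
    $names = implode('|', array_map(fn($t) => preg_quote($t, '/'), $textTags));
    $pattern = '/<(' . $names . ')(\s[^<>]*)?>(.*?)<\/\1>/s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Everything between the tags is treated as plain text,
            // so any bracket in it is stray and becomes "_".
            $text = strtr($m[3], '<>', '__');
            return '<' . $m[1] . $m[2] . '>' . $text . '</' . $m[1] . '>';
        },
        $xml
    );

    // preg_replace_callback() returns null on a PCRE error;
    // in that case the input is returned unchanged.
    return $result ?? $xml;
}

// Sample from the question; the closing </exif> is an assumption.
$xml = "<exif><CameraModel><Digimax S500 / Kenox S500</CameraModel>\n"
     . "<CameraMake>Samsung Techwin</CameraMake>\n"
     . "<DateTime>2008-07-12 17:37:24</DateTime></exif>";

$clean = escapeStrayBrackets($xml, ['description', 'CameraModel']);

// Collect parse errors instead of emitting warnings.
libxml_use_internal_errors(true);
$exif = simplexml_load_string($clean);

if ($exif === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
} else {
    echo $exif->CameraModel, "\n";   // prints "_Digimax S500 / Kenox S500"
}
```

Restricting the pattern to a known list of text-only tags is a deliberate choice here: a generic "any tag" pattern can also match a parent element that closes on the same line and would then mangle its well-formed children.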
