I am trying to parse the Wikipedia XML that I get from the Wikipedia XML export.
In one case I need to extract all image paths. The raw markup looks like this:
[[Bild:nameOfImage.png|image description]]
"Bild" can also be "Image", "File" or "Datei"
To extract the text for an image I use this regex:
'|\[\[.*\|.*\]\]|U'
This works fine as long as the image description does not contain another '[[ .. ]]', like
[[Bild:nameOfImage.png|image Description with a [[new wiki link]] ]]
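To make the failure concrete, here is a minimal reproduction. The question does not name a language, so this sketch uses Python; since Python's `re` module has no `U` modifier, the lazy quantifiers `.*?` stand in for it (an assumption about equivalent behaviour). The variable names are only for illustration.

```python
import re

# Lazy quantifiers (.*?) approximate the PCRE "U" (ungreedy) modifier.
LAZY_PATTERN = re.compile(r'\[\[.*?\|.*?\]\]')

simple = "[[Bild:nameOfImage.png|image description]]"
nested = "[[Bild:nameOfImage.png|image Description with a [[new wiki link]] ]]"

simple_match = LAZY_PATTERN.search(simple).group(0)
nested_match = LAZY_PATTERN.search(nested).group(0)

print(simple_match)  # the whole link is matched
print(nested_match)  # the match stops at the first "]]", inside the description
```

The lazy match ends at the first "]]" it can find, which is the one that closes the inner link, so the trailing " ]]" of the image link is cut off.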
My question is: how can I modify the regex to get all the text between the first "[[" and the last "]]" of an image link, without having to count every '[' and ']' character?
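One possible direction, as a rough sketch only (Python again, since no language is named): anchor the pattern on the known prefixes and let the description contain either non-bracket characters or one complete inner '[[ .. ]]' link. This avoids counting brackets, but it assumes inner links are nested at most one level deep. The name `IMAGE_LINK` and the sample text are hypothetical.

```python
import re

# Assumption: inner [[...]] links inside the description are not nested further.
IMAGE_LINK = re.compile(
    r'\[\[(?:Bild|Image|File|Datei):'    # one of the known prefixes
    r'([^|\[\]]*)'                       # group 1: image name
    r'\|'
    r'((?:[^\[\]]|\[\[[^\[\]]*\]\])*)'   # group 2: text, or one-level [[...]] links
    r'\]\]'
)

sample = (
    "intro [[File:a.png|one]] middle "
    "[[Bild:nameOfImage.png|image Description with a [[new wiki link]] ]] end"
)

found = IMAGE_LINK.findall(sample)
for name, description in found:
    print(name, "->", repr(description))
```

Because the pattern never matches a stray '[' or ']' inside the description, two image links on the same page are not merged into one match, which a plain greedy `.*` would do.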
Thanks in advance.