I am importing the contents of an Excel-generated CSV file into an XML document like this:
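$csv = fopen($csvfile, 'r'); // the mode must be a quoted string
$words = array();
while (($pair = fgetcsv($csv)) !== FALSE) {
    // column 0 is the English expression, column 1 the German one
    array_push($words, array('en' => $pair[0], 'de' => $pair[1]));
}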
The inserted data are English/German expressions.
I insert these values into an XML structure and output the XML as follows:
$dictionary = new SimpleXMLElement('<dictionary></dictionary>');
//do things
$dom = dom_import_simplexml($dictionary) -> ownerDocument;
$dom -> formatOutput = true;
header('Content-Type: text/xml; charset=utf-8'); // XML MIME type; the charset belongs here, not in Content-Encoding
echo $dom -> saveXML();
This is working fine, yet I am encountering one really strange problem. When the first letter of a string is an umlaut (as in Österreich or Ägypten), the character is omitted, resulting in gypten or sterreich. If the umlaut is in the middle of the string (Russische Föderation), it is transferred correctly. The same goes for characters like ß or é.
All files are UTF-8 encoded and served in UTF-8.
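To narrow down where the character gets lost, here is a minimal sketch of a check that dumps the raw bytes of each field right after parsing, before any XML is involved. The helper name `readPairsAsHex`, the `de_hex` key, and the throwaway sample file are assumptions made for illustration; they are not part of my actual script.

```php
<?php
// Hypothetical helper (not part of the original script): reads English/German
// pairs and returns the German field together with its raw bytes as hex.
function readPairsAsHex(string $path): array
{
    $rows = [];
    $csv = fopen($path, 'r');
    // Delimiter, enclosure and escape are the usual defaults, spelled out explicitly.
    while (($pair = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
        $rows[] = [
            'en'     => $pair[0],
            'de'     => $pair[1],
            'de_hex' => bin2hex($pair[1]),
        ];
    }
    fclose($csv);
    return $rows;
}

// Throwaway sample file (assumption: two columns, comma-separated, UTF-8).
$sample = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($sample, "Austria,\u{00D6}sterreich\nEgypt,\u{00C4}gypten\n");

foreach (readPairsAsHex($sample) as $row) {
    // In UTF-8, "Ö" is c3 96 and "Ä" is c3 84. If de_hex does not start with
    // those bytes, the character was already lost while parsing the CSV.
    printf("%s => %s (%s)\n", $row['en'], $row['de'], $row['de_hex']);
}
unlink($sample);
```

If the leading bytes are already missing in this output, the loss happens in `fgetcsv` rather than in SimpleXML or DOM. The PHP manual notes that `fgetcsv` takes the locale settings into account, so the result may differ between machines.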
This seems rather strange and bug-like to me, but maybe I am missing something; there are a lot of smart people around here.