I'm importing a csv-file with user-data another person sent me. The file was sent back-and-forth between multiple people, working with different programs on different OS. So some encoding issues occurred and I found no easy way to fix it (yes I converted the file to UTF-8 etc).
The data are stored in a postgresql database and I found a way to convert the broken characters.
For example:
â\u0088\u009Aº = ü
â\u0088\u009A§ = ä
â\u0088\u009Aâ\u0088\u0082 = ö
...
In psql I'm using the following statement to convert â\u0088\u009Aº
to ü
:
SELECT COUNT(*) FROM sbl_students WHERE lastname ~* E'.*â\\u0088\\u009Aº.*';
=> 61
UPDATE sbl_students SET lastname = regexp_replace(lastname, E'â\\u0088\\u009Aº', 'ü', 'g');
SELECT COUNT(*) FROM sbl_students WHERE lastname ~* E'.*â\\u0088\\u009Aº.*';
=> 0
I tried to implement this into a php script, but for some reason it doesn't work. Nothing is replaced and no error occurs.
$dbh = pg_connect("dbname=iserv user=sbl");
$query = "UPDATE sbl_students SET lastname = regexp_replace(lastname, E'â\\u0088\\u009Aº', 'ü', 'g');";
pg_query($dbh, $query);
pg_close($dbh);
I already tried multiple variations like regexp_replace(lastname, E'â\u0088\u009Aº', 'ü', 'g')
or regexp_replace(lastname, 'â\u0088\u009Aº', 'ü', 'g')
but non of them worked.
Can anybody help me out? I don't understand why the query works great in psql but doesn't do anything in php.
Thanks!
Postgresql 8.4
PHP 5.3.3-7+squeeze16 with Suhosin-Patch (cli)