I'm processing some text files containing Spanish text in PHP, using eclipse-php on my Mac OS X 10. I have the encoding set to UTF-8, and everything works great except for one small problem. All of the ¡ characters (upside-down exclamation marks) are replaced with � � (two black diamonds with question marks, separated by a space) in the output text file. None of the other characters (¿ñáéíóúü) are giving me any trouble. I had a similar problem on my Windows Vista machine (it would replace every ¡ with é). Any ideas why this one character is bugging out in UTF-8, and how I can fix it?
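For reference, a quick check along these lines shows how the characters in question are stored. It is only a sketch: it assumes the script file itself is saved as UTF-8 and that the mbstring extension is loaded, and the helper name `utf8_hex` is made up for illustration.

```php
<?php
// Hypothetical helper (not part of my script): raw UTF-8 bytes of a string as hex.
function utf8_hex(string $s): string {
    return bin2hex($s);
}

foreach (['¡', '¿', 'á', 'ñ'] as $ch) {
    // strlen() counts bytes; mb_strlen() counts characters.
    printf("%s  bytes=%d  chars=%d  hex=%s\n",
        $ch, strlen($ch), mb_strlen($ch, 'UTF-8'), utf8_hex($ch));
}
```

Each of these characters takes two bytes in UTF-8. Notably, ¡ (c2 a1) and á (c3 a1) end in the same byte, so anything that handles the text one byte at a time could plausibly confuse the two.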
Here's the code I'm using. I didn't include it originally because it is so long and I'm not sure where the problem lies. As you can see, I've tried to incorporate shiplu.mokadd.im's suggestion, but I'm still getting the � �.
<?php
ini_set("auto_detect_line_endings", true);

// Read the tab-separated index file: column 0 is a name, column 1 is a file path.
$sourceH = fopen("MainInput.txt", "r") or die("Can't open MainInput.txt.");
$sourceData = array();
$tracker = 0;
while (!feof($sourceH)) {
    $sourceData[$tracker] = fgets($sourceH);
    $sourceData[$tracker] = preg_split("/\t/", $sourceData[$tracker]);
    $tracker++;
}
$i = $tracker--;

$chars_hi = 'ABCDEFGHIJKLMNÑOPQRSTUVWXYZÁÉÍÓÚÜ';
$chars_lo = 'abcdefghijklmnñopqrstuvwxyzáéíóúü';
// Characters that count as part of a word.
$characters = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZÁÉÍÓÚÜabcdefghijklmnñopqrstuvwxyzáéíóúü1234567890'-";

function lowercase($s) {
    global $chars_hi, $chars_lo;
    return strtr($s, $chars_hi, $chars_lo);
}

$myNewFile = "Processing/Prepared.txt";
$fhNew = fopen($myNewFile, 'w') or die("can't open Prepared\n");
$newText = "";

for ($n = 1; $n < $i; $n++) {
    $myFile = $sourceData[$n][1];
    $fh = fopen($myFile, 'r') or die("can't open file ".$sourceData[$n][1]."\n");
    fwrite($fhNew, "\nStartFile ".$sourceData[$n][0]."\n");
    $position = 0;
    $speaker = ">>u";
    while (!feof($fh)) {
        $newText = fgets($fh);
        $isLast = false;
        $isFirst = true;
        $new = "";
        // Track the current speaker from the markers in the line.
        if (mb_strpos($newText, ">> i") !== false or mb_strpos($newText, ">>i") !== false or mb_strpos($newText, ">i") !== false or mb_strpos($newText, "> i") !== false) {
            $speaker = ">>i";
        }
        elseif (mb_strpos($newText, ">> s") !== false or mb_strpos($newText, ">>s") !== false or mb_strpos($newText, ">s") !== false or mb_strpos($newText, "> s") !== false) {
            $speaker = ">>s";
        }
        // Walk the line; after each word, append its end position and the speaker.
        for ($in = 0; $in < mb_strlen($newText); $in++) {
            if (mb_strpos($characters, $newText[$in]) !== false) {
                if ($isFirst == true) {
                    $new = $new." ".$newText[$in];
                    $isFirst = false;
                    $isLast = true;
                }
                else {
                    $new = $new.$newText[$in];
                }
            }
            elseif ($isLast == true) {
                $isLast = false;
                $isFirst = true;
                $new = $new." ".($in + $position)." ".$speaker." ".$newText[$in];
            }
            else {
                $new = $new.$newText[$in];
            }
        }
        $position += mb_strlen($newText);
        $newText = $new;
        $newText = lowercase($newText);
        fwrite($fhNew, $newText."\n");
    }
    fclose($fh);
}
fclose($fhNew);
?>
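One thing that may be worth checking in the code above: `$newText[$in]` indexes the string by byte, while `mb_strlen()` counts characters, and `strtr()` also works on bytes. A two-byte character such as ¡ could therefore be split in the middle. Below is a minimal sketch of character-wise iteration, assuming UTF-8 input and the mbstring extension. The helper names `utf8_chars` and `lowercase_utf8` and the sample line are hypothetical, and this is not a drop-in replacement for the loop.

```php
<?php
// Hypothetical helper: split a UTF-8 string into whole characters.
function utf8_chars(string $s): array {
    // The /u modifier makes PCRE treat the subject as UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false if the input is not valid UTF-8.
    return $chars === false ? [] : $chars;
}

// Hypothetical helper: multibyte-aware lowercasing instead of strtr().
function lowercase_utf8(string $s): string {
    return mb_strtolower($s, 'UTF-8');
}

$letters = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZÁÉÍÓÚÜabcdefghijklmnñopqrstuvwxyzáéíóúü1234567890'-";
$line = "¡Hola, Señor!"; // made-up sample input

foreach (utf8_chars($line) as $ch) {
    // $ch is a complete character here, so ¡ is never split into two bytes.
    $isLetter = mb_strpos($letters, $ch, 0, 'UTF-8') !== false;
    echo $ch, $isLetter ? " letter\n" : " other\n";
}

echo lowercase_utf8("¡HOLA, SEÑOR!"), "\n";
```

Note that positions counted this way are character offsets, not byte offsets, which matters if the numbers written to the output file are used elsewhere.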