I have been struggling with this for a long time, and I suspect other users have as well.
I should say up front that I have no alternative to FPDF because I use a lot of other FPDF modules, so please do not recommend another library such as TCPDF.
I really need to make FPDF handle UTF-8 characters in a stable way.
What I have found out so far:
There is an extension called UFPDF http://acko.net/blog/ufpdf-unicode-utf-8-extension-for-fpdf/
The extension supports only TrueType fonts for now, but that should work for me. The .ttf file has to be converted by a tool called ttf2ufm, and the resulting .ufm and the source .ttf are then converted to font.php, font.z and font.ctg.z files using the supplied tool makefontuni.php.
So far so good. So I tried to convert the Arial font from my computer. (arial.ttf, arialbd.ttf, arialbi.ttf, ariali.ttf)
It worked and I was able to produce a test.pdf with Unicode characters. But Adobe Reader showed an error popup that said something like: Bad Parameter - the font ArialMT contains bad /Widths.
I noticed that all characters had the same width (I suspect the default width), so I tried to debug.
I found out that UFPDF adds the widths to the PDF like this:
charnumber [width] charnumber [width]
85 [276] (for the "u" character)
And I found out that some characters had a negative index value:
-70 [266]
The index values are created by ttf2ufm. When I looked at the resulting arial.ufm I found entries like this:
U -70 ; WX 450 ; N uni06BE ; G 1003 ; B -70 256 788 1136 ;
I suspected that U is the character's index in the Unicode table, so I modified makefontuni.php to ignore negative values for U. I created font.php, font.z and font.ctg.z again and it worked: the error notice was no longer shown and the characters were displayed with the correct widths.
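To make the change concrete, here is a minimal sketch of the filtering idea. It is not the actual makefontuni.php code; `ufm_widths` is a hypothetical helper name, and the only assumption is that metric lines look like the two samples quoted in this post.

```php
<?php
// Hypothetical helper (not part of UFPDF or makefontuni.php): collect
// "U value => WX width" pairs from .ufm lines, skipping negative U values.
function ufm_widths(array $lines)
{
    $widths = array();
    foreach ($lines as $line) {
        // Matches lines such as "U 117 ; WX 611 ; N u ; G 88 ; B ... ;"
        if (!preg_match('/^U\s+(-?\d+)\s*;\s*WX\s+(-?\d+)\s*;/', $line, $m)) {
            continue; // not a character metric line
        }
        $code = (int)$m[1];
        if ($code < 0) {
            continue; // negative index: leave it out of the widths table
        }
        $widths[$code] = (int)$m[2];
    }
    return $widths;
}

// The two sample lines quoted in this post.
$sample = array(
    'U 117 ; WX 611 ; N u ; G 88 ; B 141 -24 1107 1062 ;',
    'U -70 ; WX 450 ; N uni06BE ; G 1003 ; B -70 256 788 1136 ;',
);
$widths = ufm_widths($sample); // only the entry for 117 is kept
```

To run it on a real file, the lines could be read with `file('arial.ufm', FILE_IGNORE_NEW_LINES)`.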
So the first question is: why does ttf2ufm produce negative values for U? Is this correct? And if it is correct, why is Adobe Reader not able to handle it?
I hoped that was all but it is not.
I did some more tests with the bold font, and the lowercase "u" character was shown as a strange sign when using Arial Bold.
I debugged again and I found this line for the "u" character in arialbd.ufm
U 117 ; WX 611 ; N u ; G 88 ; B 141 -24 1107 1062 ;
I searched for "U 117" in that file and found another character whose line begins with "U 117 ;". I have already removed it, so I cannot post the line here. However, this was the wrong character shown in the PDF, and after removing it the "u" was displayed correctly.
So the second question is: why does ttf2ufm produce a .ufm file with two characters that have the same index? This happens only for arialbd.ttf, not for arial.ttf.
I have solved it for now, hoping there are no other characters with duplicate indexes.
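Rather than hoping, the .ufm file could be scanned for repeated U values. The following is a rough sketch; `ufm_duplicate_codes` is a hypothetical helper name, and the second sample line is made up for illustration (the real duplicate line was already deleted and is not reproduced here).

```php
<?php
// Hypothetical helper: map each non-negative U value that appears on more
// than one line to the list of glyph names (the N field) that use it.
function ufm_duplicate_codes(array $lines)
{
    $seen = array();
    foreach ($lines as $line) {
        if (preg_match('/^U\s+(\d+)\s*;.*?\bN\s+(\S+)\s*;/', $line, $m)) {
            $seen[(int)$m[1]][] = $m[2];
        }
    }
    // Keep only the U values that were seen more than once.
    return array_filter($seen, function ($names) {
        return count($names) > 1;
    });
}

$sample = array(
    'U 117 ; WX 611 ; N u ; G 88 ; B 141 -24 1107 1062 ;',
    // Made-up line, for illustration only:
    'U 117 ; WX 500 ; N exampleglyph ; G 999 ; B 0 0 0 0 ;',
    'U 118 ; WX 556 ; N v ; G 89 ; B 0 0 0 0 ;',
);
$duplicates = ufm_duplicate_codes($sample);
```

On a real file, the input could come from `file('arialbd.ufm', FILE_IGNORE_NEW_LINES)`; an empty result would mean no index is shared by two glyphs.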
More issues:
I noticed that the resulting arial.php contains the character widths, keyed by index number:
$cw=array(
32=>278, 160=>278, 33=>278, 34=>355, 35=>556, 36=>556,
37=>889, 38=>667, 39=>191, 40=>333, 41=>333, 42=>389, 43=>584,
44=>278, 45=>333, 173=>333, [...]
The non-Unicode version of arial.php contains the $cw array, too, but it uses the character itself as the key, not the index number:
$cw=array(
chr(0)=>750,chr(1)=>750,chr(2)=>750,chr(3)=>750,chr(4)=>750,
chr(5)=>750,chr(6)=>750,chr(7)=>750,chr(8)=>750,chr(9)=>750,chr(10)=>750,
chr(11)=>750,chr(12)=>750, [...]
fpdf.php sometimes accesses the $cw values, and some other modules do too, in order to compute the width of a given string. All of this failed with UFPDF.
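The mismatch can be reproduced in isolation. This is only a small illustration with example width values, not code from FPDF or UFPDF:

```php
<?php
// Non-Unicode FPDF style: widths keyed by the single-byte character itself.
$cwLegacy = array(chr(65) => 667, chr(117) => 556);

// UFPDF style: widths keyed by the integer index of the character.
$cwUnicode = array(65 => 667, 117 => 556);

$char = 'u';

// Works: the legacy array has the key "u".
$legacyWidth = $cwLegacy[$char];

// Fails: the Unicode array has no key "u", only the integer key 117.
$directHit = isset($cwUnicode[$char]);

// Works for single-byte (ASCII) characters only: convert to the number first.
$unicodeWidth = $cwUnicode[ord($char)];
```

For characters outside ASCII, `ord()` only looks at the first byte, so a real UTF-8 decoder is needed to get the index.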
I tried to fix it by modifying fpdf.php and all modules that access $cw. I created a method called charlength in the FPDF class:
    function charlength($char)
    {
        $cw = &$this->CurrentFont['cw'];
        return $cw[$char];
    }
And I made FPDF call charlength wherever it used to access $this->CurrentFont['cw']:
    function GetStringWidth($s)
    {
        // Get width of a string in the current font
        $s = (string)$s;
        // $cw = &$this->CurrentFont['cw']; // Old FPDF-Code
        $w = 0;
        $l = strlen($s);
        for($i=0;$i<$l;$i++) {
            // $w += $cw[$s[$i]]; // Old FPDF-Code
            $w += $this->charlength($s[$i]); // My replacement
        }
        return $w*$this->FontSize/1000;
    }
In ufpdf.php I override the charlength method like this:
    function charlength($char) {
        $cw = &$this->CurrentFont['cw'];
        $offset = 0; // start decoding at the first byte of $char
        $utf8dec = $this->ordutf8($char, $offset);
        if(!isset($cw[$utf8dec])) {
            return 0;
        }
        return $cw[$utf8dec];
    }
    function ordutf8($string, &$offset) {
        $string = class_stringTools::utf8_decode($string);
        $code = ord(substr($string, $offset,1));
        if ($code >= 128) { //otherwise 0xxxxxxx
            if ($code < 224) $bytesnumber = 2; //110xxxxx
            else if ($code < 240) $bytesnumber = 3; //1110xxxx
            else if ($code < 248) $bytesnumber = 4; //11110xxx
            else return -1; // no valid lead byte
            $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
            for ($i = 2; $i <= $bytesnumber; $i++) {
                $offset ++;
                $code2 = ord(substr($string, $offset, 1)) - 128; //10xxxxxx
                $codetemp = $codetemp*64 + $code2;
            }
            $code = $codetemp;
        }
        $offset += 1;
        if ($offset >= strlen($string)) $offset = -1;
        return $code;
    }
The ordutf8 method is from php.net, but I had to modify it because I got strange values for $code. One time the value of $code was 252, which left $bytesnumber undefined.
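One thing worth noting about the approach above: GetStringWidth walks the string with strlen() and $s[$i], which are byte-based, so a multi-byte UTF-8 character reaches charlength one byte at a time. A possible alternative is to split the string into whole characters first. This is only a sketch under assumptions: the input is valid UTF-8, $cw is keyed by integer index as in the Unicode arial.php, and `utf8_codepoint` and `utf8_string_width` are hypothetical names, not part of FPDF or UFPDF.

```php
<?php
// Hypothetical helper: decode ONE UTF-8 character (1 to 4 bytes) to its
// integer index. Returns -1 if the lead byte is not valid.
function utf8_codepoint($ch)
{
    $b0 = ord($ch[0]);
    if ($b0 < 0x80) {
        return $b0;                               // 0xxxxxxx
    } elseif ($b0 >= 0xC0 && $b0 < 0xE0) {
        $n = 2; $cp = $b0 & 0x1F;                 // 110xxxxx
    } elseif ($b0 >= 0xE0 && $b0 < 0xF0) {
        $n = 3; $cp = $b0 & 0x0F;                 // 1110xxxx
    } elseif ($b0 >= 0xF0 && $b0 < 0xF8) {
        $n = 4; $cp = $b0 & 0x07;                 // 11110xxx
    } else {
        return -1;
    }
    if (strlen($ch) < $n) {
        return -1;                                // truncated sequence
    }
    for ($i = 1; $i < $n; $i++) {
        $cp = ($cp << 6) | (ord($ch[$i]) & 0x3F); // 10xxxxxx
    }
    return $cp;
}

// Hypothetical helper: width of a UTF-8 string, summed per character.
function utf8_string_width($s, array $cw, $fontSize)
{
    // The /u modifier makes PCRE split on whole UTF-8 characters.
    $chars = preg_split('//u', (string)$s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return 0; // input was not valid UTF-8
    }
    $w = 0;
    foreach ($chars as $ch) {
        $cp = utf8_codepoint($ch);
        $w += isset($cw[$cp]) ? $cw[$cp] : 0; // unknown characters count as 0
    }
    return $w * $fontSize / 1000;
}
```

With this shape, the byte loop in GetStringWidth would not be needed for the Unicode case, and the string would not have to be decoded to another encoding before the index lookup.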
It seems to work for now, but I am not very happy about editing the source of fpdf.php and of other modules. And I am surprised that nobody else has reported the issues I struggled with.
I know I have written a lot, but I want to know whether anyone else has had the same issues. What do you think about the last modifications? Do you have any improvements? I really need a stable way to make FPDF support Unicode characters. Please help me.
It is a shame that the author of UFPDF has no time to support this.