dp198879 2017-06-06 11:17
Views: 43

PHP: wrong encoding when converting text extracted from a PDF to UTF-8

I need to extract the text of a PDF file into a PHP variable. I used pdf2text for this, but I run into problems when I try to convert the resulting string to UTF-8.

Also, if someone knows a better way to remove the spaces and line breaks from the string, I would be grateful.

This is the code I used:

header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');

mb_http_output('UTF-8');

include('pdftophp.php');
$doc = new PDF2Text();
$doc->setFilename('pdf/prueba.pdf'); 
$doc->decodePDF();
$texto = $doc->output();

$resultado = "";
for ($i = 0; $i < strlen($texto); $i++) {
    $caracter = substr($texto, $i, 1);
    // Skip spaces and line breaks; keep everything else.
    if ($caracter != " " && $caracter != "\n") {
        $resultado .= $caracter;
    }
}

echo $resultado;
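
For reference, here is a minimal sketch of one way both points could be approached. It is not a confirmed fix. The helper name to_utf8_without_whitespace and the ISO-8859-1 fallback are assumptions made for illustration; the encoding that PDF2Text actually returns depends on the PDF and would have to be checked. The idea is to convert only when the string is not already valid UTF-8, then strip all whitespace with a single regular expression instead of a byte-by-byte loop.

```php
<?php
// Hypothetical helper, not part of PDF2Text.
// $fallback is an assumed source encoding; adjust it to match the actual PDF text.
function to_utf8_without_whitespace(string $raw, string $fallback = 'ISO-8859-1'): string
{
    // Leave the string alone if it is already valid UTF-8,
    // otherwise convert it from the assumed fallback encoding.
    $utf8 = mb_check_encoding($raw, 'UTF-8')
        ? $raw
        : mb_convert_encoding($raw, 'UTF-8', $fallback);

    // \s matches spaces, tabs and line breaks; the u modifier treats
    // the subject as UTF-8 so multibyte characters are not split.
    return preg_replace('/\s+/u', '', $utf8) ?? $utf8;
}
```

With the code above this would be called as echo to_utf8_without_whitespace($doc->output()); in place of the loop. If the output is still garbled, the fallback encoding is probably the wrong guess for that particular file.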