dpzbzp8728 2011-03-10 03:31

PHP - How do I detect the encoding?

I'm using Amazon's API to obtain the description of books. The API returns XML responses and the description is marked up (with HTML) very poorly. To deal with this poorly marked up description, which oftentimes breaks the layout of my site, I'm trying to use HTML Tidy to "clean it up."

To prevent "weird" characters from being displayed on my web page, I think I need to tell Tidy what the input encoding is and what the desired output encoding is. I know I want the output to be UTF-8. However, I'm not sure how to determine the encoding of the input (Amazon's book description).

I've tried something like this:

mb_detect_encoding($amazon_description);

It's helped, but I'm still occasionally getting weird characters (a black diamond with a question mark in it: �). My guess is that I'm not detecting the encoding properly.

Any suggestions on what I need to do?

EDIT:

This is my current solution:

$sanitized_amazon_markup = preg_replace('/[^\w`~!@#$%^&*()-=_+[\]{}|;\':",.\/<>? ]/', '', $sanitized_amazon_markup);

I'm not sure about this as this may delete stuff that I should be keeping.

1 answer

  • doukekui0914 2011-03-10 08:49

    Can you provide your tidy repairString call?

    If you set the input-encoding and output-encoding options in the Tidy configuration, try dropping them and use the third argument of repairString instead, something like this:

    $oTidy = new tidy();
    $page_content = $oTidy->repairString($page_content,
        array("show-errors" => 0, "show-warnings" => false),
        "utf8"
    );
    

    Edit:

    After some testing: what I suggested above does not work unless $page_content is already UTF-8 before you call repairString.

    If the text is not already UTF-8, it will most likely be ISO-8859-1 (Latin-1).
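
To make the UTF-8 versus ISO-8859-1 distinction concrete, here is a minimal sketch (the sample strings are made up for illustration, and it assumes the mbstring extension is loaded). It checks whether a byte string is well-formed UTF-8. A Latin-1 "é" is the single byte E9, which is not valid UTF-8 on its own, and that kind of byte is what a browser renders as the black diamond.

```php
<?php
// Hypothetical sample data: the word "café" encoded two different ways.
$utf8Bytes   = "caf\xC3\xA9"; // é as the two-byte UTF-8 sequence C3 A9
$latin1Bytes = "caf\xE9";     // é as the single ISO-8859-1 byte E9

// mb_check_encoding() reports whether the bytes are well-formed in the named encoding.
$utf8IsValid   = mb_check_encoding($utf8Bytes, 'UTF-8');   // true
$latin1IsValid = mb_check_encoding($latin1Bytes, 'UTF-8'); // false

var_dump($utf8IsValid, $latin1IsValid);
```

A check like this only says whether the text is valid UTF-8. It cannot prove which other encoding the text is in, so treating everything else as ISO-8859-1 remains an assumption.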

    May I suggest you try:

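    // Strict mode (third argument) only accepts UTF-8 when the bytes are valid UTF-8.
    $charset = mb_detect_encoding($amazon_description, 'UTF-8, ISO-8859-1', true);
    if ($charset === "ISO-8859-1") {
        // utf8_encode() is deprecated as of PHP 8.2; this performs the same conversion.
        $amazon_description = mb_convert_encoding($amazon_description, 'UTF-8', 'ISO-8859-1');
    }
    $oTidy = new tidy();
    // Third argument: treat both input and output as UTF-8.
    $amazon_description = $oTidy->repairString($amazon_description,
        array("show-errors" => 0, "show-warnings" => false),
        "utf8"
    );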
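
The detect-then-convert step can also be wrapped in a small reusable function. This is only a sketch under stated assumptions: `to_utf8()` is a hypothetical helper name (not part of PHP or Tidy), it relies on the mbstring extension, and it assumes that any input that is not valid UTF-8 is in the fallback encoding.

```php
<?php
/**
 * Return $text as UTF-8.
 * to_utf8() is a hypothetical helper, not a PHP or Tidy function.
 * Assumption: anything that is not valid UTF-8 is encoded in $fallback.
 */
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already well-formed UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$fromLatin1 = to_utf8("caf\xE9");     // ISO-8859-1 input gets converted
$fromUtf8   = to_utf8("caf\xC3\xA9"); // UTF-8 input is returned as-is

// Both lines print 636166c3a9, the UTF-8 bytes of "café".
echo bin2hex($fromLatin1), "\n", bin2hex($fromUtf8), "\n";
```

The result can then be passed to repairString with "utf8" as the third argument, as in the code above. If the source turns out to use a different single-byte encoding, for example Windows-1252, that name could be passed as `$fallback` instead.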
    
    The asker accepted this as the best answer.
