dongshie8450 2013-12-17 09:06
浏览 109
已采纳

如何使用PHP检测CP437

I am trying to detect the encoding of a given string in order to convert it later on to utf-8 using iconv. I want to restrict the set of source encodings to utf8, iso8859-1, windows-1251, CP437

//...
$acceptedEncodings = array('utf-8',
    'iso-8859-1',
    'windows-1251'
);

$srcEncoding = mb_detect_encoding($content, $acceptedEncodings, true);

if($srcEncoding)
{
    $content = iconv($srcEncoding, 'UTF-8', $content);
}
//...

The problem is thet mb_detect_encoding does not seem to accept CP437 as a supported encoding and when I give it a CP437 encoded string this is classified as iso-8859-1 which causes iconv to ignore characters like ü.

My question is: Is there a way to detect CP437 encoding earlier? The conversion from CP437 to UTF-8 using iconv works fine but I just cannot find the proper way to detect CP437.

Thank you very much.

  • 写回答

1条回答 默认 最新

  • dtyz76562 2013-12-17 09:14
    关注

    As has been discussed countless times before: it is fundamentally impossible to distinguish any single-byte encoding from any other single-byte encoding. What you get are a bunch of bytes. In encoding A the byte x42 may map to character X and in encoding B the same byte may map to character Y. But nothing about the blob of bytes you have tells you that, because you only have the bytes. They can mean anything. They're equally valid in all encodings. It's possible to identify more complex multi-byte encodings like UTF-8, since they need to follow more complex internal rules. So it's possible to definitely be able to say This is not valid UTF-8. However, it is impossible to say with 100% certainty This is definitely UTF-8, not ISO-8859.

    You need to have meta data about the content you receive which tells you what encoding the content is in. It's not practical to identify it after the fact. You'd need to employ actual content analysis to figure out which encoding a piece of text makes the most sense in.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 CSAPPattacklab
  • ¥15 一直显示正在等待HID—ISP
  • ¥15 Python turtle 画图
  • ¥15 关于大棚监测的pcb板设计
  • ¥15 stm32开发clion时遇到的编译问题
  • ¥15 lna设计 源简并电感型共源放大器
  • ¥15 如何用Labview在myRIO上做LCD显示?(语言-开发语言)
  • ¥15 Vue3地图和异步函数使用