doushi6864 2012-08-11 15:41
Answer accepted

Why does my attempt to replace a character in a string fail?

I have a string (taken from a MySQL database if it makes any difference) which looks normal enough:

Manufacture: <a href="http://www.x.com/">Blah</a>

The problem is that the space between Manufacture: and the <a> tag has a charcode of 194, not 32 as I would expect.

This is causing a preg_match with the following pattern to fail (please ignore the attempts to parse HTML with regex, I know it's not a good idea but this particular dataset is predictable enough to get away with it):

/Manufacture: *(<a[^>]*>([A-Za-z- 0-9]+)<\/a>)/i

If I replace the rogue space with a normal space character in a text editor and try again, the expression matches as expected, but I need to alter it programmatically.

I tried str_replace:

$text = str_replace(chr(194), ' ', $text);

But the preg_match still fails. I then tried preg_replace:

$text = preg_replace('/[\xC2]/', ' ', $text);

But that doesn't work either, even though running that same pattern through preg_match does contain the expected match.

Does anyone have any ideas?

2 answers

  • douzhang7184 2012-08-11 18:35

    Can you check the structure of the MySQL table that the contents of $text come from? If the collation is utf8_general_ci or similar, your string most likely contains a multibyte UTF-8 character: byte 194 (0xC2) is the lead byte of a two-byte UTF-8 sequence, so the character does not end at that byte.
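
    One way to check this is to dump the bytes around the suspicious space. The sketch below is only an illustration: the sample string is hypothetical, and it assumes the character is a no-break space (U+00A0), which UTF-8 encodes as the byte pair 0xC2 0xA0. The question only reports the first byte, so the real data may differ.

    ```php
    <?php
    // Hypothetical sample string. "\xC2\xA0" is the UTF-8 encoding of U+00A0
    // (no-break space), assumed here to be the "rogue space" from the question.
    $text = "Manufacture:\xC2\xA0<a href=\"http://www.x.com/\">Blah</a>";

    // Take the two bytes that follow "Manufacture:" and print them.
    $offset = strlen('Manufacture:');
    $bytes  = substr($text, $offset, 2);

    echo bin2hex($bytes), PHP_EOL;                      // c2a0
    echo ord($bytes[0]), ' ', ord($bytes[1]), PHP_EOL;  // 194 160
    ```

    If the second byte is 160 (0xA0), the character is a no-break space, and replacing only chr(194) leaves that second byte behind.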


    If that is the case, the PHP function iconv should do the trick. Here is the example from the PHP manual. The //IGNORE option should discard characters that cannot be represented in the target character set.

    <?php
    $text = "This is the Euro symbol '€'.";
    
    echo 'Original : ', $text, PHP_EOL;
    echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
    echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
    echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
    
    ?>
    

    The above example will output something similar to:

    Original : This is the Euro symbol '€'.
    TRANSLIT : This is the Euro symbol 'EUR'.
    IGNORE   : This is the Euro symbol ''.
    Plain    :
    Notice: iconv(): Detected an illegal character in input string in .\iconv-example.php on line 7
    This is the Euro symbol '
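
    Applied to the string from the question, a narrower alternative to converting the whole text is to replace the complete byte sequence of the stray character. This is a minimal sketch, not a definitive fix: it assumes the character is a UTF-8 no-break space (bytes 0xC2 0xA0), and the sample string and the variable names $clean, $cleanRegex and $m are made up for illustration.

    ```php
    <?php
    // Assumption: the "rogue space" is U+00A0, stored as the UTF-8 bytes 0xC2 0xA0.
    // The sample string is hypothetical.
    $text = "Manufacture:\xC2\xA0<a href=\"http://www.x.com/\">Blah</a>";

    // Replace the whole two-byte sequence with a plain space.
    // Replacing only chr(194) would leave a stray 0xA0 byte in the string.
    $clean = str_replace("\xC2\xA0", ' ', $text);

    // Equivalent regex form: the /u modifier makes PCRE treat the subject as
    // UTF-8, so \x{00A0} matches the character rather than a single byte.
    $cleanRegex = preg_replace('/\x{00A0}/u', ' ', $text);

    // Pattern from the question, unchanged.
    $pattern = '/Manufacture: *(<a[^>]*>([A-Za-z- 0-9]+)<\/a>)/i';

    if (preg_match($pattern, $clean, $m)) {
        echo $m[2], PHP_EOL; // Blah
    }
    ```

    Both replacements give the same result here. The str_replace form works on raw bytes, while the /u form fails on input that is not valid UTF-8, which can be a useful early warning.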
    
    This answer was selected as the best answer by the asker.
