PHP html_entity_decode and trim confusion

I'm trying to use strip_tags and trim to detect whether a string contains only empty HTML.

$description = '<p>&nbsp;</p>';

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));

var_dump($output);

string 'Â ' (length=2)

My attempt at debugging this:

$description = '<p>&nbsp;</p>';

$test = mb_detect_encoding($description);
$test .= "\n";
$test .= trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
$test .= "\n";
$test .= html_entity_decode($description, ENT_QUOTES, 'UTF-8');

file_put_contents('debug.txt', $test);

Output: debug.txt

ASCII
 
<p> </p>

duangu1033 2015-11-03 12:07
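If you use var_dump(urlencode($output)) you'll see that it outputs string(6) "%C2%A0", so the string consists of the two bytes 0xC2 and 0xA0. Together these are the UTF-8 encoding of U+00A0, the non-breaking space, which trim() does not strip by default. The 'Â ' in your var_dump output is those same two bytes displayed as ISO-8859-1, so make sure your file is saved as UTF-8 and that your HTTP headers declare UTF-8 as well.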
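To inspect the bytes directly, a small sketch along these lines may help. It assumes the script itself is saved as UTF-8; the variable names are illustrative only and not part of the original question.

```php
<?php
// Decode the entity and strip the tags, in the same order as the question.
$decoded = strip_tags(html_entity_decode('<p>&nbsp;</p>', ENT_QUOTES, 'UTF-8'));

// bin2hex() shows the raw bytes: c2a0 is U+00A0 (NO-BREAK SPACE) in UTF-8.
$hex = bin2hex($decoded);
var_dump($hex); // string(4) "c2a0"

// trim() with its default character list leaves these bytes untouched.
$trimmedHex = bin2hex(trim($decoded));
var_dump($trimmedHex); // string(4) "c2a0"
```

Passing the two bytes to trim() as an extra character list is tempting, but trim() works byte by byte, so it could also cut into other multi-byte UTF-8 characters at the edges of the string.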

That said, to trim this character you can use regex with the unicode modifier (instead of trim):

DEMO:

<?php
$description = '<p>&nbsp;</p>';

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
var_dump(urlencode($output)); // string(6) "%C2%A0"

// -------

$output = preg_replace('~^\s+|\s+$~', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
var_dump(urlencode($output)); // string(6) "%C2%A0"

// -------

$output = preg_replace('~^\s+|\s+$~u', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
// Unicode! -----------------------^
var_dump(urlencode($output)); // string(0) ""

Regex autopsy:

~ - the regex delimiter; the opening one comes before the pattern, and the closing one comes before the modifiers

^\s+ - the start of the string immediately followed by one or more whitespace characters, i.e. leading whitespace (^ means start of the string, \s means a whitespace character, + means "one or more times")

| - OR

\s+$ - one or more whitespace characters immediately followed by the end of the string, i.e. trailing whitespace

~ - the closing regex delimiter

u - the pattern modifier; here the Unicode modifier (PCRE_UTF8), which makes the pattern and subject be treated as UTF-8 so that \s also matches Unicode whitespace such as the non-breaking space.
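Putting this together for the original goal of detecting empty HTML, one possible sketch is a small helper like the one below. The function name isBlankHtml is hypothetical (it does not come from the question or from PHP itself), the scalar type declarations assume PHP 7 or later, and the decode-then-strip order simply mirrors the question.

```php
<?php
/**
 * Hypothetical helper: reports whether an HTML fragment has no visible text
 * once tags are stripped and Unicode whitespace is trimmed from both ends.
 */
function isBlankHtml(string $html): bool
{
    $text = strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8'));

    // With the u modifier, preg_replace() returns null for invalid UTF-8,
    // in which case the strict comparison below yields false.
    $trimmed = preg_replace('~^\s+|\s+$~u', '', $text);

    return $trimmed === '';
}

var_dump(isBlankHtml('<p>&nbsp;</p>'));     // bool(true)
var_dump(isBlankHtml('<p>&nbsp;text</p>')); // bool(false)
```

Treating invalid UTF-8 as "not blank" is a design choice made for this sketch; depending on the application it may be preferable to reject such input explicitly.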