dongshenjie3055 2018-10-16 19:32
Accepted

Keep existing HTML entities unchanged, but convert double and single quotes

I'm using PHP code to generate my meta description tag, like so:

<meta name="description" content="<?php
echo $this->utf->clean_string(word_limiter(strip_tags(trim($paperResult['file_content'])),27));
?>" />


Here's an example of the meta description output:

<meta name="description" content="blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah" />

The two HTML entities in that example meta description are a paragraph sign (&#182;) followed by an ellipsis (&#8230;). They are already in HTML entity form in the source text, so I want them to remain unchanged. The problem is that I also need the quotation marks within the description to be converted to &quot; to prevent the meta tag from breaking.

Every combination or configuration I try either does not work or breaks my site because I'm getting the code wrong. For example, when I try the following code, the quotation marks are converted to their HTML entity, as desired, but the paragraph sign and ellipsis entities break, because the ampersand at the beginning of each existing entity gets converted to &amp;. That leaves me with a broken &#182; (&amp;#182;) and a broken &#8230; (&amp;#8230;):

 echo $this->utf->clean_string(word_limiter(htmlspecialchars(strip_tags(trim($paperResult['file_content']))),27));
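A minimal standalone reproduction of the double encoding described above, assuming plain PHP with the framework-specific calls (word_limiter and $this->utf->clean_string) left out. The flags ENT_COMPAT | ENT_HTML401 are passed explicitly because they were htmlspecialchars's default before PHP 8.1, so the sketch behaves the same on any version:

```php
<?php
// Sample text that already contains numeric entities and raw double quotes.
$text = 'blah &#182; &#8230; blah "words in quotation marks"';

// Same call as above minus the framework helpers. double_encode defaults to
// true, so the leading & of each existing entity is turned into &amp;.
$encoded = htmlspecialchars($text, ENT_COMPAT | ENT_HTML401, 'UTF-8');

echo $encoded, PHP_EOL;
// blah &amp;#182; &amp;#8230; blah &quot;words in quotation marks&quot;
```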

I've been trying, literally for days, to figure this out. I've searched extensively on Stack Overflow, to no avail. I just need the existing HTML entities to remain unchanged and quotation marks to be converted to their HTML entity (&quot;). I have studied the ENT_QUOTES option and I know the solution probably lies there, but I can't figure out how to incorporate it into my particular line of code. I'm hoping that you PHP gurus will have mercy on this tortured soul! I'd truly appreciate your help.

Thank you!


  • douyou9923 2018-10-16 19:43

    If it's only the contents of the "content" attribute, you can do this:

    $str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
    echo htmlentities($str, ENT_QUOTES, "UTF-8", false);
    

    Output

    blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah
    


    The key here is the 4th argument, double_encode:

    string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

    Specifically

    double_encode: When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.

    That way the ampersand at the start of an existing entity such as &#182; is not encoded a second time.
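A small sketch of what double_encode = false does and does not cover, using htmlentities on an assumed sample string. Existing entities, numeric (&#182;) or named (&amp;), pass through unchanged, but a bare & that does not start an entity is still encoded, and with ENT_QUOTES under the HTML 4.01 doctype single quotes become &#039;. Running the function a second time changes nothing:

```php
<?php
// Assumed sample: one numeric entity, one named entity, both quote kinds, a bare &.
$text = "&#182; &amp; \"double\" 'single' & bare";

// Fourth argument false: leave existing entities alone, encode everything else.
$once = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);

// A second pass is a no-op, because every & now starts a valid entity.
$twice = htmlentities($once, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);

echo $once, PHP_EOL;
// &#182; &amp; &quot;double&quot; &#039;single&#039; &amp; bare
var_dump($once === $twice); // bool(true)
```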

    htmlspecialchars also has a double_encode argument:

    htmlspecialchars ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = TRUE ]]] )

    $str = 'blah blah &#182; &#8230; blah blah "words in quotation marks" blah blah "more words in quotation marks" blah blah';
    echo htmlspecialchars($str, ENT_QUOTES, "UTF-8", false);
    

    Output

    blah blah &#182; &#8230; blah blah &quot;words in quotation marks&quot; blah blah &quot;more words in quotation marks&quot; blah blah
    


    If it's the whole tag, you'll have to pull out the attribute value, encode it, and put it back, so that the tag's own < and > are preserved. It's not clear from the question whether that is the case.
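If the tag is being built in PHP anyway, one way to avoid extracting and re-inserting the contents is to escape only the attribute value before putting it into the tag, so the tag's own < and > are never touched. A minimal sketch; meta_description_tag is a hypothetical helper name, not part of PHP or any framework:

```php
<?php
// Hypothetical helper: builds the meta tag and escapes only the attribute value.
function meta_description_tag(string $description): string
{
    // ENT_QUOTES encodes both " and '; false keeps existing entities intact.
    $value = htmlspecialchars($description, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);
    return '<meta name="description" content="' . $value . '" />';
}

echo meta_description_tag('blah &#182; "quoted" blah'), PHP_EOL;
// <meta name="description" content="blah &#182; &quot;quoted&quot; blah" />
```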

    PS: There isn't a big difference between htmlspecialchars and htmlentities. htmlspecialchars converts only the characters that are special in HTML (&, <, >, and, depending on the flags, " and '), while htmlentities also converts every character that has a named HTML entity, such as é (e acute) to &eacute;.
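A quick comparison of the two functions on an assumed sample string containing an accented character (the source file is assumed to be saved as UTF-8):

```php
<?php
$text = 'café "menu" & more';

// htmlspecialchars: only &, <, > and quotes (per the flags) are converted.
$special = htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// htmlentities: additionally converts characters that have named entities, e.g. é.
$entities = htmlentities($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $special, PHP_EOL;  // café &quot;menu&quot; &amp; more
echo $entities, PHP_EOL; // caf&eacute; &quot;menu&quot; &amp; more
```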

    UPDATE

    You commented: "I need the solution to be incorporated into my particular format of PHP code (i.e., a single line of PHP that maintains my existing functions/functionality), as miken32 brilliantly did above."

    To put it in your code:

    <meta name="description" content="<?=htmlspecialchars(word_limiter(trim($paperResult['file_content']),27),ENT_QUOTES,"UTF-8",false);?>"/>
    

    UPDATE2

    preg_replace('/[\r\n]+/', ' ', $string) replaces \r or \n, one or more times (+), with a single space. But it may be better to do it this way: preg_replace(['/[\r\n]+/', '/\s+/'], ' ', $string), which also collapses runs of spaces into one.

    <meta name="description" content="<?=htmlspecialchars(word_limiter(preg_replace('/[\r\n]+/', ' ', trim($paperResult['file_content'])),27),ENT_QUOTES,"UTF-8",false);?>"/>
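A quick check of the two approaches on an assumed sample string. The character class /[\r\n]+/ only replaces runs of line breaks, while /\s+/ also collapses runs of spaces and tabs. Because \s already matches \r and \n, /\s+/ on its own gives the same result as the two-pattern array:

```php
<?php
$raw = "First line\r\n\r\nSecond   line\twith  gaps";

// Only runs of CR/LF become one space; the tab and repeated spaces survive.
$newlinesOnly = preg_replace('/[\r\n]+/', ' ', $raw);

// Array patterns are applied in order; \s+ then collapses every remaining run.
$allWhitespace = preg_replace(['/[\r\n]+/', '/\s+/'], ' ', $raw);

echo $newlinesOnly, PHP_EOL;  // tab and repeated spaces are still present
echo $allWhitespace, PHP_EOL; // First line Second line with gaps
```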
    

    Basically, anything that makes the text shorter you probably want to do before word_limiter, and anything that makes it longer, like changing " to &quot;, you probably want to do after. It just seems more logical to me.
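A sketch of that ordering as a plain-PHP pipeline. Since word_limiter is not available outside its framework, limit_words below is a hypothetical stand-in that keeps the first N words and appends &#8230; when it truncates (an assumption about word_limiter's default behavior); meta_description_text is likewise a hypothetical name. Encoding runs last with double_encode = false, so an entity appended by the limiter survives:

```php
<?php
// Hypothetical stand-in for word_limiter(): keeps the first $limit words and,
// if anything was cut off, appends $end (assumed to default to an ellipsis entity).
function limit_words(string $str, int $limit, string $end = '&#8230;'): string
{
    $words = preg_split('/\s+/', trim($str), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $limit) {
        return implode(' ', $words);
    }
    return implode(' ', array_slice($words, 0, $limit)) . $end;
}

// Hypothetical pipeline: steps that shorten the text first, encoding last.
function meta_description_text(string $html, int $limit): string
{
    $text = strip_tags($html);                        // drop markup
    $text = trim(preg_replace('/\s+/', ' ', $text));  // collapse whitespace
    $text = limit_words($text, $limit);               // cut to $limit words
    // Encode last; false keeps &#182; and the appended &#8230; intact.
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML401, 'UTF-8', false);
}

echo meta_description_text("<p>blah &#182; \"quoted words\"\n\nblah blah blah</p>", 4), PHP_EOL;
// blah &#182; &quot;quoted words&quot;&#8230;
```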

    Cheers!

    This answer was selected by the asker as the best answer.