dtc99987
2014-10-10 13:00

PHP DOMXPath query works in a double-quoted string but fails in a single-quoted one

I wrote a little script that extracts information from a web site using PHP's DOMXPath class.
I query for <div class="sku" /> and run substring-before on the result. The result contains text, non-breaking spaces, a line break and more text.
So what I'm trying to do is keep only the text before the &nbsp;&nbsp;. It works fine when I use the following query:

$query = "substring-before(//div[@class='sku'],'\xC2\xA0\xC2\xA0\r\n')";

but it fails as soon as I change the quotes (which shouldn't make any difference):

$query = 'substring-before(//div[@class="sku"],"\xC2\xA0\xC2\xA0\r\n")';

or

$query = 'substring-before(//div[@class=\'sku\'],\'\xC2\xA0\xC2\xA0\r\n\')';

How is this possible and how can I overcome this?

Live example here: http://codepad.viper-7.com/R1rCaj


2 answers

  • dongye1934 2014-10-10 15:03
    Accepted answer

    The style of quotes makes a difference because PHP interprets many more escape sequences inside double-quoted strings than inside single-quoted ones, including the ones you're using: \xC2\xA0 (the UTF-8 bytes of a non-breaking space), \r (carriage return) and \n (newline).
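To see the difference, compare what each literal actually contains. This is a small sketch using the standard `strlen()` and `bin2hex()` functions; the variable names `$double` and `$single` are only illustrative.

```php
<?php
// Double quotes: PHP turns \xC2, \xA0, \r and \n into the bytes they name.
$double = "\xC2\xA0\xC2\xA0\r\n";

// Single quotes: the same characters stay as literal backslash sequences.
$single = '\xC2\xA0\xC2\xA0\r\n';

echo strlen($double), "\n";  // 6: two UTF-8 no-break spaces (2 bytes each), CR, LF
echo bin2hex($double), "\n"; // c2a0c2a00d0a
echo strlen($single), "\n";  // 20: backslash, x, C, 2, backslash, x, A, 0, ...
echo $single, "\n";          // \xC2\xA0\xC2\xA0\r\n
```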

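    When you enclose them in single quotes, '\xC2\xA0\xC2\xA0\r\n', as in your second and third queries, PHP treats them as literal characters: backslash, x, C, 2, and so on (in single-quoted strings only \\ and \' are escapes). XPath does not interpret backslash escapes either, so substring-before() searches for that literal backslash text, doesn't find it in the div, and returns an empty string.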
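A minimal reproduction of both queries against an in-memory document follows. The sample text "SKU 123" and "more text" is hypothetical; the node is built with `createTextNode()` so the exact bytes (no-break spaces, CR, LF) reach the DOM without any HTML-parser normalization.

```php
<?php
// Hypothetical stand-in for the real page:
// <div class="sku">SKU 123 + two no-break spaces + CRLF + more text</div>
$doc = new DOMDocument();
$div = $doc->createElement('div');
$div->setAttribute('class', 'sku');
$div->appendChild($doc->createTextNode("SKU 123\xC2\xA0\xC2\xA0\r\nmore text"));
$doc->appendChild($div);

$xpath = new DOMXPath($doc);

// Double-quoted PHP string: the needle holds the real bytes, so it is found.
$good = $xpath->evaluate("substring-before(//div[@class='sku'],'\xC2\xA0\xC2\xA0\r\n')");

// Single-quoted PHP string: XPath receives the literal text \xC2\xA0...,
// which never occurs in the div, so substring-before() returns "".
$bad = $xpath->evaluate('substring-before(//div[@class=\'sku\'],\'\xC2\xA0\xC2\xA0\r\n\')');

var_dump($good); // string(7) "SKU 123"
var_dump($bad);  // string(0) ""
```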


    A little extra syntax highlighting helps show this; escape sequences are shown in orange:

    [Image: the queries with PHP syntax highlighting, escape sequences in orange]


    If your string already contains those escape sequences as literal characters, and there's no way to get that corrected*, you're in the kinda dirty position of having to replace them yourself.

    This preg_replace_callback() will take care of the sort of sequences in your example, and it's trivial to extend to the rest of the escape sequences supported by double-quotes:

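    // Known good: double quotes, so PHP decodes \xC2\xA0, \r and \n into bytes.
    $query1 = "substring-before(//div[@class='sku'],'\xC2\xA0\xC2\xA0\r\n')";
    
    // Known bad: single quotes, so the escape sequences stay as literal text.
    $query2 = 'substring-before(//div[@class=\'sku\'],\'\xC2\xA0\xC2\xA0\r\n\')';
    
    // Replace literal \r, \n and \xH / \xHH sequences with the characters they name.
    $query2 = preg_replace_callback(
        '/\\\\(?:[rn]|(?:x[0-9A-Fa-f]{1,2}))/',
        function ($matches) {
            switch (substr($matches[0], 0, 2)) {
                case '\r':
                    return "\r";
                case '\n':
                    return "\n";
                case '\x':
                    // chr(hexdec()) also handles single-digit forms such as \xA,
                    // which hex2bin() would reject as odd-length input.
                    return chr(hexdec(substr($matches[0], 2)));
            }
        },
        $query2
    );
    
    var_dump($query1 === $query2); // Now equal?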
    

    Output:

    bool(true)
    

    (*Really, you should get this fixed at the source.)
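If you'd rather not maintain the regex, PHP's built-in `stripcslashes()` decodes C-style escapes, including `\r`, `\n` and `\xhh`. This is a swapped-in alternative to the callback above, not what the answer used. Be aware that it also decodes other sequences (`\t`, octal escapes, and so on) and drops a backslash in front of unrecognized characters, so it only suits strings that contain no backslashes you want to keep.

```php
<?php
// The two queries from the answer above.
$query1 = "substring-before(//div[@class='sku'],'\xC2\xA0\xC2\xA0\r\n')";
$query2 = 'substring-before(//div[@class=\'sku\'],\'\xC2\xA0\xC2\xA0\r\n\')';

// stripcslashes() turns literal \r, \n and \xhh sequences into real characters.
$decoded = stripcslashes($query2);

var_dump($query1 === $decoded); // bool(true)
```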

  • dongliping003116 2014-10-10 13:26

    You can do this easily with simple_html_dom. Download: http://sourceforge.net/projects/simplehtmldom/files/ Manual: http://simplehtmldom.sourceforge.net/manual.htm

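    <?php
    // Include the simple_html_dom library (path relative to this script).
    include './lib/simple_html_dom.php';

    $url = "http://www.vosteen-shop.de/p-261232-edelstahl-herz-acero-zum-hngen-lnge-10cm-breite-10cm-silber-glanz.aspx";

    // Fetch and parse the page; file_get_html() returns false on failure.
    $html = file_get_html($url);
    if ($html === false) {
        die("Could not load $url\n");
    }

    // First div.sku: ->innertext keeps the inner HTML; use ->plaintext to strip tags.
    $results = $html->find('div.sku', 0)->innertext;

    // Keep only the part before the first "<b" (the start of a <b...> or <br> tag).
    $explode = explode("<b", $results);
    $results = trim($explode[0]);
    echo $results;
    ?>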
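For comparison, a similar cut can be done with PHP's built-in DOM extension (the same DOMXPath class from the question) by reading the div's text and splitting it in PHP instead of in XPath. This is a sketch, not the answerer's method: the inline `$markup` is hypothetical sample HTML used in place of fetching the real page, and it cuts at the first no-break space rather than at a tag.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$markup = '<div class="sku">SKU 123&nbsp;&nbsp;<br>Edelstahl</div>';

libxml_use_internal_errors(true); // keep parser warnings from real-world HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($markup);

$xpath = new DOMXPath($doc);
// string() returns the text content of the first matching div.
$text = $xpath->evaluate("string(//div[@class='sku'])");

// &nbsp; is decoded to U+00A0, which PHP's DOM returns as the UTF-8 bytes C2 A0.
$parts = explode("\xC2\xA0", $text, 2);
$sku = trim($parts[0]);

echo $sku, "\n"; // SKU 123
```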
    