doumeng1143 2012-11-23 12:31

Extract the source URL and anchor text from a string

I am trying to extract data from a series of strings, but with no luck. In the example code below I tried preg_split, but it is not giving me the result I want.

Using the code below:

<?php
$str = '<a href="https://rads.stackoverflow.com/amzn/click/com/B008EYEYBA" rel="nofollow noreferrer">Nike Air Jordan SC-2 Mens Basketball Shoes 454050-035</a><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />
';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);

echo '<pre>';
print_r($chars);
echo '</pre>';
?>

gives the result:

Array
(
    [0] => Array
        (
            [0] => <a
            [1] => 0
        )

    [1] => Array
        (
            [0] => href="https://rads.stackoverflow.com/amzn/click/com/B008EYEYBA" rel="nofollow noreferrer">Nike
            [1] => 3
        )

    [2] => Array
        (
            [0] => Air
            [1] => 167
        )

    [3] => Array
        (
            [0] => Jordan
            [1] => 171
        )

    [4] => Array
        (
            [0] => SC-2
            [1] => 178
        )

    [5] => Array
        (
            [0] => Mens
            [1] => 183
        )

    [6] => Array
        (
            [0] => Basketball
            [1] => 188
        )

    [7] => Array
        (
            [0] => Shoes
            [1] => 199
        )

    [8] => Array
        (
            [0] => 454050-035</a><img
            [1] => 205
        )

    [9] => Array
        (
            [0] => src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA"
            [1] => 224
        )

    [10] => Array
        (
            [0] => width="1"
            [1] => 305
        )

    [11] => Array
        (
            [0] => height="1"
            [1] => 315
        )

    [12] => Array
        (
            [0] => border="0"
            [1] => 326
        )

    [13] => Array
        (
            [0] => alt=""
            [1] => 337
        )

    [14] => Array
        (
            [0] => style="border:none
            [1] => 344
        )

    [15] => Array
        (
            [0] => !important;
            [1] => 363
        )

    [16] => Array
        (
            [0] => margin:0px
            [1] => 375
        )

    [17] => Array
        (
            [0] => !important;"
            [1] => 386
        )

    [18] => Array
        (
            [0] => />

            [1] => 399
        )

)

Note that in element [1], the word "Nike" is included, when all I need is the URL.

[1] => Array
        (
            [0] => href="https://rads.stackoverflow.com/amzn/click/com/B008EYEYBA" rel="nofollow noreferrer">Nike
            [1] => 3
        )

Actually, my ultimate goal in parsing $str is just to output the source URL and the anchor text in separate arrays, like so:

URL:

http://www.amazon.com/gp/product/B008EYEYBA/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B008EYEYBA&linkCode=as2&tag=mytwitterpage-20

anchor text:

Nike Air Jordan SC-2 Mens Basketball Shoes 454050-035

Any ideas on how I can accomplish this would be greatly appreciated.

2 answers

  • duanli8577 2012-11-23 13:04

    Using a regular expression to parse HTML is bad practice. PHP has the DOM extension for that. You simply cannot build a universal regex that will work for any HTML you might encounter. The DOM approach is much more extensible.

    $string = '<a href="https://rads.stackoverflow.com/amzn/click/B008EYEYBA">Nike Air Jordan SC-2 Mens Basketball Shoes 454050-035</a><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />';
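    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them
    libxml_use_internal_errors(true);
    $dom->loadHTML($string);
    libxml_clear_errors();
    // First <a> element in the document
    $elementA = $dom->getElementsByTagName('a')->item(0);
    $aText = $elementA->nodeValue;            // anchor text
    $aLink = $elementA->getAttribute('href'); // source URL
    echo $aLink . "\n" . $aText;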
    
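Since the question asks for the URL and the anchor text in separate arrays, the same DOM approach can be wrapped in a small function that walks every <a> element rather than only the first. This is a minimal sketch, not part of the original answer: the function name extractLinks and the 'urls' / 'texts' array keys are made up for illustration, and it assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href and
// the anchor text of every <a> element as two parallel arrays.
function extractLinks(string $html): array
{
    $urls = [];
    $texts = [];

    // loadHTML() rejects an empty string, so return early in that case.
    if ($html === '') {
        return ['urls' => $urls, 'texts' => $texts];
    }

    $dom = new DOMDocument();
    // Keep libxml warnings (e.g. unescaped "&" in URLs) out of the output,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $urls[] = $a->getAttribute('href');
        $texts[] = trim($a->textContent);
    }

    return ['urls' => $urls, 'texts' => $texts];
}

$string = '<a href="https://rads.stackoverflow.com/amzn/click/B008EYEYBA">Nike Air Jordan SC-2 Mens Basketball Shoes 454050-035</a><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />';

$result = extractLinks($string);
print_r($result);
```

Index i of $result['urls'] and index i of $result['texts'] belong to the same link, so a string with several links keeps them paired. If the anchor text can contain non-ASCII characters, the input may need an explicit charset hint, because loadHTML() does not assume UTF-8 by default.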
    This answer was selected as the best answer by the asker.
