dongshungou7699 2013-07-16 04:21
Views: 119
Accepted

PHP: using DOM to get anchors and modify them

I have a string of HTML and I need to check whether the href attributes of any anchors match a certain link pattern. If they do, I need to modify them.

Here's a sample HTML string:

<p>Disculpa, pero esta entrada está disponible sólo en <a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;slug=sample-page&amp;lang=ru">Pусский</a> y <a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;sample-page&amp;lang=en">English</a>.</p>

So the URLs in question take the following pattern:

http://www.example.com/static/?json=get_page&post_type=page&slug=sample-page&lang=ru

where the value of the lang query parameter varies.

If an href matching that pattern is found, I need to change it to:

http://www.example.com/ru/sample-page

So I need to replace 'static' with the value of the lang parameter, drop the query string, and append the value of the slug parameter to the end of the URL.
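
The transformation described above could be sketched roughly as follows. This is a minimal sketch, not a tested solution: the function name `rewriteStaticUrl` is hypothetical, it assumes PHP 7.1 or later syntax, and it assumes that a URL only counts as a match when its path is `/static/`, `json=get_page` is present, and both `lang` and `slug` have values.

```php
<?php
// Hypothetical helper: returns the rewritten URL, or null when the URL
// does not match the assumed pattern.
function rewriteStaticUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false
        || !isset($parts['scheme'], $parts['host'], $parts['path'], $parts['query'])) {
        return null;
    }

    // Only URLs under /static/ are candidates (assumption).
    if ($parts['path'] !== '/static/') {
        return null;
    }

    // Split the query string into an associative array.
    parse_str($parts['query'], $query);

    $lang = $query['lang'] ?? '';
    $slug = $query['slug'] ?? '';
    if (($query['json'] ?? null) !== 'get_page'
        || !is_string($lang) || $lang === ''
        || !is_string($slug) || $slug === '') {
        return null;
    }

    // Rebuild as scheme://host/<lang>/<slug>, dropping the query string.
    return $parts['scheme'] . '://' . $parts['host']
        . '/' . rawurlencode($lang)
        . '/' . rawurlencode($slug);
}
```

With these rules, a URL whose query string has no `slug=` key (like the second link in the sample HTML) is treated as a non-match and left alone.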

Sadly I'm getting confounded at the first step so I haven't even been able to test out methods of parsing the URLs and replacing them with the new value.

    $html = '<p>Disculpa, pero esta entrada está disponible sólo en <a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;slug=sample-page&amp;lang=ru">Pусский</a> y <a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;sample-page&amp;lang=en">English</a>.</p>';
    $dom = new DOMDocument;
    // The UTF-8 encoding is necessary
    $dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    $anchors = $dom->getElementsByTagName('a');

In theory from this point on I'd loop through the anchors found and do stuff, but if I var_dump the $anchors variable I just get:

object(DOMNodeList)#66 (0) { }

So I can't even proceed further!

Any idea what's causing the DOM to fail to collect the anchors?

Beyond that, any suggestions on how best to identify whether an anchor contains the URL pattern, change it, and return the modified HTML?

Update 1

So it turns out that there's a PHP bug in versions before 5.4.1 that prevents var_dump from displaying the contents of a DOMNodeList. I can still read values with:

    // Use a separate loop variable so the node list is not overwritten
    foreach ($anchors as $anchor) {
        echo $anchor->nodeValue, PHP_EOL;
    }

However, I have no idea what the items in $anchors really look like, so I'm working blind. If anyone has suggestions on how to inspect the anchors and modify them as described above, that would be hugely appreciated (meanwhile I'll try to set up a PHP 5.4.1 instance).
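
One way to see what the node list holds without relying on var_dump is to read the properties directly. The sketch below uses a shortened, ASCII-only sample string (an assumption, not the original HTML) and collects the tag name, href, text and serialized markup of each anchor into a plain array, which print_r can display on any PHP version.

```php
<?php
// Shortened sample markup, used here only for illustration.
$html = '<p><a href="http://www.example.com/static/?json=get_page&amp;slug=sample-page&amp;lang=ru">RU</a>'
      . ' and <a href="/about">About</a></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$anchors = $dom->getElementsByTagName('a');

// DOMNodeList exposes a length property even when var_dump shows nothing.
echo 'Found ', $anchors->length, ' anchors', PHP_EOL;

$summary = [];
foreach ($anchors as $anchor) {
    // Each item is a DOMElement; copy the interesting parts into an array.
    $summary[] = [
        'tag'    => $anchor->nodeName,
        'href'   => $anchor->getAttribute('href'), // entities are already decoded
        'text'   => $anchor->textContent,
        'markup' => $dom->saveXML($anchor),         // serialized form of this node
    ];
}
print_r($summary);
```

Note that getAttribute returns the decoded value, so `&amp;` in the markup comes back as a plain `&` and the href can be passed straight to parse_url.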


5 answers

  • dqy1265 2013-07-16 05:27

    I did something similar not long ago. You can iterate over the DOMNodeList and read the href attribute of each anchor:

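    $dom = new DOMDocument;
    $dom->loadHTML($content); // $content holds the HTML string
    foreach ($dom->getElementsByTagName('a') as $node) {
        $original_url = $node->getAttribute('href');
        // Build the replacement URL from $original_url here;
        // as a placeholder, the URL is left unchanged
        $new_url = $original_url;
        $node->setAttribute('href', $new_url);
    }
    $html = $dom->saveHTML();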
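
To make the "do something here" step concrete, the loop can be wrapped in a helper that takes the rewrite rule as a callback. This is only a sketch under assumptions: `rewriteAnchorHrefs` and `$toPrettyUrl` are hypothetical names, the syntax assumes PHP 7.1 or later, and the callback encodes one guess at the matching rules (the URL starts with `http://www.example.com/static/?` and has non-empty `lang` and `slug` values). Loading and encoding the document is left to the caller.

```php
<?php
// Hypothetical helper: applies $rewrite to every <a href> in $dom.
// $rewrite receives the decoded href and returns the new URL,
// or null to leave the anchor untouched. Returns the number of changes.
function rewriteAnchorHrefs(DOMDocument $dom, callable $rewrite): int
{
    $changed = 0;
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if (!$anchor->hasAttribute('href')) {
            continue; // skip anchors that are not links
        }
        $newUrl = $rewrite($anchor->getAttribute('href'));
        if ($newUrl !== null) {
            $anchor->setAttribute('href', $newUrl);
            $changed++;
        }
    }
    return $changed;
}

// Example callback for the pattern in the question (assumed rules).
$toPrettyUrl = function (string $href): ?string {
    if (strpos($href, 'http://www.example.com/static/?') !== 0) {
        return null;
    }
    parse_str((string) parse_url($href, PHP_URL_QUERY), $query);
    $lang = $query['lang'] ?? '';
    $slug = $query['slug'] ?? '';
    if (!is_string($lang) || $lang === '' || !is_string($slug) || $slug === '') {
        return null;
    }
    return 'http://www.example.com/' . rawurlencode($lang) . '/' . rawurlencode($slug);
};

$dom = new DOMDocument;
$dom->loadHTML(
    '<p><a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;slug=sample-page&amp;lang=ru">RU</a>'
    . ' <a href="http://www.example.com/static/?json=get_page&amp;post_type=page&amp;sample-page&amp;lang=en">EN</a></p>'
);
$count = rewriteAnchorHrefs($dom, $toPrettyUrl);
echo $dom->saveHTML();
```

Keeping the rule in a callback means the DOM loop stays the same if the URL pattern changes later. In this sample only the first link is rewritten, because the second one has no `slug=` key.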
    
    Selected as the best answer by the asker.
