dpy15285 2013-05-05 03:00

PHP regex to find and replace URL attributes in the DOM

Currently I have the following code:

    //loop here 
    foreach ($doc['a'] as $link) {
        $href = pq($link)->attr('href');                
        if (preg_match($url,$href))
        {
            //delete matched string and append custom url to href attr
        }       
        else
        {
            //prepend custom url to href attr
        }
    }
    //end loop

Basically, I've fetched an external page via cURL. I need to add my own custom URL to each href link in the DOM. I need to check via regex whether each href attribute already has a base URL, e.g. www.domain.com/MainPage.html/SubPage.html

If yes, then replace the www.domain.com part with my custom URL.

If not, then simply prepend my custom URL to the relative URL.

My question is: what regex syntax should I use, and which PHP function? Is preg_replace() the proper function for this?

Cheers

1 answer

  • douguachi0056 2013-05-05 03:05

    You should use built-in functions rather than regex whenever possible, because their authors have often already considered the edge cases (or read the really long RFC for URLs that details all of them). For your case, I would use parse_url() and then http_build_url(). Note that the latter function needs the PECL HTTP extension, which can be installed by following the docs page for the http package:

    $href = 'http://www.domain.com/MainPage.html/SubPage.html';
    $parts = parse_url($href);

    // 'host' is only set when the URL actually contains one
    if (isset($parts['host']) && $parts['host'] === 'www.domain.com') {
        $parts['host'] = 'www.yoursite.com';

        $href = http_build_url($parts);
    }

    echo $href; // 'http://www.yoursite.com/MainPage.html/SubPage.html'
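
One detail worth checking before relying on the 'host' key: the example URL in the question, www.domain.com/MainPage.html/SubPage.html, has no scheme, and parse_url() reports such a string as a path, not as a host. The sketch below only inspects what parse_url() returns for three styles of href; the sample URLs are taken from the question and the variable names are illustrative.

```php
<?php
// Inspect which components parse_url() reports for three styles of href.
$samples = [
    'http://www.domain.com/MainPage.html/SubPage.html', // absolute URL
    'www.domain.com/MainPage.html/SubPage.html',        // no scheme: parsed as a path
    '/MainPage.html/SubPage.html',                      // root-relative URL
];

$hosts = [];
foreach ($samples as $href) {
    $parts = parse_url($href);
    // 'host' is only present when the URL has an authority part (scheme:// or //)
    $hosts[$href] = $parts['host'] ?? null;
    echo $href, ' => ', var_export($hosts[$href], true), PHP_EOL;
}
```

Only the first sample yields a host, so scheme-less links like the second one would be treated as relative paths by the code in this answer.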
    

    Example using your code:

    foreach ($doc['a'] as $link) {
        $urlParts = parse_url((string) pq($link)->attr('href'));

        // parse_url() returns false for seriously malformed URLs; leave those links alone
        if ($urlParts === false) {
            continue;
        }

        // This replaces the domain if there is one; otherwise it prepends your domain
        $urlParts['host'] = 'www.yoursite.com';

        $newURL = http_build_url($urlParts);

        pq($link)->attr('href', $newURL);
    }
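
If installing PECL HTTP is not an option, the same idea can be sketched with parse_url() and plain string concatenation. Note that rebase_href() is a hypothetical helper, not part of PHP or of the code above. It assumes PHP 7 or later, falls back to http when the link has no scheme, drops any port or credentials, and resolves relative paths against the site root rather than the current page.

```php
<?php
// Hypothetical helper (an assumption, not a PHP built-in): swaps or adds the host
// of an href using only parse_url(), so the PECL HTTP extension is not required.
function rebase_href(string $href, string $newHost, string $defaultScheme = 'http'): string
{
    $parts = parse_url($href);
    if ($parts === false) {
        return $href; // seriously malformed URL: leave it untouched
    }

    // Leave non-web links such as mailto: or javascript: untouched
    if (isset($parts['scheme']) && !in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return $href;
    }

    // Leave fragment-only or query-only links (e.g. "#top") untouched
    if (!isset($parts['host']) && !isset($parts['path'])) {
        return $href;
    }

    $scheme = $parts['scheme'] ?? $defaultScheme;
    $path   = $parts['path'] ?? '/';
    if ($path === '' || $path[0] !== '/') {
        $path = '/' . $path; // simplification: relative paths are anchored at the root
    }
    $query    = isset($parts['query']) ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $scheme . '://' . $newHost . $path . $query . $fragment;
}
```

Inside the loop this would be used as pq($link)->attr('href', rebase_href((string) pq($link)->attr('href'), 'www.yoursite.com')).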
    