dpy15285 2013-05-05 03:00

PHP regex to find and replace URL attributes in the DOM

Currently I have the following code:

    //loop here 
    foreach ($doc['a'] as $link) {
        $href = pq($link)->attr('href');                
        if (preg_match($url,$href))
        {
            //delete matched string and append custom url to href attr
        }       
        else
        {
            //prepend custom url to href attr
        }
    }
    //end loop

Basically, I've fetched an external page via cURL. I need to add my own custom URL to each href link in the DOM. I need to check via regex whether each href attribute already has a base URL, e.g. www.domain.com/MainPage.html/SubPage.html

If yes, then replace the www.domain.com part with my custom url.

If not, then simply prepend my custom URL to the relative URL.

My question is: what regex syntax should I use, and which PHP function? Is preg_replace() the proper function for this?

Cheers

1 answer

  • douguachi0056 2013-05-05 03:05

    You should use built-in functions rather than regex whenever possible, because their authors have often already considered the edge cases (or read the really long RFC for URLs that details all of them). For your case, I would use parse_url() and then http_build_url() (note that the latter function requires the PECL HTTP extension, which can be installed by following the docs page for the http package):

    $href = 'http://www.domain.com/MainPage.html/SubPage.html';
    $parts = parse_url($href);
    
    if (isset($parts['host']) && $parts['host'] === 'www.domain.com') {
        $parts['host'] = 'www.yoursite.com';
    
        $href = http_build_url($parts);
    }
    
    echo $href; // http://www.yoursite.com/MainPage.html/SubPage.html
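
The host check above depends on which keys parse_url() returns for different kinds of href. As a rough illustration (the sample hrefs are made up for this sketch, not taken from the fetched page), a relative link has no 'host' key at all, and a link written without a scheme, such as the www.domain.com/... example in the question, is treated as a plain path:

```php
<?php
// Hypothetical sample hrefs, chosen only for illustration.
$absolute   = parse_url('http://www.domain.com/MainPage.html/SubPage.html');
$relative   = parse_url('/MainPage.html/SubPage.html');
$schemeless = parse_url('www.domain.com/MainPage.html/SubPage.html');

// Absolute URL: scheme, host and path are all present.
var_dump(isset($absolute['host']));   // bool(true)

// Relative URL: only 'path' is returned, so there is no 'host' key.
var_dump(isset($relative['host']));   // bool(false)

// No scheme and no leading "//": the whole string ends up in 'path'.
var_dump(isset($schemeless['host'])); // bool(false)
```

So if the page contains scheme-less links like the last one, they would need separate handling (for example, prefixing "http://" before parsing).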
    

    Example using your code:

    foreach ($doc['a'] as $link) {
        $urlParts = parse_url(pq($link)->attr('href'));
        if ($urlParts === false) {
            continue; // skip hrefs that parse_url() cannot parse
        }

        // Replaces the host if there is one; otherwise adds your host to the relative URL
        $urlParts['host'] = 'www.yoursite.com';

        $newURL = http_build_url($urlParts);

        pq($link)->attr('href', $newURL);
    }
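
If installing the PECL HTTP extension is not an option, the same idea can be sketched with parse_url() alone by reassembling the URL by hand. The helper name rewriteHref() and its parameters are hypothetical, and this is a simplified sketch rather than a full URL resolver: it assumes relative paths should resolve against the site root, it drops any port or user info, and links such as mailto: or javascript: would need to be skipped before calling it.

```php
<?php
/**
 * Hypothetical helper: point $href at $newHost, keeping path, query and fragment.
 */
function rewriteHref($href, $newHost, $scheme = 'http')
{
    $parts = parse_url($href);
    if ($parts === false) {
        return $href; // seriously malformed URL: leave it untouched
    }

    $path = isset($parts['path']) ? $parts['path'] : '/';
    if ($path === '' || $path[0] !== '/') {
        $path = '/' . $path; // simplification: treat relative paths as root-relative
    }

    $url = $scheme . '://' . $newHost . $path;
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $url .= '#' . $parts['fragment'];
    }
    return $url;
}

// Absolute link: the original host is replaced.
echo rewriteHref('http://www.domain.com/MainPage.html/SubPage.html', 'www.yoursite.com'), "\n";
// prints http://www.yoursite.com/MainPage.html/SubPage.html

// Relative link: the custom host is prepended; query and fragment are kept.
echo rewriteHref('MainPage.html?x=1#top', 'www.yoursite.com'), "\n";
// prints http://www.yoursite.com/MainPage.html?x=1#top
```

Inside the loop, this would be used as pq($link)->attr('href', rewriteHref(pq($link)->attr('href'), 'www.yoursite.com')).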
    
