douyanjing0822 2013-05-05 09:27
浏览 47
已采纳

使用CURL更改相对URL [重复]

This question already has an answer here:

I'm trying to scrape some websites using CURL. In order to change the relative URL's I have inserted this:

 $curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);

It's working good for most websites but not all of them. For instance this website "NS Website" show's no effect at all, meaning the URL's are completed with my domain as base url: mydomain.com/css.css

This is the complete code Im using:

<?php

$url = $_GET['url'];

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);

$curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);

curl_close($ch);

echo $curl_scraped_page;

?>

Live example at phpfiddle

</div>
  • 写回答

1条回答 默认 最新

  • duanqun9740 2013-05-05 09:39
    关注

    Your problem is in the regular expression.

    You are looking for <head>, but the given example's website has a <head profile="http://gmpg.org/xfn/11">.

    Replace your regular expression with :

    $curl_scraped_page = preg_replace("/<head.*>/i", "<head><base href='$url' />", $curl_scraped_page, 1);
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 改算法,照着压缩包里边,参考其他代码封装的格式 写到main函数里
  • ¥15 用windows做服务的同志有吗
  • ¥60 求一个简单的网页(标签-安全|关键词-上传)
  • ¥35 lstm时间序列共享单车预测,loss值优化,参数优化算法
  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 一直显示正在等待HID—ISP