duan2428 2018-08-10 10:52

PHP cURL - scraping an SEO URL when you only know the ID

I want to use cURL to scrape multiple pages of an online shop. The problem is that the URLs are SEO-friendly (or something like that) and look like this:

https://shopname.com/product-id-title-of-a-product.html

If I use the entire URL it works and I can get the data I'm looking for, but the only variable part of that URL that I know is the ID:

https://shopname.com/product-294

Is there a way to scrape that URL in this case?

The URL that contains only the ID does redirect to the full URL.

This is the code I'm using:

$curl = curl_init();
$url = 'https://shopname.com/product-294';

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($curl);

  • douya7282 2018-08-10 11:16

    cURL provides the option CURLOPT_FOLLOWLOCATION:

    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    

    The documentation states:

    TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).

    It is therefore advisable to set CURLOPT_MAXREDIRS as well, for example to limit the request to a single redirect:

    curl_setopt($curl, CURLOPT_MAXREDIRS, 1);
    

    This way cURL should automatically follow the redirect to the full URL without any further programming.
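Putting the two options together with the code from the question, a minimal sketch might look like the following. The helper names `buildProductUrl` and `fetchFollowingRedirects` are hypothetical, as is the default limit of 3 redirects; `CURLINFO_EFFECTIVE_URL` is used here on the assumption that you also want to know the full SEO URL the request ended up at.

```php
<?php
// Hypothetical helper: builds the short, ID-only URL from the question.
function buildProductUrl(int $id): string
{
    return 'https://shopname.com/product-' . $id;
}

// Hypothetical helper: fetches a URL and follows redirects, up to
// $maxRedirects of them. Returns the response body and the final URL.
function fetchFollowingRedirects(string $url, int $maxRedirects = 3): array
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);       // follow "Location:" headers
    curl_setopt($curl, CURLOPT_MAXREDIRS, $maxRedirects);   // cap the redirect chain

    $body = curl_exec($curl);
    if ($body === false) {
        // Also reached when the redirect limit is exceeded.
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    // The URL of the last request made, i.e. after all redirects.
    $finalUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);

    return ['body' => $body, 'url' => $finalUrl];
}

// Usage (performs a real HTTP request, so it is left commented out):
// $page = fetchFollowingRedirects(buildProductUrl(294));
// echo $page['url'], PHP_EOL;
```

Note that a limit of 1 makes `curl_exec` return `false` if the shop redirects more than once (for example, first to another host name and then to the full product URL), so a slightly higher limit may be more forgiving.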
