duan2428 2018-08-10 10:52
Accepted

PHP cURL - scraping an SEO URL when you only know the ID

I want to use cURL to scrape multiple pages of an online shop. The problem is that the URLs are SEO-friendly (or something like that) and look like this:

https://shopname.com/product-id-title-of-a-product.html

If I use the entire URL, it works and I get the data I'm looking for, but the only part of that URL I know is the ID:

https://shopname.com/product-294

Is there a way to scrape that URL in this case?

The URL that contains only the ID does redirect to the full URL.
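To see where such a short URL points without downloading the product page, a minimal sketch can request it with redirects disabled and read the target from `CURLINFO_REDIRECT_URL`. The helper names `productUrl()` and `resolveSeoUrl()` are hypothetical, the URL pattern is taken from the example above, and sending a HEAD request (`CURLOPT_NOBODY`) assumes the shop answers HEAD with the same redirect as GET.

```php
<?php
// Hypothetical helper: build the short URL from a known product ID,
// using the pattern from the question (assumption: it is always "product-<id>").
function productUrl(int $id): string
{
    return 'https://shopname.com/product-' . $id;
}

// Hypothetical helper: request the short URL WITHOUT following redirects
// and return the "Location" target, or null if there was none.
function resolveSeoUrl(string $shortUrl): ?string
{
    $curl = curl_init($shortUrl);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false); // only inspect the redirect
    curl_setopt($curl, CURLOPT_NOBODY, true);          // HEAD request (assumes the server supports it)
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);    // assumed timeout, in seconds

    if (curl_exec($curl) === false) {
        return null; // network error, DNS failure, refused connection, ...
    }

    // The URL cURL would have requested next if redirects were enabled
    $location = curl_getinfo($curl, CURLINFO_REDIRECT_URL);
    return (is_string($location) && $location !== '') ? $location : null;
}
```

For example, `resolveSeoUrl(productUrl(294))` should return the full product URL if the server redirects as described, or `null` if there is no redirect or the request fails.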

This is the code I'm using:

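$curl = curl_init();
$url = 'https://shopname.com/product-294';

curl_setopt($curl, CURLOPT_URL, $url);
// Return the response as a string instead of printing it
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// Response body as a string, or false on failure
$result = curl_exec($curl);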

2 answers

  • douya7282 2018-08-10 11:16

    cURL provides the CURLOPT_FOLLOWLOCATION option for this:

    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    

    The documentation states:

    TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).

    Therefore it is advisable to set CURLOPT_MAXREDIRS as well, for example to allow at most one redirect:

    curl_setopt($curl, CURLOPT_MAXREDIRS, 1);
    

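    With these options set, cURL should automatically follow the redirect to the full URL without any further programming. If the server sends more redirects than CURLOPT_MAXREDIRS allows, curl_exec() returns false instead of the page.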
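A minimal sketch wrapping both options in one function, assuming the short URL redirects as described in the question. The name `fetchFollowingRedirects()`, the returned array shape and the 10-second connect timeout are illustrative assumptions; `CURLINFO_EFFECTIVE_URL` reports the URL cURL ended on (here, the full SEO URL), and `CURLE_TOO_MANY_REDIRECTS` is the error code cURL reports when the redirect limit is exceeded.

```php
<?php
// Hypothetical helper (not part of PHP's cURL API): fetch a page, follow
// "Location: " redirects up to $maxRedirs hops, and report the final URL.
function fetchFollowingRedirects(string $url, int $maxRedirs = 1): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($curl, CURLOPT_MAXREDIRS, $maxRedirs); // give up after this many hops
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);    // assumed timeout, in seconds

    $body = curl_exec($curl);
    if ($body === false) {
        // Exceeding CURLOPT_MAXREDIRS is reported as an error, not a partial result
        $error = curl_errno($curl) === CURLE_TOO_MANY_REDIRECTS
            ? 'too many redirects'
            : curl_error($curl);
        return ['ok' => false, 'error' => $error];
    }

    return [
        'ok'   => true,
        'url'  => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL), // URL after the last redirect
        'body' => $body,
    ];
}
```

Usage: `$page = fetchFollowingRedirects('https://shopname.com/product-294');` then `$page['url']` holds the URL cURL ended on and `$page['body']` the product page. The default of one hop mirrors the example above; if the shop chains several redirects before reaching the SEO URL, a higher limit would be needed.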

    This answer was accepted by the asker as the best answer.