duan2428 2018-08-10 10:52
Views: 70
Answer accepted

PHP cURL - scraping an SEO URL when you only know the ID

I want to use cURL to scrape multiple pages of an online shop. The problem is that the URLs are SEO-friendly (or something like that) and look like this:

https://shopname.com/product-id-title-of-a-product.html

If I use the entire URL it works and I'm able to get the data I'm looking for, but the only part of that URL I know is the ID:

https://shopname.com/product-294

Is there a way to scrape that URL in this case?

The URL that contains only the ID does redirect to the full URL.

This is the code I'm using:

$curl = curl_init();
$url = 'https://shopname.com/product-294';

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

$result = curl_exec($curl);

2 answers

  • douya7282 2018-08-10 11:16

    cURL provides the option CURLOPT_FOLLOWLOCATION, which tells it to follow redirects:

    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    

    The documentation states:

    TRUE to follow any "Location: " header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is set).

    It is therefore advisable to set CURLOPT_MAXREDIRS as well, for example to limit the request to a single redirect:

    curl_setopt($curl, CURLOPT_MAXREDIRS, 1);
    

    This way the request should be redirected automatically to the full URL without any further programming.
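
Putting the two options together with the code from the question, a minimal sketch could look like the following. The helper names `build_redirect_options()` and `fetch_following_redirects()` are made up for illustration and are not part of the question or the answer; the sketch also assumes that reading `CURLINFO_EFFECTIVE_URL` afterwards is an acceptable way to learn the full SEO URL the ID-only URL redirected to.

```php
<?php
// Sketch only: build_redirect_options() and fetch_following_redirects() are
// hypothetical helper names introduced for this example.

/**
 * Build the cURL options for a request that follows redirects.
 */
function build_redirect_options(string $url, int $maxRedirects = 1): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow "Location:" headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // stop after this many redirects
    ];
}

/**
 * Fetch a page, following redirects, and report where the request ended up.
 *
 * @return array{body: string, final_url: string}
 */
function fetch_following_redirects(string $url, int $maxRedirects = 1): array
{
    $curl = curl_init();
    curl_setopt_array($curl, build_redirect_options($url, $maxRedirects));

    $body = curl_exec($curl);
    if ($body === false) {
        // Covers network errors as well as exceeding the redirect limit.
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }

    // CURLINFO_EFFECTIVE_URL is the last URL cURL actually requested,
    // i.e. the full URL once the redirect has been followed.
    $finalUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);

    return ['body' => $body, 'final_url' => $finalUrl];
}
```

Calling `fetch_following_redirects('https://shopname.com/product-294')` would then return the page body together with the URL the request finally landed on. If the shop redirects more than once (for example from http to https first), the limit of 1 is likely to make `curl_exec` fail, so a slightly higher `$maxRedirects` may be needed.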

    This answer was selected by the asker as the best answer.