douxin8610 2018-09-26 20:09

Load the next set of results from a URL - PHP cURL

I'm looking for some help. I am using cURL to extract data from a website. The site shows 10 results on the first page, and each following set of 10 results is on the next page, reached by appending ?page=2, ?page=3, and so on to the URL.

I tried a loop, but it didn't seem to work. Any suggestions I could work with? Ideally I'd like a scroll-to-load-more feature, but I want to get the cURL part correct first.

Below is the test code I am using as an example. The full version includes POST parameters appended to the URL, but here I just need the next set of results.

<?php

// Main url but the next result will be on https://example.org/data/?page=2
$url = "https://example.org/data";

$result = get($url) ;

function get ($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36');
    $result = curl_exec($curl);
    curl_close($curl);
    return $result;
}

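// Capture the text of every <h1> element on the page
preg_match_all('!<h1>(.*?)<\/h1>!', $result, $title);

// Iterate over the captured group ($title[1]), not the raw response string
for ($i = 0; $i < count($title[1]); $i++) {
    echo '<h1>' . $title[1][$i] . '</h1>';
}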

To everyone reading this to learn, as I did: the code above also works for basic extraction of the H1 headers from any given URL, as long as the pattern matches the page's markup. If I can help new coders with any basic questions, I will.

Modified example showing pages 1 and 2 in the URL:

<?php

for ($i = 1; $i <= 2; $i++) {
    $url = "https://www.gamespot.com/search/?q=gta&page=" . $i;
    echo $url . "<br>";
}

$result = get($url);

function get ($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36');
    $result = curl_exec($curl);
    curl_close($curl);
    return $result;
}

preg_match_all('!<h4 class="media-title" style="margin:0;padding-bottom:4px;">
                            <span style="font-weight:bold;"><a href=".*?">(.*?)<\/a><\/span>
          <\/h4>!',$result,$title);

for ($i = 0; $i < count($title[1]); $i++) {
    echo '<p>' . $title[1][$i] . '</p>';
}

1 answer

  • dtvpe4837413 2018-10-07 10:23

    OK, so after many hours of research and failed attempts, I have done the following, which works as I wanted, so I wish to share it with you.

    I set some variables with the values below:

    // Read the current page number from the 'page' GET parameter (default to 1 if it is missing)
    $pg = isset($_GET['page']) ? (int) $_GET['page'] : 1;
    
    // Page numbers used by the next/previous links
    $next = $pg + 1;
    $prev = $pg - 1;
    
    // Append the $pg value to the cURL URL
    $url = "https://www.gamespot.com/search/?q=gta&page=" . $pg;
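
Because the page number comes straight from the query string, it may be missing, non-numeric, or below 1. Here is a minimal sketch of one way to normalise it before building the URL. The helper name page_from_query() is hypothetical and not part of the original answer; it relies on PHP's filter_var() with FILTER_VALIDATE_INT.

```php
<?php
// Hypothetical helper: returns a page number that is always an integer >= 1.
function page_from_query(array $query): int
{
    // A missing key becomes null, which fails validation and yields the default.
    $raw = $query['page'] ?? null;

    return filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['default' => 1, 'min_range' => 1],
    ]);
}

// Intended use in results.php (assumption): $pg = page_from_query($_GET);
```

With this in place, $prev and $next can be derived from $pg exactly as above, and a request such as results.php?page=abc falls back to page 1.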
    
    // The next and previous links
    <?php
      echo '<div class="btn-group special">';
      // Only show the "previous" link when there is a previous page, so we never link to page 0 or below
      if ($prev >= 1) {
        echo '<a href="results.php?page=' . $prev . '" class="btn btn-info" role="button"><i class="fas fa-chevron-left"></i></a>';
      }
      echo '<a href="results.php?page=' . $next . '" class="btn btn-info" role="button"><i class="fas fa-chevron-right"></i></a>';
      echo '</div>';
    ?>
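
For anyone who wants the loop from the question instead of next/previous links, here is a rough sketch of fetching several pages in one run. It is an illustration, not the method used above: the helper names (page_url, extract_titles, fetch_pages, curl_get) are hypothetical, the pattern is the simple <h1> one from the question's first example, and the fetcher is passed in as a callable so the loop can be tried without touching the network. The key point is that the fetch happens inside the loop, once per page.

```php
<?php
// Build the URL for one page, e.g. https://www.gamespot.com/search/?q=gta&page=2
function page_url(string $base, array $params, int $page): string
{
    return $base . '?' . http_build_query(array_merge($params, ['page' => $page]), '', '&');
}

// Return the text of every <h1> element found in the HTML string.
function extract_titles(string $html): array
{
    preg_match_all('!<h1>(.*?)</h1>!s', $html, $matches);
    return $matches[1];
}

// Fetch pages $first..$last and collect the titles from all of them.
function fetch_pages(string $base, array $params, int $first, int $last, callable $fetch): array
{
    $titles = [];
    for ($page = $first; $page <= $last; $page++) {
        $html = $fetch(page_url($base, $params, $page));
        if ($html === false) {
            break; // stop on the first failed request
        }
        $titles = array_merge($titles, extract_titles($html));
    }
    return $titles;
}

// cURL fetcher with the same options as the question's get() function.
function curl_get(string $url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($curl);
    curl_close($curl);
    return $result; // the response body, or false on failure
}
```

A call such as fetch_pages('https://www.gamespot.com/search/', ['q' => 'gta'], 1, 2, 'curl_get') would then request pages 1 and 2 in turn. The pattern in extract_titles() would need to be swapped for one that matches the target site's markup.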
    
    This answer was selected by the asker as the best answer.