douzhang8840 2015-04-25 22:23
Views: 67
Answer accepted

Querying the Google search engine?

I am trying to query the Google search engine by date, fetch the first page of results, and then process it. The query I am currently using returns results, but not within the date range I set. If I copy the same query into Google in a browser, the date filter works, but from my PHP script it does not: the script returns only the current, normal results, as if the date parameter were not set. Part of the code is below. The query I am referring to is shown here and also appears in the $url variable in the snippet.

Query: https://www.google.com/search?q='.$Query.'&source=lnt&tbs=cdr%3A1%2'.$startDate.$EndDate.'&tbm=

include_once("simple_html_dom.php"); // provides file_get_html()

$Query = $_POST['Query'];
$Query = str_replace(" ", "+", $Query);
// Default date range, used when the form fields are left empty
if ($_POST['Start_date'] == '') {
    $startday = '1';
    $startmonth = '11';
    $startyear = '2011';
}
if ($_POST['End_date'] == '') {
    $endday = '1';
    $endmonth = '11';
    $endyear = '2013';
}
// %3A is ":" and %2F is "/"; each trailing "%2" joins the following "C" to form %2C (",")
$startDate = 'Ccd_min%3A'.$startmonth.'%2F'.$startday.'%2F'.$startyear.'.%2';
$EndDate = 'Ccd_max%3A'.$endmonth.'%2F'.$endday.'%2F'.$endyear.'';

if ($_POST['Query'] != '') {
    // The URL must be a single string without line breaks or spaces
    $url = 'https://www.google.com/search?q='.$Query.'&source=lnt&tbs=cdr%3A1%2'.$startDate.$EndDate.'&tbm=';
    echo $url .'<p>';
    $html = file_get_html($url);
    $searchresults = array();
    $linkObjs = $html->find('h3.r a');
    foreach ($linkObjs as $linkObj) {
        $link = trim($linkObj->href);

        // if it is not a direct link but url reference found inside it, then extract
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
            continue;
        }
        array_push($searchresults, $link);
    }
}

2 answers

  • duanbi3786 2015-04-25 22:50

    Google serves a different HTML structure to clients without JavaScript enabled, and a script calling file_get_html($url) is such a client. Temporarily disable JavaScript in Chrome and inspect the page. That way you can be sure you are using the correct div ids, classes, etc. in your script.
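As a complement to inspecting the page in a browser, the markup the script actually receives can be checked from PHP itself. The following is a minimal sketch using PHP's bundled DOM extension rather than simple_html_dom; the helper name `heading_classes` and the sample markup are hypothetical and only illustrate the idea.

```php
<?php
/**
 * Return the distinct class attribute values found on <h3> elements,
 * to help check which selector (e.g. 'h3.r a') matches the fetched page.
 * Hypothetical helper, not part of simple_html_dom.
 */
function heading_classes(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $classes = [];
    foreach ($doc->getElementsByTagName('h3') as $h3) {
        $class = $h3->getAttribute('class');
        if ($class !== '') {
            $classes[$class] = true; // array keys keep the list distinct
        }
    }
    return array_keys($classes);
}

// Made-up sample standing in for the HTML the script received.
$sample = '<html><body><h3 class="r"><a href="/url?q=http://example.com/">Example</a></h3></body></html>';
print_r(heading_classes($sample));
```

In a real run, the string passed in would be the raw response, for example from file_get_contents($url). If the list does not contain the class used in the script's selector, the selector needs to be adjusted.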


    Update based on your comments:

    Google doesn't allow searching by date range via a direct URL if JavaScript is disabled. You can, however, still use the daterange Google operator to find pages that were indexed by Googlebot within the specified date range. The dates must be given in Julian date format, with the fractional part omitted, for the operator to work properly.

    Example: daterange:2452671-2452671 lisbon
    

    The daterange operator requires at least one proper search term and can be combined with other operators.


    gregoriantojd()

    To convert a Gregorian date to a Julian date you can use the PHP function gregoriantojd( int $month , int $day , int $year ), e.g.:

    $startDate = gregoriantojd(12, 28, 2011);
    //2455924
    
    $endDate = gregoriantojd(12, 28, 2014);
    //2457020
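The conversion and the operator can be combined in one small helper. This is a sketch under assumptions: the function names `julian_day` and `daterange_operator` and the 'YYYY-MM-DD' input format are hypothetical, and the arithmetic fallback (the standard Gregorian-to-Julian-Day-Number formula) is only there for installations where the calendar extension that provides gregoriantojd() is not loaded.

```php
<?php
/**
 * Julian Day Number for a Gregorian date.
 * Uses gregoriantojd() when the calendar extension is available,
 * otherwise falls back to the standard integer formula.
 */
function julian_day(int $year, int $month, int $day): int
{
    if (function_exists('gregoriantojd')) {
        return gregoriantojd($month, $day, $year);
    }
    $a = intdiv(14 - $month, 12);   // 1 for January/February, else 0
    $y = $year + 4800 - $a;
    $m = $month + 12 * $a - 3;      // March = 0 ... February = 11
    return $day + intdiv(153 * $m + 2, 5) + 365 * $y
        + intdiv($y, 4) - intdiv($y, 100) + intdiv($y, 400) - 32045;
}

/**
 * Build the daterange operator from two 'YYYY-MM-DD' strings
 * (input format assumed for this example).
 */
function daterange_operator(string $from, string $to): string
{
    [$fy, $fm, $fd] = array_map('intval', explode('-', $from));
    [$ty, $tm, $td] = array_map('intval', explode('-', $to));
    return 'daterange:' . julian_day($fy, $fm, $fd) . '-' . julian_day($ty, $tm, $td);
}

echo daterange_operator('2011-12-28', '2014-12-28'), "\n";
// daterange:2455924-2457020
```

The printed values match the two gregoriantojd() results shown above.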
    

    Your search $url should look like this:

    $url = "https://www.google.pt/search?q=lisbon+daterange:2455924-2457020&btnG=Search&num=100&gbv=1";
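When the search terms come from user input, it may be safer to let PHP encode the query string instead of concatenating it by hand. A minimal sketch, assuming the same parameters as the URL above; the function name `build_search_url` is hypothetical, and the host is fixed to www.google.com for the example.

```php
<?php
/**
 * Build a search URL with the daterange operator appended to the terms.
 * http_build_query() URL-encodes each value, so spaces become "+"
 * and ":" becomes "%3A".
 */
function build_search_url(string $terms, int $startJd, int $endJd, int $num = 100): string
{
    $params = [
        'q'    => "$terms daterange:$startJd-$endJd",
        'btnG' => 'Search',
        'num'  => $num,
        'gbv'  => 1,
    ];
    // Pass '&' explicitly so the result does not depend on arg_separator.output.
    return 'https://www.google.com/search?' . http_build_query($params, '', '&');
}

echo build_search_url('lisbon', 2455924, 2457020), "\n";
// https://www.google.com/search?q=lisbon+daterange%3A2455924-2457020&btnG=Search&num=100&gbv=1
```

The encoded colon (%3A) is equivalent to the literal ":" in the hand-written URL.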
    

    Final code:

    include_once("simple_html_dom.php");
    
    $startDate = gregoriantojd(12, 28, 2011); //2455924
    $endDate = gregoriantojd(12, 28, 2014); //2457020
    $nResults = "100";
    $Query= "lisbon";
    
    $url = "https://www.google.com/search?q=$Query+daterange:$startDate-$endDate&btnG=Search&num=$nResults&gbv=1";
    
    echo $url .'<p>';
    $html = file_get_html($url);
    $searchresults=array();
    $linkObjs = $html->find('h3.r a');
    foreach ($linkObjs as $linkObj) {
        $link = trim($linkObj->href);

        // if it is not a direct link but url reference found inside it, then extract
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
            continue;
        }
        array_push($searchresults, $link);
    }
    print_r($searchresults);
    
    /*
    Array ( [0] => http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/ [1] => http://www.tripadvisor.com/Tourism-g189158-Lisbon_Lisbon_District_Central_Portugal-Vacations.html
    etc...
    */
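The link-extraction step in the loop can also be written as a standalone function, which makes it easier to test without fetching a page. This is an alternative sketch, not the code from the answer: it swaps the regular expression for parse_str(), and the name `extract_result_link` is hypothetical. It assumes redirect-style hrefs carry the target in a `q` parameter, as the regular expression above does.

```php
<?php
/**
 * Return the target URL for a result link, or null if none is found.
 * Direct http(s) links are returned as-is; for redirect-style hrefs
 * the target is read from the "q" query parameter.
 */
function extract_result_link(string $href): ?string
{
    // hrefs read from raw HTML may still contain entities such as &amp;
    $href = html_entity_decode(trim($href));

    if (preg_match('/^https?:/', $href)) {
        return $href; // already a direct link
    }

    $pos = strpos($href, '?');
    if ($pos === false) {
        return null; // no query string to inspect
    }

    // parse_str() splits the query string and URL-decodes the values.
    parse_str(substr($href, $pos + 1), $params);
    $target = $params['q'] ?? null;

    if (is_string($target) && preg_match('/^https?:/', $target)) {
        return $target;
    }
    return null; // e.g. internal links such as /search?q=lisbon
}

var_dump(extract_result_link('/url?q=http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/&amp;sa=U'));
```

Inside the loop, a null return value would correspond to the `continue` branch.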
    
    This answer was selected as the best answer by the asker.