douzhang8840 2015-04-25 22:23
Answer accepted

Querying the Google search engine?

I am trying to query the Google search engine by date range, fetch the first page of results, and then process it. The query I am currently using returns results, but not within the date range I set. If I paste the same query into Google in a browser, the date filter works, but from my PHP script it does not: the script returns only the normal, current results, as if the date parameter had not been set. Part of the code is below. The query I am referring to is shown here and also appears in the $url variable in the snippet.

Query: https://www.google.com/search?q='.$Query.'&source=lnt&tbs=cdr%3A1%2'.$startDate.$EndDate.'&tbm=

$Query = $_POST['Query'];
$Query = str_replace(" ", "+", $Query);
if ($_POST['Start_date'] == '') {
    $startday   = '1';
    $startmonth = '11';
    $startyear  = '2011';
}
if ($_POST['End_date'] == '') {
    $endday   = '1';
    $endmonth = '11';
    $endyear  = '2013';
}
$startDate = 'Ccd_min%3A' . $startmonth . '%2F' . $startday . '%2F' . $startyear . '.%2';
$EndDate   = 'Ccd_max%3A' . $endmonth . '%2F' . $endday . '%2F' . $endyear . '';

if ($_POST['Query'] != '') {
    $url = 'https://www.google.com/search?q=' . $Query . '&source=lnt&tbs=cdr%3A1%2' . $startDate . $EndDate . '&tbm=';
    echo $url . '<p>';
    $html = file_get_html($url);
    $searchresults = array();
    $linkObjs = $html->find('h3.r a');
    foreach ($linkObjs as $linkObj) {
        $link = trim($linkObj->href);

        // if it is not a direct link but a URL reference is found inside it, extract it
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
            continue;
        }
        array_push($searchresults, $link);
    }
}

2 answers

  • duanbi3786 2015-04-25 22:50
    Google serves a different HTML structure to clients without JavaScript enabled, and file_get_html($url) is such a client. Temporarily disable JavaScript in Chrome and inspect the page. That way you can be sure you have the correct div ids, classes, etc. to use in your script.


    Update based on your comments:

    Google doesn't allow searching by date range via a direct URL when JavaScript is disabled. However, you can still use Google's daterange operator to find pages that Googlebot indexed within the specified date range. The dates must be given as Julian day numbers, and the fractional part must be omitted for the operator to work properly.

    Example: daterange:2452671-2452671 lisbon
    

    The daterange operator requires at least one proper search term and can be combined with other operators.


    gregoriantojd()

    To convert a Gregorian date to a Julian day number, you can use the PHP function gregoriantojd( int $month , int $day , int $year ), e.g.:

    $startDate = gregoriantojd(12, 28, 2011);
    //2455924
    
    $endDate = gregoriantojd(12, 28, 2014);
    //2457020
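
gregoriantojd() comes from PHP's calendar extension, which is not enabled in every build. As a rough fallback, the Julian day number can also be computed with the standard integer formula for the Gregorian calendar. The function name julianDayNumber below is made up for this illustration and is not part of PHP; treat it as a sketch that has only been checked against the two values above.

```php
<?php
// Hypothetical helper (not a PHP built-in): Julian day number for a
// Gregorian calendar date, using the usual integer-arithmetic formula.
// Requires PHP 7+ for intdiv(); intended for ordinary positive years.
function julianDayNumber(int $month, int $day, int $year): int
{
    $a = intdiv(14 - $month, 12);   // 1 for January and February, otherwise 0
    $y = $year + 4800 - $a;         // shifted year, counted as if it started in March
    $m = $month + 12 * $a - 3;      // March = 0, ..., February = 11

    return $day
        + intdiv(153 * $m + 2, 5)   // days in the months before month $m
        + 365 * $y
        + intdiv($y, 4) - intdiv($y, 100) + intdiv($y, 400) // leap days
        - 32045;
}

echo julianDayNumber(12, 28, 2011), "\n"; // 2455924
echo julianDayNumber(12, 28, 2014), "\n"; // 2457020
```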
    

    Your search $url should look like this:

    $url = "https://www.google.pt/search?q=lisbon+daterange:2455924-2457020&btnG=Search&num=100&gbv=1";
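
One way to assemble such a URL without escaping it by hand is http_build_query(), which URL-encodes every value. This is only a sketch: the parameter names are the ones from the URL above, and the variable names are illustrative.

```php
<?php
$params = [
    'q'    => 'lisbon daterange:2455924-2457020', // search term plus the daterange operator
    'btnG' => 'Search',
    'num'  => 100,
    'gbv'  => 1,
];

// Pass the separator explicitly so the result does not depend on the
// arg_separator.output ini setting.
$url = 'https://www.google.pt/search?' . http_build_query($params, '', '&');

echo $url, "\n";
```

The space is encoded as + and the colon as %3A, which decodes to the same query string on the server side.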
    

    Final code:

    include_once("simple_html_dom.php");

    $startDate = gregoriantojd(12, 28, 2011); // 2455924
    $endDate   = gregoriantojd(12, 28, 2014); // 2457020
    $nResults  = "100";
    $Query     = "lisbon";

    // urlencode() keeps multi-word queries valid inside the URL
    $url = "https://www.google.com/search?q=" . urlencode($Query) . "+daterange:$startDate-$endDate&btnG=Search&num=$nResults&gbv=1";

    echo $url . '<p>';
    $html = file_get_html($url);
    if ($html === false) {
        // file_get_html() returns false when the page could not be loaded
        die('Could not fetch or parse the results page');
    }
    $searchresults = array();
    $linkObjs = $html->find('h3.r a');
    foreach ($linkObjs as $linkObj) {
        $link = trim($linkObj->href);

        // if it is not a direct link but a URL reference is found inside it, extract it
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
            continue;
        }
        array_push($searchresults, $link);
    }
    print_r($searchresults);
    
    /*
    Array ( [0] => http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/ [1] => http://www.tripadvisor.com/Tourism-g189158-Lisbon_Lisbon_District_Central_Portugal-Vacations.html
    etc...
    */
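
The link-extraction step in the loop above could also be written without the regular expression on q=...&amp;sa=, by decoding the href and reading its q parameter. The sketch below assumes result hrefs are either direct http(s) links or relative links that carry the target in a q parameter, which is what the regex in the code above implies; extractResultLink is a hypothetical name, not part of simple_html_dom.

```php
<?php
// Hypothetical helper: returns the target URL of a result link, or null
// when the href does not point at an http(s) page.
function extractResultLink(string $href): ?string
{
    // hrefs taken from raw HTML may still contain &amp; instead of &
    $href = html_entity_decode(trim($href));

    // Direct link: use it as is
    if (preg_match('/^https?:\/\//', $href)) {
        return $href;
    }

    // Otherwise look for a q=<target> parameter in the query string
    $pos = strpos($href, '?');
    if ($pos === false) {
        return null;
    }
    parse_str(substr($href, $pos + 1), $params); // parse_str() also URL-decodes the values
    $target = $params['q'] ?? null;

    if (is_string($target) && preg_match('/^https?:\/\//', $target)) {
        return $target;
    }
    return null;
}

var_dump(extractResultLink('/url?q=http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/&amp;sa=U'));
```

Inside the foreach loop this would replace the two preg_match branches: call the helper on $linkObj->href and skip the entry when it returns null.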
    
    This answer was selected by the asker as the best answer.