dongpian2559 2012-05-17 11:24
Views: 26
Accepted

PHP: extract the "Best guess for this image" result from a Google image search?

I need to do a reverse image lookup on Google and extract the name shown in the "Best guess for this image:" title. I modified some existing cURL code I found online and got this far:

<?php

function fetch_google($terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
{
    $searched="";
    for($i=0;$i<=$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, 0);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
        curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch,CURLOPT_TIMEOUT,120);
        curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }

    $xml = new DOMDocument();
    @$xml->loadHTML($searched);
    foreach($xml->getElementsByTagName('div') as $div)
    {
        if(strpos($div->nodeValue,"Best guess for this image:"))
            return $div->nodeValue;
    } 
}

$content = fetch_google("http://media.il.edmunds-media.com/aston-martin/as/03/de/aston-martin_front_03-de-as_1_276.jpg",1);
echo $content."<br>";

?>

But it returns a lot of text, and I am not able to get the exact div. Since the 'a' element does not have a class attribute, I had to do it this way.
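
For reference, a likely reason for the extra text is that `nodeValue` on a div includes the text of all its descendants, so the first div that matches in document order is an outer wrapper rather than the one holding the label. In addition, `strpos` returns 0 (which is falsy) when the label is at the very start of the string. A minimal sketch of a narrower lookup with DOMXPath follows. The sample markup and the function name `best_guess_via_xpath` are assumptions for illustration; the real result page may be structured differently.

```php
<?php
// Hypothetical markup standing in for the real result page.
$html = '<html><body><div id="outer"><div>Lots of other text</div>'
      . '<div><div>Best guess for this image:&nbsp;'
      . '<a href="/search?q=aston+martin">aston martin</a></div></div>'
      . '</div></body></html>';

// Hypothetical helper: returns the link text next to the label, or false.
function best_guess_via_xpath($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed real-world HTML
    $xpath = new DOMXPath($doc);
    // Find the text node that contains the label, then take the <a>
    // elements that are children of that text node's parent element.
    $nodes = $xpath->query(
        '//text()[contains(., "Best guess for this image:")]/parent::*/a'
    );
    return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : false;
}

var_dump(best_guess_via_xpath($html));
```

Matching on the text node instead of on every div avoids picking up the outer wrappers, at the cost of assuming the link is a direct sibling of the label text.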

Please help!

2 answers

  • dongre9937 2012-05-17 11:49

    You could use preg_match instead.

    Since cURL already returns the HTML as a string, you can match the text with a regular expression rather than walking the DOM:

    function fetch_google($terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')  
    {
        $searched="";
        // Use < rather than <= so that $numpages=1 issues exactly one request.
        for($i=0;$i<$numpages;$i++)
        {
            $ch = curl_init();
            $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($terms);
            curl_setopt ($ch, CURLOPT_URL, $url);
            curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
            curl_setopt ($ch, CURLOPT_HEADER, 0);
            curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
            curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
            curl_setopt ($ch,CURLOPT_CONNECTTIMEOUT,120);
            curl_setopt ($ch,CURLOPT_TIMEOUT,120);
            curl_setopt ($ch,CURLOPT_MAXREDIRS,10);
            curl_setopt ($ch,CURLOPT_COOKIEFILE,"cookie.txt");
            curl_setopt ($ch,CURLOPT_COOKIEJAR,"cookie.txt");
            $searched=$searched.curl_exec ($ch);
            curl_close ($ch);
        }
    
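        // Capture the text of the first link that follows the label.
        $matches = array();
        preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
        // Return the captured guess, or false if the label was not found.
        return (count($matches) > 1 ? $matches[1] : false);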
    }
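
    To check the pattern without calling Google, it can be run against a saved or hand-written fragment. This is only a sketch: the sample markup and the helper name `extract_best_guess` are assumptions, and the live page may differ or change over time.

```php
<?php
// Hypothetical fragment imitating the part of the page that holds the label.
$html = '<div>Best guess for this image:&nbsp;'
      . '<a href="/search?q=aston+martin" style="font-weight:bold">aston martin</a></div>';

// Hypothetical helper wrapping the same pattern used in fetch_google().
function extract_best_guess($html)
{
    $pattern = '/Best guess for this image:[^<]+<a[^>]+>([^<]+)/';
    if (preg_match($pattern, $html, $matches)) {
        // Decode entities such as &amp; that may appear in the link text.
        return html_entity_decode(trim($matches[1]), ENT_QUOTES, 'UTF-8');
    }
    return false; // label not present in the HTML
}

var_dump(extract_best_guess($html));
```

    Note that `[^<]+` requires at least one character between the label and the `<a` tag; if the markup puts the link immediately after the colon, `[^<]*` would be the more tolerant choice.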
    
    Accepted by the asker as the best answer.