dream752614590 2016-12-14 11:02

Getting the "title" and "description" from an external page link

I am trying to get the title and description from the source of an external page link. This does not work for a Facebook page: the request returns the source code of a different page. It works for other websites such as Google. Here is my PHP code:

public function file_get_contents_curl($url){
   $ch = curl_init();
   curl_setopt($ch, CURLOPT_HEADER, 0);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($ch, CURLOPT_URL, $url);
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
   $data = curl_exec($ch);
   curl_close($ch);
   return $data;
}

public function previewLink(){
   $url = "https://www.facebook.com/NASA/";
   $html = $this->file_get_contents_curl($url);
   $title = "";
   $description ="";
   $image = "";

   //parsing begins here:
   $doc = new \DOMDocument();
   @$doc->loadHTML($html);
   $nodes = $doc->getElementsByTagName('title');
   $title = $nodes->item(0)->nodeValue; // nodeValue is a property, not a method
}

I cannot work out what the problem is. Can someone suggest something? Thanks in advance.

1 answer

  • dongshi2836 2016-12-14 11:45
    Facebook requires a User-Agent header in the HTTP request. You can add one like this:

    curl_setopt($ch, CURLOPT_HTTPHEADER, array('User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12'));
    

    FYI: Facebook tends to display a CAPTCHA page when someone requests a page without being logged in.
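
As an alternative to passing the header by hand, cURL has a dedicated `CURLOPT_USERAGENT` option. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `fetchHtml`, the timeout, and the redirect limit are hypothetical choices, and the User-Agent string is simply the one from the answer above. Whether Facebook returns the expected page for any given User-Agent is not guaranteed.

```php
<?php
// Hypothetical helper (not from the original post): fetch a page while
// sending a browser-like User-Agent. Returns null on any failure.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects, as in the question
        CURLOPT_MAXREDIRS      => 5,      // assumed limit to avoid redirect loops
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Same User-Agent string as in the answer; any browser-like value is an assumption.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) '
            . 'AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12',
    ]);

    $body   = curl_exec($ch);                          // false on transport errors
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no response was received

    // The handle is released when $ch goes out of scope.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

Calling `fetchHtml("https://www.facebook.com/NASA/")` would then return either the HTML string or `null`, so the caller can check for failure before handing the result to `DOMDocument`.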

    The asker selected this answer as the best answer.
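
The question's snippet reads only the `<title>` element, while the goal is a title, a description, and an image. One way this might be done is sketched below, assuming the page exposes a standard `<meta name="description">` tag or Open Graph tags (`og:description`, `og:image`). The function name `extractPreview` and the preference for Open Graph values over the plain description are hypothetical choices, not something stated in the thread.

```php
<?php
// Hypothetical helper (not from the original post): pull title, description
// and image out of an HTML string that has already been downloaded.
function extractPreview(string $html): array
{
    $preview = ['title' => '', 'description' => '', 'image' => ''];
    if (trim($html) === '') {
        return $preview; // loadHTML() rejects empty input, so stop early
    }

    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $preview['title'] = trim($titles->item(0)->textContent);
    }

    // Index <meta> tags by their "property" or "name" attribute (first one wins).
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $node) {
        $key     = strtolower($node->getAttribute('property') ?: $node->getAttribute('name'));
        $content = trim($node->getAttribute('content'));
        if ($key !== '' && $content !== '' && !isset($meta[$key])) {
            $meta[$key] = $content;
        }
    }

    // Assumed preference: Open Graph description first, plain description second.
    $preview['description'] = $meta['og:description'] ?? $meta['description'] ?? '';
    $preview['image']       = $meta['og:image'] ?? '';

    return $preview;
}
```

Inside `previewLink()` this could be called as `extractPreview($html)`, where `$html` is the string returned by `file_get_contents_curl($url)`. Note that `loadHTML()` may mis-decode UTF-8 pages that do not declare their charset, so non-ASCII titles can need extra handling.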
