Combining DOM queries and file_get_contents

I have researched this quite a bit over the last few days, and I have found all the answers online for the various functions, so thank you.

I now have 3 separate bits of code that each grab the contents of a webpage (an e-commerce product page, a review page, or anything else with a product on it) to get different information, but I assume that fetching the contents 3 times is very inefficient.

The 3 bits of code do the following: 1) get the webpage title, 2) get all the images from the page, 3) find figures that are (hopefully) the price of the item on that page.

I would appreciate some help grouping these together so the page contents only have to be fetched once. This is my current code:

1st Time:

function getDetails($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        //preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
        //The above didn't work well enough (e.g. for a title like <title id=... >), so I used the DOM below

        preg_match("/(\£[0-9]+(\.[0-9]{2})?)/",$str,$price); //£ for GBP
        $priceRes = preg_replace("/[^0-9,.]/", "", $price[0]);

        //$pageDeatil[0]=$title;
        $pageDeatil[1]=$priceRes;
        return $pageDeatil;
    }
}

$pageDeatil = getDetails("$newItem_URL");
//$itemTitle = $pageDeatil[0];
$itemPrice = $pageDeatil[1];

2nd Time:

$doc = new DOMDocument();
@$doc->loadHTMLFile("$newItem_URL");
$xpath = new DOMXPath($doc);
$itemTitle = $xpath->query('//title')->item(0)->nodeValue."\n";

3rd Time:

include('../../code/simplehtmldom/simple_html_dom.php');
include('../../code/url_to_absolute/url_to_absolute.php');

$html = file_get_html($newItem_URL);
foreach($html->find('img') as $e){
    $imgURL = url_to_absolute($url, $e->src);
    //More code here
}

I can't seem to fetch the page once and then reuse that single copy throughout the rest of the code. Any help would be appreciated! Thanks in advance.


dqu92800 2013-08-29 11:40

I prefer using cURL when scraping sites. Your price-fetching code doesn't seem to be particularly efficient either; I think you should use XPath there as well. The function could return an object with the price, the title and an array of images.

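function get_details($url) {
   // Fetch the page once; everything below works on this single response
   $ch = curl_init($url);
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
   // Pass the visitor's user agent through when there is one (it is unset on the CLI)
   if (isset($_SERVER['HTTP_USER_AGENT'])) {
      curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
   }

   $html = curl_exec($ch);

   // curl_exec() returns false on failure
   if ($html === false || $html === '') {
      return null;
   }

   $dom = new DOMDocument();
   @$dom->loadHTML($html); // @ suppresses warnings about malformed markup
   $xpath = new DOMXPath($dom);

   $titleNode = $xpath->query('//title')->item(0);

   $product         = new stdClass;
   $product->title  = $titleNode ? $titleNode->nodeValue : null;
   $product->price  = null; // price query goes here
   $product->images = array();

   foreach($xpath->query('//img') as $image) {
      $product->images[] = $image->getAttribute('src');
   }

   return $product;
}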
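
If it helps, here is one possible way to separate the parsing from the fetching, so that the HTML string is downloaded once and then handed to a single parser. This is only a sketch: the function name `parse_product_html()` is made up for illustration, and the price rule (take the first figure that follows a pound sign in the body text) is an assumption carried over from the regex in the question, not a reliable way to find a price on every site.

```php
<?php
// Hypothetical helper: parses an HTML string that has already been fetched once
// (with cURL or file_get_contents) and returns title, price and image sources.
function parse_product_html(string $html): stdClass
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so collect libxml warnings quietly
    // instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $product = new stdClass;

    // Title: first <title> element, or null if the page has none.
    $titleNode = $xpath->query('//title')->item(0);
    $product->title = $titleNode ? trim($titleNode->nodeValue) : null;

    // Price (assumption): the first "£12.34"-style figure in the body text.
    // \x{00A3} is the pound sign; textContent is always UTF-8.
    $product->price = null;
    $body = $xpath->query('//body')->item(0);
    if ($body && preg_match('/\x{00A3}\s*([0-9]+(?:\.[0-9]{2})?)/u', $body->textContent, $m)) {
        $product->price = $m[1];
    }

    // Images: the src attribute of every <img> that has one.
    $product->images = array();
    foreach ($xpath->query('//img[@src]') as $image) {
        $product->images[] = $image->getAttribute('src');
    }

    return $product;
}
```

With this split, the page is downloaded a single time and the resulting string is passed to `parse_product_html()`. Relative image paths still need to be resolved against the page URL afterwards, for example with the `url_to_absolute()` helper from the question.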
