dsfe167834 2011-12-15 09:01

Simple HTML DOM caching

I'm using Simple HTML DOM to scrape (with permission) some websites. I basically scrape around 50 different websites with statistical data which is updated around four times a day.

As you can imagine, the scraping takes time, so I need to speed up the process with some caching.

My vision is:

DATA-PRESENTATION.php // where all the results are shown

SCRAPING.php // the code that does the job

I want to set up a cron job on SCRAPING.PHP so that it runs 4 times a day and saves all the data in a cache, which DATA-PRESENTATION.PHP will then read, making the experience much faster for the user.

My question is: how can I implement this cache? I'm a rookie at PHP. I've been reading tutorials, but there are only a few and they are not very helpful, so I couldn't really learn how to do it.

I know another solution might be to use a database, but I don't want to do that. Also, I've been reading about high-end solutions like memcached, but the site is very simple and for personal use, so I don't need that kind of thing.

Thanks!!

SCRAPING.PHP

<?php
include("simple_html_dom.php");

// Labour stats
$html7 = file_get_html('http://www.website1.html');
$web_title = $html7->find(".title h1");
$web_figure = $html7->find(".figures h2");

?>

DATA-PRESENTATION.PHP

 <div class="news-pitch">
 <h1>Website: <?php echo utf8_encode($web_title[0]->plaintext); ?></h1>
 <p>Unemployment rate: <?php echo utf8_encode($web_figure[0]->plaintext); ?></p>
 </div>

FINAL CODE! Many thanks @jerjer and @PaulD.Waite, I couldn't have done this without your help!

Files:

1- DataPresentation.php // here I show the data read from Cache.html

2- Scraping.php // here I scrape the sites and then save the results to Cache.html

3- Cache.html // here the scraping results are saved (cache/test.html in the code below)

I set up a cron job on Scraping.php that overwrites Cache.html on each run.

1- DataPresentation.php

<?php
include("simple_html_dom.php");

$html = file_get_html("cache/test.html");
$title = $html->find("h1");
echo $title[0]->plaintext;
?>
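
One thing to watch in the listing above: if cache/test.html does not exist yet, file_get_html() returns false and the call to ->find() fails. Since the cache only holds the already extracted heading, a more defensive read could look like the sketch below. It is only an illustration: read_cache() and its fallback text are hypothetical names that are not part of Simple HTML DOM, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): return the cached
// fragment, or a fallback string when the cache file is missing or empty.
function read_cache(string $filename, string $fallback = "No data available yet."): string
{
    // No cache yet (for example, the cron job has never run).
    if (!is_readable($filename)) {
        return $fallback;
    }

    $content = file_get_contents($filename);

    // file_get_contents() returns false on failure; treat that and an
    // empty file the same way.
    if ($content === false || trim($content) === "") {
        return $fallback;
    }

    return $content;
}
```

In DataPresentation.php this could be used as `echo strip_tags(read_cache("cache/test.html"));`, where strip_tags() stands in for ->plaintext because the cached file contains nothing but the h1 element.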

2- Scraping.php

<?php
include("simple_html_dom.php");

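// find("h1") does not make the download faster: the whole page is still fetched and parsed.
// It keeps only the elements I'll be using, so the cache file stays small and loads quickly.
$filename = "cache/test.html";
$content = file_get_html('http://www.website.com/')->find("h1");
// find() returns an array of elements; join their HTML into one string before saving.
file_put_contents($filename, implode("", $content));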
?>

3- Cache.html

<h1>Current unemployment 7,2%</h1>

It loads immediately, and by setting things up this way I ensure there's always a cache file to be loaded.
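
A caveat on "always a cache file": if a scrape fails or returns nothing, overwriting the file directly would replace good data with an empty file. The sketch below is one possible way to guard against that, not the code the asker actually used. save_cache() and the ".tmp" suffix are hypothetical, and the single-step replacement via rename() is an assumption that holds on most Unix-like systems when both files are on the same filesystem.

```php
<?php
// Hypothetical helper: replace the cache file only when the scrape produced
// something, and go through a temporary file so that readers do not see a
// half-written cache.
function save_cache(string $filename, string $content): bool
{
    // Keep the previous cache if the scrape came back empty.
    if (trim($content) === "") {
        return false;
    }

    // Write to a temporary file in the same directory first.
    $tmp = $filename . ".tmp";
    if (file_put_contents($tmp, $content) === false) {
        return false;
    }

    // Swap the new file into place (assumed to be a single step on
    // Unix-like systems when both paths are on the same filesystem).
    return rename($tmp, $filename);
}
```

In Scraping.php the last line would then become something like `save_cache($filename, implode("", $content));`, with `$content` being the array returned by ->find("h1").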

2 answers

  • douwang4374 2011-12-15 09:07

    Here is a sample of file-based caching:

    <?php
        include("simple_html_dom.php");

        // Labour stats
        $filename = "cache/website1.html";
        // Download the page only when no cached copy exists yet.
        if(!file_exists($filename)){
            $content = file_get_contents('http://www.website1.html');
            file_put_contents($filename, $content);
        }

        // Parse the cached copy instead of the live site.
        $html7 = file_get_html($filename);
        $web_title = $html7->find(".title h1");
        $web_figure = $html7->find(".figures h2");
    
    ?>
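
Note that the sample above downloads the page only once: after the cache file exists it is never refreshed. Since the data in the question changes about four times a day, one possible extension is to check the file's age as well. The sketch below assumes a six-hour lifetime would be passed in by the caller. cache_is_fresh() and get_cached() are hypothetical helpers, and the optional $fetch parameter exists only so the download step can be swapped out (it defaults to file_get_contents). The nullable type requires PHP 7.1 or later.

```php
<?php
// Hypothetical helper: true when the cache file exists and is younger than
// $maxAge seconds.
function cache_is_fresh(string $filename, int $maxAge): bool
{
    // Drop PHP's cached file status so the age check sees the current file.
    clearstatcache(true, $filename);
    if (!file_exists($filename)) {
        return false;
    }
    return (time() - filemtime($filename)) < $maxAge;
}

// Hypothetical helper: return the cached copy of $url, downloading it again
// when the cache is missing or stale. $fetch is any callable that takes a
// URL and returns the page as a string, or false on failure.
function get_cached(string $url, string $filename, int $maxAge, ?callable $fetch = null)
{
    if (!cache_is_fresh($filename, $maxAge)) {
        $fetch = $fetch ?? 'file_get_contents';
        $content = $fetch($url);

        // Only overwrite the cache when the download worked.
        if ($content !== false && $content !== "") {
            file_put_contents($filename, $content);
        }
    }

    // Fall back to an older copy if the download failed; false if none exists.
    return file_exists($filename) ? file_get_contents($filename) : false;
}
```

With these in place, the `if(!file_exists($filename)){ ... }` block in the answer could be replaced by a call such as `get_cached('http://www.website1.html', $filename, 6 * 3600);`, followed by the same `file_get_html($filename)` line.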
    
    This answer was selected as the best answer by the asker.