duanlie3187 2017-11-08 08:13

Storing DOM elements for use in a website's news section

I have been able to use file_get_contents to fetch a website's news section and grab the title text from each article. How would I then store that information and use it in a section of my own website?

My PHP:

<?php
$html = file_get_contents("https://www.coindesk.com/category/news/");

$dom = new DomDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);
$finder = new DomXPath($dom);
$classname="fade";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
    echo $node->nodeValue."<br>"; 
} 
?>

Where I want to store it:

<div id="box5" class="toggle" style="display: none;">
        <div id="services" class="services">
                <div class="container" >
                    <div class="service-head text-center">
                        <h2>NEWS</h2>
                        <span> </span>

                    </div>
                <button class="accordion">STORE THE POST TITLE HERE</button>
                <div class="panel1">
                  <p>STORE THE POST SUMMARY HERE WITH LINKS TO ARTICLE</p>
                </div>

                <button class="accordion">Section 2</button>
                <div class="panel1">
                  <p></p>
                </div>

                <button class="accordion">Section 3</button>
                <div class="panel1">
                  <p></p>
                </div>
          </div>
        </div>
      </div>

1 answer

  • douxin2003 2017-11-08 08:41

    This is fairly straightforward: once the XPath expressions have matched the content, store the node contents in an array or object. That can then be used later in the same page, saved to a database, or added to a session for use on another page.

    /* source url */
    $url='https://www.coindesk.com/category/news/';
    
    /* store results in this array */
    $output=array();
    
    /* XPath expressions */
    $exp=new stdClass;
    $exp->articles='//div[@id="content"]/div[ contains(@class,"article") ]/div[@class="post-info"]';
    $exp->title='h3/a';
    $exp->description='p[@class="desc"]';
    
    /* Load the source url directly into DOMDocument */
    $dom=new DOMDocument;
    $dom->validateOnParse=false;
    $dom->standalone=true;
    $dom->preserveWhiteSpace=true;
    $dom->strictErrorChecking=false;
    $dom->substituteEntities=false;
    $dom->recover=true;
    $dom->formatOutput=true;
    libxml_use_internal_errors( true ); /* collect parser warnings instead of printing them */
    $dom->loadHTMLFile( $url );
    libxml_clear_errors();
    
    /* Query the DOM and process nodes found */
    $xp=new DOMXPath( $dom );
    $col=$xp->query( $exp->articles );
    
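    if( !empty( $col ) && $col->length > 0 ){
        foreach( $col as $node ){
            /* item(0) is null when an article has no matching title or description */
            $title=$xp->query( $exp->title, $node )->item(0);
            $desc=$xp->query( $exp->description, $node )->item(0);
            $output[]=(object)array(
                'title'         =>  $title ? trim( $title->nodeValue ) : '',
                'description'   =>  $desc ? trim( $desc->nodeValue ) : ''
            );
        }
    }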
    $dom = $xp = $col = $node = null;
    
    
    /* 
        The contents of the scrape are stored in the $output array
        and can be used wherever you wish on the page, or stored
        as a session variable and used elsewhere
    */
    if( !empty( $output ) ){
        /*
            removed `display:none` from div below.....
        */
        echo "
        <div id='box5' class='toggle'>
            <div id='services' class='services'>
                <div class='container' >
                    <div class='service-head text-center'>
                        <h2>NEWS</h2>
                        <span> </span>
                    </div>";
    
        /* iterate through output array where each member is an object */
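        foreach( $output as $i => $obj ){
            /* escape the scraped text before echoing it into the page */
            $title=htmlspecialchars( $obj->title, ENT_QUOTES, 'UTF-8' );
            $description=htmlspecialchars( $obj->description, ENT_QUOTES, 'UTF-8' );
            echo "
                    <button class='accordion'>{$title}</button>
                    <div class='panel1'>
                        <p>{$description}</p>
                    </div>";
        }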
    
        echo "
                </div>
            </div>
        </div>";
    }
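
The answer mentions saving the scraped results so they can be reused on another page. As a rough sketch of one way to do that, the example below keeps the results in a small JSON file and only re-runs the scrape when the file is older than a given number of seconds, so the source site is not requested on every page view. The file cache stands in for the database or session mentioned above. The function name `load_cached_news`, the cache path, the 600-second lifetime and the sample data are all assumptions made for illustration, not part of the original answer. Items are plain associative arrays here (rather than objects) so that they survive the round trip through JSON unchanged. The type declarations assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: return the cached items if the cache file is
 * younger than $ttl seconds, otherwise call $scrape, store its result
 * as JSON and return it.
 */
function load_cached_news(string $cacheFile, int $ttl, callable $scrape): array
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        $cached = json_decode(file_get_contents($cacheFile), true);
        if (is_array($cached)) {
            return $cached; // cache hit: no request to the source site
        }
    }

    $items = $scrape(); // cache miss or unreadable cache: run the scraper
    // LOCK_EX stops two simultaneous requests from interleaving their writes
    file_put_contents($cacheFile, json_encode($items), LOCK_EX);
    return $items;
}

// Stand-in for the scraping code; real code would build this list with DOMXPath
$scrape = function (): array {
    return [
        ['title' => 'Example title', 'description' => 'Example summary'],
    ];
};

$cacheFile = sys_get_temp_dir() . '/news_cache_example.json'; // assumed location
$news = load_cached_news($cacheFile, 600, $scrape);

foreach ($news as $item) {
    // escape before output, since the text originally comes from another site
    echo htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8'), PHP_EOL;
}
```

Any page that needs the news can call the same helper with the same cache file; only the first call after the lifetime expires pays the cost of fetching and parsing the remote page.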
    
    This answer was selected by the asker as the best answer.