doujiyun0041 2015-08-08 13:44
Accepted

Foreach, explode, trim: taking the last date with Simple HTML DOM Parser

I want to take only the last date from the third div, counting from the div with class text, using foreach. My code is below.

It only shows the date 01.01.1970; I can't get the real date and compare it to today's date. This is the part of the page I want to crawl:

<div class="courses_list">

                                    <article class="course" data-id="4376" data-datepairs="20150718-20150809" data-terms="8">
                                                    <div class="img">
                            <a href="http://ickosovo.com/?course=network-security-pentesting-2"><img src="http://ickosovo.com/wp-content/uploads/2015/07/network-a4-90x60.jpg"></a>
                        </div>

                        <div class="text">
                            <h3><a href="http://ickosovo.com/?course=network-security-pentesting-2">NETWORK SECURITY &amp; PENTESTING</a></h3>
                            <div class="excerpt"><p>In the last decade, wireless networks gained a substantial momentum. One of the most beneficial features of wireless networks is […]</p>

                                <div class="applied date_applied">
                                18 July 2015 

                                                                        -
                                    09 August 2015                                                                      </div>

                                <div class="applied date_applied">
                                    ICT Courses                                 </div>

                            </div>
                        </div>
                    </article>
                                            <article class="course" data-id="4378" data-datepairs="20150727-20150826" data-terms="38">
                                                    <div class="img">
                            <a href="http://ickosovo.com/?course=autocad-autocad-lt-2015-fundamentals-7"><img src="http://ickosovo.com/wp-content/uploads/2015/07/AutoCAD-2015-poster-90x60.png"></a>
                        </div>

                        <div class="text">
                            <h3><a href="http://ickosovo.com/?course=autocad-autocad-lt-2015-fundamentals-7">AUTOCAD / AUTOCAD LT 2015 FUNDAMENTALS</a></h3>
                            <div class="excerpt"><p>The AutoCAD / AutoCAD LT 2015 Fundamentals&nbsp;Training course is designed for new users of AutoCAD and for delegates who would […]</p>

                                <div class="applied date_applied">
                                27 July 2015 

                                                                        -
                                    26 August 2015                                                                      </div>

                                <div class="applied date_applied">
                                    Special Focus                                   </div>

                            </div>
                        </div>
                    </article>
                                            <article class="course" data-id="4439" data-datepairs="20150727-20150918" data-terms="8">
                                                    <div class="img">
                            <a href="http://ickosovo.com/?course=web-design-6"><img src="http://ickosovo.com/wp-content/uploads/2015/07/web-design_poster_july-2015-90x60.png"></a>
                        </div>

                        <div class="text">
                            <h3><a href="http://ickosovo.com/?course=web-design-6">WEB DESIGN</a></h3>
                            <div class="excerpt"><p>Many other training companies claim that creating a website is easy and can be done by anyone. While this may […]</p>

                                <div class="applied date_applied">
                                27 July 2015 

                                                                        -
                                    18 September 2015                                                                       </div>

                                <div class="applied date_applied">
                                    ICT Courses                                 </div>

                            </div>
                        </div>
                    </article>
                                            <article class="course" data-id="4441" data-datepairs="20150728-20150919" data-terms="8">
                                                    <div class="img">
                            <a href="http://ickosovo.com/?course=php5-web-application-3"><img src="http://ickosovo.com/wp-content/uploads/2015/07/php-poster_july-2015-90x60.png"></a>
                        </div>

                        <div class="text">
                            <h3><a href="http://ickosovo.com/?course=php5-web-application-3">PHP5 Web Application</a></h3>
                            <div class="excerpt"><p>Many other training companies claim that creating a Web application is easy and can be done by anyone. While this […]</p>

                                <div class="applied date_applied">
                                28 July 2015 

                                                                        -
                                    19 September 2015                                                                       </div>

                                <div class="applied date_applied">
                                    ICT Courses                                 </div>

                            </div>
                        </div>
                    </article>
                                </div>






include('simple_html_dom.php');

$html1 = file_get_html($page1);
$today = strtotime("today");
$events_old     = array();
$events_today   = array();
$events_future  = array();
foreach($html1->find('div.text h3') as $e) {

    $link = $e->getElementsByTagName('a',0)->href;
    $date_array = explode("-", trim($e->next_sibling()->getElementsByTagName('applied.date_applied', 0)->plaintext));
    $originalDate = trim($date_array[1]);
    $dt = strtotime($originalDate);

    $c = array('title' => $e->plaintext, 'date' => date('d.m.Y', $dt), 'timestamp' => $dt, 'from' => 'ict', 'link' => $link);

    if($today == $dt) {
        array_push($events_today, $c);
    } elseif($today > $dt) {
        array_push($events_old, $c);
    } else {
        array_push($events_future, $c);
    }

}
1 answer

  • douxun2023 2015-08-08 20:30

    After some edits to the question, the relevant structure of the HTML markup that needs to be parsed turns out to be:

    <h3><a >NETWORK SECURITY &amp; PENTESTING</a></h3>
    <div class="excerpt"><p>In the last decade, wireless networks gained a substantial momentum. One of the most beneficial features of wireless networks is […]</p>
        <div class="applied date_applied">
           18 July 2015 
            -
           09 August 2015
        </div>
        <div class="applied date_applied">
            ICT Courses
        </div>
    </div>
    

    This makes it clear that the div with classes applied date_applied that contains the two dates is a child of the next sibling of the <h3>, namely the div with class excerpt. You can reach it with next_sibling() together with children(), using the array index [1] to select the correct child node (index [0] is the <p>).

    foreach ($html1->find('div.text h3') as $e) {
      // Get the two dates as an array:
      // the second child node (index 1) of the next sibling of the <h3>
      $date_array = explode('-', trim($e->next_sibling()->children()[1]->plaintext));
      $originalDate = trim($date_array[1]);

      // Trim and convert them to timestamps:
      foreach ($date_array as &$d) {
         $d = strtotime(trim($d));
      }
      unset($d); // drop the reference left over from the by-reference loop
    }
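
    As a side note on the 01.01.1970 symptom: the question's lookup for 'applied.date_applied' most likely matched nothing, so the text handed to strtotime() was empty. The sketch below reproduces that effect with plain strings and no parser. The helper name parseEndDate() is hypothetical, and the timezone is pinned to UTC only to make the output reproducible.

    ```php
    <?php
    // Assumption: pin the timezone so the formatted dates are reproducible.
    date_default_timezone_set('UTC');

    /**
     * Hypothetical helper: return the end date of a "start - end" range as a
     * Unix timestamp, or null when the text cannot be parsed.
     */
    function parseEndDate(?string $range): ?int
    {
        $parts = explode('-', (string) $range);
        if (count($parts) < 2) {
            return null; // no "-" separator, e.g. the lookup matched nothing
        }
        $ts = strtotime(trim($parts[1]));
        return $ts === false ? null : $ts;
    }

    // Empty text: strtotime() returns false, and false used as a timestamp is 0.
    $failed = strtotime('');
    echo date('d.m.Y', (int) $failed), "\n"; // 01.01.1970

    // Text shaped like the first date_applied div, stray whitespace included.
    $end = parseEndDate("18 July 2015 \n\n   -\n   09 August 2015   ");
    echo $end === null ? 'unparsed' : date('d.m.Y', $end), "\n"; // 09.08.2015
    ```

    Returning null instead of false keeps a failed parse from silently turning into the epoch date when it is later passed to date().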
    

    Check your results (now all converted to timestamps):

    print_r($date_array);
    Array
    (
        [0] => 1437195600
        [1] => 1439096400
    )
    Array
    (
        [0] => 1437973200
        [1] => 1440565200
    )
    Array
    (
        [0] => 1437973200
        [1] => 1442552400
    )
    Array
    (
        [0] => 1438059600
        [1] => 1442638800
    )
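
    To finish what the question set out to do, the end timestamp can then be compared with today's. This is a minimal sketch, not the asker's actual code: bucketFor() is a hypothetical helper, the date ranges are copied from the page excerpt in the question, and "today" is fixed to 26 August 2015 in UTC so that each bucket receives at least one course.

    ```php
    <?php
    // Assumption: pin the timezone so the comparison is reproducible.
    date_default_timezone_set('UTC');

    /** Hypothetical helper: name the bucket an end timestamp belongs to. */
    function bucketFor(int $endTs, int $todayTs): string
    {
        if ($endTs === $todayTs) {
            return 'today';
        }
        return $endTs < $todayTs ? 'old' : 'future';
    }

    // Fixed "today" for the example; real code would use strtotime('today').
    $today = strtotime('2015-08-26');

    // Date ranges copied from the page excerpt in the question.
    $courses = [
        'NETWORK SECURITY & PENTESTING'          => '18 July 2015 - 09 August 2015',
        'AUTOCAD / AUTOCAD LT 2015 FUNDAMENTALS' => '27 July 2015 - 26 August 2015',
        'WEB DESIGN'                             => '27 July 2015 - 18 September 2015',
        'PHP5 Web Application'                   => '28 July 2015 - 19 September 2015',
    ];

    $events = ['old' => [], 'today' => [], 'future' => []];
    foreach ($courses as $title => $range) {
        $parts = explode('-', $range);
        $end = isset($parts[1]) ? strtotime(trim($parts[1])) : false;
        if ($end === false) {
            continue; // skip anything that does not parse as a date
        }
        $events[bucketFor($end, $today)][] = [
            'title'     => $title,
            'date'      => date('d.m.Y', $end),
            'timestamp' => $end,
        ];
    }

    foreach ($events as $bucket => $items) {
        echo $bucket, ': ', count($items), "\n";
    }
    // old: 1
    // today: 1
    // future: 2
    ```

    The equality check works because both strtotime('today') and a date string without a time resolve to midnight in the configured timezone.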
    