douweng1935 2013-07-18 20:29
Views: 50
Answer accepted

PHP - How to get the main HTML content, like Reader Mode in Firefox

In the Firefox app for Android and in Safari on the iPad, "Reader Mode" lets us read only the main content of a page. How can I recognize only the main content of an HTML page with PHP?

I need to detect the main news article with PHP, the way Firefox or Safari does.

For example, I fetch a news page from bbcsite.com/news/123 with this code:

<?php
    $html = file_get_contents('http://bbcsite.com/news/123');
?>

Then I want to show only the main article, without ads and other clutter, like Firefox and Safari do.

I found fivefilters.org. That site can extract the content!

Thank you.

5 answers

  • douyun8674 2013-07-18 22:48

    Hooray!!!

    I found source code that does this (PHP Readability):

    1) Create Readability.php

    2) Create JSLikeHTMLElement.php

    3) Create index.php with this code:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html>
        <head>
            <title>!</title>
            <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        </head>
    <body dir="rtl">
    <?php
    include_once 'Readability.php';
    
    
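    // URL of the page to extract the main content from
    // (change this URL to whatever you'd like to test)
    $url = 'http://';
    $html = file_get_contents($url);
    // file_get_contents() returns false on failure; stop early in that case
    if ($html === false) {
        exit('Could not fetch the page.');
    }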
    
    // Note: PHP Readability expects UTF-8 encoded content.
    // If your content is not UTF-8 encoded, convert it 
    // first before passing it to PHP Readability. 
    // Both iconv() and mb_convert_encoding() can do this.
    
    // If we've got Tidy, let's clean up input.
    // This step is highly recommended - PHP's default HTML parser
    // often doesn't do a great job and results in strange output.
    if (function_exists('tidy_parse_string')) {
        $tidy = tidy_parse_string($html, array(), 'UTF8');
        $tidy->cleanRepair();
        $html = $tidy->value;
    }
    
    // give it to Readability
    $readability = new Readability($html, $url);
    // print debug output? 
    // useful to compare against Arc90's original JS version - 
    // simply click the bookmarklet with FireBug's console window open
    $readability->debug = false;
    // convert links to footnotes?
    $readability->convertLinksToFootnotes = true;
    // process it
    $result = $readability->init();
    // does it look like we found what we wanted?
    if ($result) {
        echo "== Title =====================================\n";
        echo $readability->getTitle()->textContent, "\n\n";
        echo "== Body ======================================\n";
        $content = $readability->getContent()->innerHTML;
        // if we've got Tidy, let's clean it up for output
        if (function_exists('tidy_parse_string')) {
            $tidy = tidy_parse_string($content, array('indent'=>true, 'show-body-only' => true), 'UTF8');
            $tidy->cleanRepair();
            $content = $tidy->value;
        }
        echo $content;
    } else {
        echo 'Looks like we couldn\'t find the content. :(';
    }
    ?>
    </body>
    </html>
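
The comment in the code above notes that PHP Readability expects UTF-8 input and that iconv() or mb_convert_encoding() can do the conversion. Here is a minimal sketch of that step, assuming the mbstring extension is loaded. The to_utf8() helper and its ISO-8859-1 fallback are hypothetical names and defaults of mine, not part of PHP Readability:

```php
<?php
// Hypothetical helper (not part of PHP Readability): make sure the fetched
// HTML is UTF-8 before handing it to the extractor.
function to_utf8(string $html, string $fallbackEncoding = 'ISO-8859-1'): string
{
    // If the bytes are already valid UTF-8, leave them untouched.
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }
    // Otherwise assume the fallback encoding (an assumption; use the page's
    // real charset when you know it) and convert to UTF-8.
    return mb_convert_encoding($html, 'UTF-8', $fallbackEncoding);
}
```

You would call $html = to_utf8($html); right after file_get_contents() and before the Tidy step. If you know the page's real charset (from the Content-Type header or a meta tag), pass it as the second argument instead of relying on the fallback.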
    

    Set $url = 'http://'; to the URL of the page you want to extract.
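
If you're curious what this kind of extraction does in principle, here is a much simplified sketch. It is not the PHP Readability algorithm, which scores candidates in a more elaborate way. It only picks the element whose direct <p> children contain the most text, using PHP's built-in DOM extension. guess_main_content() is a made-up name for illustration:

```php
<?php
// Simplified illustration, NOT the PHP Readability algorithm:
// return the HTML of the element whose direct <p> children hold the most text.
function guess_main_content(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The XML prolog tells libxml to treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $scores = []; // node path => total length of paragraph text
    $nodes  = []; // node path => the parent element itself
    foreach ($doc->getElementsByTagName('p') as $p) {
        $parent = $p->parentNode;
        if (!$parent instanceof DOMElement) {
            continue;
        }
        $key = $parent->getNodePath();
        $scores[$key] = ($scores[$key] ?? 0) + strlen(trim($p->textContent));
        $nodes[$key]  = $parent;
    }

    if ($scores === []) {
        return null; // no paragraphs found
    }

    arsort($scores);                     // highest score first
    $bestKey = array_key_first($scores); // requires PHP 7.3+
    return $doc->saveHTML($nodes[$bestKey]);
}
```

A heuristic this crude is easily fooled by long comment threads or footers, which is why a ready-made library such as PHP Readability is the more practical choice.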

    Thank you;)

    This answer was selected by the asker as the best answer.