doushantun0614 2013-02-18 18:16

too long

So let's say I have a Google News feed like this: https://news.google.com/news/feeds?pz=1&cf=all&ned=no_no&hl=no&q=%22something%22&output=atom&num=1

Grabbing the title, author, and link would be easy, but how would I go about getting, say, the first 200 characters of the content? It is full of HTML and mixed in with the title and author as well.

I could use strip_tags on it, but it would still be a mess.

Is there any way to make Google return a ['description'] field, maybe?

Or are there perhaps other good news feeds that give me the content in a way that's easier to manage?

[edit]

Update on how I ended up doing it:

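$url  = 'https://news.google.com/news/feeds?pz=1&cf=all&ned=no_no&hl=no&q=%22molde+fotballklubb%22+OR+%22tornekrattet%22+OR+%22mfk%22+OR+%22oddmund+bjerkeset%22+-%22moss%22&output=atom&num=1';
$feed = file_get_contents($url);
// Parse only if the download succeeded; LIBXML_NOCDATA merges CDATA sections into plain text nodes.
$news = ($feed !== false) ? simplexml_load_string($feed, 'SimpleXMLElement', LIBXML_NOCDATA) : false;

$return = array();
if ($news !== false && isset($news->entry)) {
    $entry = $news->entry;
    // Title, author and description are separated by <font size="-1"> tags in the content HTML.
    $test = explode('<font size="-1">', (string) $entry->content);

    $return['title']       = strip_tags($test[0]);
    $return['author']      = isset($test[1]) ? strip_tags($test[1]) : '';
    $return['description'] = isset($test[2]) ? strip_tags($test[2]) : '';
    $return['link']        = (string) $entry->link['href'];
}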

It is still not working properly, but that's because the feed gives me the content in different forms all the time. Sometimes the content of the news article itself is just metadata, such as the authors and image descriptions.

Splitting it on HTML tags also causes problems, because the HTML changes from time to time. But I can't figure out any other way of doing it with this feed.

1 answer

  • duanchui1279 2013-02-18 18:41

    You could try loading the HTML into a DOMDocument instance and extracting the parts you need, or use a wrapper for it such as Goutte, which makes it a lot easier to extract the portions you need.

    http://php.net/manual/en/class.domdocument.php

    https://github.com/fabpot/Goutte
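
As a rough illustration of the DOMDocument suggestion, here is a minimal sketch. It assumes the entry's content HTML keeps the author and the description in separate `<font size="-1">` elements, as the question's own code implies. The helper name `extractFontBlocks` and the sample HTML are made up for this example, and `mb_substr` needs the mbstring extension.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// of every <font size="-1"> element found in an entry's HTML content.
function extractFontBlocks(string $html): array
{
    // Feed HTML is rarely well-formed, so keep libxml warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The XML prolog is a common trick to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $blocks = [];
    foreach ($xpath->query('//font[@size="-1"]') as $node) {
        // textContent drops nested tags; collapse whitespace runs to one space.
        $text = trim(preg_replace('/\s+/u', ' ', $node->textContent));
        if ($text !== '') {
            $blocks[] = $text;
        }
    }
    return $blocks;
}

// Made-up sample that only resembles the feed's content HTML.
$html = '<a href="http://example.com/a"><b>Some title</b></a><br>'
      . '<font size="-1"><b><font color="#6f6f6f">Some Paper</font></b></font><br>'
      . '<font size="-1">The first sentences of the <b>article</b> text ...</font>';

$blocks      = extractFontBlocks($html);
$author      = $blocks[0] ?? '';
// Keep at most the first 200 characters (not bytes) of the description.
$description = mb_substr($blocks[1] ?? '', 0, 200);

echo $author, "\n", $description, "\n";
```

Selecting elements by XPath tolerates small changes in the surrounding markup better than splitting on an exact tag string, but it still depends on the feed keeping that `<font size="-1">` layout. Note that the indexes differ from the `explode` version: the title sits in the `<a>` element, so the first block here is the author.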
