dongshi2141 2016-06-24 03:29
Views: 667

(Web crawler) How to get the text of news articles from a news website

I need to extract the article text from a news website, for roughly 1,000 pages in total.

An example link: http://www.dcfever.com/news/readnews.php?id=16727

The site publishes each new article under a URL formed by adding 1 to the id, so the current one is

readnews.php?id=16727

and the next URL will be

readnews.php?id=16728
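
Since the id simply increases by 1, the list of pages can be generated up front. A minimal sketch, assuming every id in the range maps to an article page (the class and method names `NewsUrls`, `build` and `range` are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

class NewsUrls {
    // Base URL taken from the question; the numeric id is appended to it.
    static final String BASE = "http://www.dcfever.com/news/readnews.php?id=";

    // Build the article URL for a single id.
    static String build(int id) {
        return BASE + id;
    }

    // Build the URLs for every id from firstId to lastId, both inclusive.
    static List<String> range(int firstId, int lastId) {
        List<String> urls = new ArrayList<>();
        for (int id = firstId; id <= lastId; id++) {
            urls.add(build(id));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = range(16000, 17000);
        System.out.println(urls.size() + " URLs, first: " + urls.get(0));
    }
}
```

Note that an inclusive range of 16000 to 17000 covers 1001 ids, and some of them may not correspond to a published article.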

My question: I would like to scrape the text for ids 16000 to 17000.

How can I implement this in Java?

Should I use Jsoup, or some other web crawler?

Thanks.
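
One possible shape for such a crawler in Java is sketched below, using only the JDK's `java.net.http.HttpClient` (Java 11+). The class name `NewsFetcher` and its methods are hypothetical, and `toText` is a deliberately crude regex-based tag stripper standing in for a real HTML parser; with Jsoup on the classpath, `Jsoup.connect(url).get().text()` would likely be a more robust way to get the page text. The sketch also assumes the pages are served in an encoding the client can detect from the response headers.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class NewsFetcher {
    // Follow redirects (for example http -> https), which the default client does not do.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Download the raw HTML of one page; anything other than HTTP 200 is treated as an error.
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Crude HTML-to-text conversion (an assumption, not a real parser):
    // drops script/style blocks, strips remaining tags, collapses whitespace.
    // HTML entities such as &amp; are left undecoded.
    static String toText(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 2) {
            System.out.println("usage: java NewsFetcher <firstId> <lastId>");
            return;
        }
        int firstId = Integer.parseInt(args[0]);
        int lastId = Integer.parseInt(args[1]);
        for (int id = firstId; id <= lastId; id++) {
            String url = "http://www.dcfever.com/news/readnews.php?id=" + id;
            try {
                String text = toText(fetchHtml(url));
                System.out.println(id + "\t" + text.length() + " chars");
            } catch (IOException e) {
                // Missing or failing pages are skipped rather than aborting the run.
                System.err.println("skipping " + id + ": " + e.getMessage());
            }
            Thread.sleep(1000); // pause between requests to avoid hammering the site
        }
    }
}
```

Running `java NewsFetcher 16000 17000` would walk the whole range. The output of `toText` includes navigation and footer text as well as the article, so selecting only the article container (for instance with a Jsoup CSS selector, once the page structure is known) would still be needed.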


3 answers

  • dongyi5425 2016-06-24 05:30

    You've tagged this Python too, so take a look at BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
