1mportttttt 2017-11-21 01:46 Acceptance rate: 0%
Views: 1622

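Java WebMagic: scraping Zhihu answers

When I use WebMagic to scrape all the answers under a Zhihu question, each request only returns the first two answers.

I have read through various blog posts and tried all kinds of approaches, but the result is always the same: either only 2 answers come back, or the request fails outright with a 401.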

o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
o.a.http.impl.auth.HttpAuthenticator - Authentication required
o.a.http.impl.auth.HttpAuthenticator - www.zhihu.com:443 requested authentication
o.a.http.impl.auth.HttpAuthenticator - Response contains no authentication challenges
o.a.h.c.p.ResponseProcessCookies - Cookie accepted [aliyungf_tc="AQAAAD1PxXQABgUA7CesO3+7/0/iFhJt", version:0, domain:www.zhihu.com, path:/, expiry:null]
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection [id: 0][route: {s}->https://www.zhihu.com:443] can be kept alive indefinitely
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://www.zhihu.com:443][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 1]
u.c.webmagic.utils.CharsetUtils - Auto get charset: null
u.c.w.d.HttpClientDownloader - Charset autodetect failed, use UTF-8 as charset. Please specify charset in Site.setCharset()
u.c.w.d.HttpClientDownloader - downloading page success https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3
09:04:14.908 [pool-1-thread-1] INFO us.codecraft.webmagic.Spider - page status code error, page https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3 , code: 401

Any pointers from the experts here would be much appreciated.

2 answers

    Bro, are you building a web crawler here?
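
For what it's worth, the log in the question shows the crawler calling the answers endpoint with limit=3&offset=3 and getting a 401 back. Below is a minimal sketch, using only the JDK, of how the paged URLs and the request headers could be built. It rests on assumptions: that the 401 comes from a header the browser sends and the crawler does not, and that an `authorization` header is the missing one. The class name `ZhihuAnswerPages`, both method names, and the header values are hypothetical; the real values would have to be copied from the browser's own request. The long include=... parameter from the log is left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ZhihuAnswerPages {
    // Endpoint prefix taken from the log in the question.
    static final String BASE = "https://www.zhihu.com/api/v4/questions/";

    // Builds the URL for one page of answers; offset advances by limit per page.
    static String pageUrl(long questionId, int limit, int offset) {
        return BASE + questionId + "/answers?sort_by=default&limit=" + limit
                + "&offset=" + offset;
    }

    // Headers to attach to every request. ASSUMPTION: the server rejects
    // requests without an authorization header; the value is a placeholder
    // supplied by the caller, not a real token.
    static Map<String, String> headers(String authorization) {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("User-Agent", "Mozilla/5.0"); // placeholder browser-like UA
        h.put("authorization", authorization);
        return h;
    }

    public static void main(String[] args) {
        // Print the URLs of the first three pages for the question in the log.
        for (int offset = 0; offset < 9; offset += 3) {
            System.out.println(pageUrl(29688243L, 3, offset));
        }
        System.out.println(headers("PLACEHOLDER"));
    }
}
```

In WebMagic itself, headers like these would normally be set on the `Site` object with `Site.addHeader(name, value)`, and the log's own hint suggests calling `Site.setCharset("UTF-8")` as well. Whether that clears the 401 depends on what the server actually checks.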
