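I want to use Python's Scrapy framework to crawl a web page, but the content I need is only shown after logging in, and logging in is a POST request. In Scrapy, the URLs in a spider's start_urls are turned into requests by the scheduler. If I need to add POST parameters, do I add them in the scheduler? And where can I open and edit the scheduler's code? Could someone experienced with Scrapy help?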
- oyljerry 2015-02-11 08:02
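You don't need to modify the scheduler. Do the login inside the spider: let the first request fetch the login page, submit the login form as a POST with FormRequest.from_response, and continue crawling from the callback. Scrapy's cookie middleware (enabled by default) keeps the session cookies for later requests. Updated for current Scrapy (BaseSpider and the log module are gone, and response.body is bytes in Python 3):

```python
import scrapy
from scrapy.http import FormRequest


class LoginSpider(scrapy.Spider):
    name = "example.com"
    start_urls = ["http://www.example.com/users/login.php"]

    def parse(self, response):
        # Fill in the login form found on the page and submit it as a POST
        return FormRequest.from_response(
            response,
            formdata={"username": "john", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Check that login succeeded before going on
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return
        # Continue scraping with the authenticated session
        yield scrapy.Request(
            url="http://www.example.com/tastypage/",
            callback=self.parse_tastypage,
        )

    def parse_tastypage(self, response):
        ...  # extract data from the logged-in page here
```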