qq_25040209 asked on 2017-04-26 01:51 · 2135 views

CrawlSpider does not follow the links matched by its rules

Here is the code. response.url is always "http://book.douban.com/top250" and the spider never follows any further links. Any help would be greatly appreciated.

books.py

#!/usr/bin/python
# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors import LinkExtractor
from douban.items import DoubanItem


class BooksSpider(CrawlSpider):
    name = "BooksSpider"
    allowed_domains = ["book.douban.com"]
    start_urls = [
        "http://book.douban.com/top250"
    ]

    rules = (
        Rule(LinkExtractor(allow=r'https://book.douban.com/top250\?start=\d+'), callback="parse"),
        Rule(LinkExtractor(allow=r'https://book.douban.com/subject/\d+'), callback="parse"),
    )

    def parse(self, response):
        sel = Selector(response=response)
        item = DoubanItem()

        item['name'] = sel.xpath("//h1")[0].extract().strip()

        try:
            contents = sel.xpath("//div[@id='link-report']/p//text()").extract()
            item['content_desc'] = "\n".join(content for content in contents)
        except:
            item['content_desc'] = " "
        try:
            profiles = sel.xpath("//div[@class='related_info']/div[@class='indent']")[1].xpath("//div[@class='intro']/p/text()").extract()
            item['author_profile'] = "\n".join(profile for profile in profiles)
        except:
            item['author_profile'] = " "

        datas = response.xpath("//div[@id='info']//text()").extract()
        datas = [data.strip() for data in datas]
        datas = [data for data in datas if data != '']
        for data in datas:
            if u"作者" in data:
                item["author"] = datas[datas.index(data) + 1]
            elif u":" not in data:
                item["author"] = datas[datas.index(data) + 2]
            elif u"出版社:" in data:
                item["press"] = datas[datas.index(data) + 1]
            elif u"出版年:" in data:
                item["date"] = datas[datas.index(data) + 1]
            elif u"页数:" in data:
                item["page"] = datas[datas.index(data) + 1]
            elif u"定价:" in data:
                item["price"] = datas[datas.index(data) + 1]
            elif u"ISBN:" in data:
                item["ISBN"] = datas[datas.index(data) + 1]
        print item
        return item
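
For reference, the most likely cause: the Scrapy documentation explicitly warns against using parse as the callback of a CrawlSpider rule, because CrawlSpider implements parse itself to apply the rules; overriding it means the extracted links are never followed. Below is a minimal sketch under that assumption, using the non-deprecated import paths and a renamed callback (the spider name BooksSpiderFixed, the callback name parse_book, and the h1 XPath are illustrative choices, not from the original post):

# Minimal sketch: keep CrawlSpider's built-in parse() intact by giving
# the item callback a different name.
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class BooksSpiderFixed(CrawlSpider):
    name = "BooksSpiderFixed"
    allowed_domains = ["book.douban.com"]
    start_urls = ["https://book.douban.com/top250"]

    rules = (
        # Pagination links: follow them, no callback needed.
        Rule(LinkExtractor(allow=r'https://book\.douban\.com/top250\?start=\d+'), follow=True),
        # Book detail pages: extract items with a custom-named callback.
        Rule(LinkExtractor(allow=r'https://book\.douban\.com/subject/\d+'), callback="parse_book"),
    )

    def parse_book(self, response):
        # Placeholder extraction; the original parse() body would go here.
        yield {"name": response.xpath("//h1/span/text()").extract_first(), "url": response.url}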

3 answers

  • 普通网友 replied on 2017-04-26 02:41

    I suggest you provide an HTTP capture, or the program's own log and stack trace.
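
    For example, a DEBUG-level log can be captured by running the spider from a short script; CrawlerProcess and the LOG_LEVEL / LOG_FILE settings are standard Scrapy, while the import path douban.spiders.books is only an assumption about the project layout:

    # Run the spider programmatically and write a verbose log to a file.
    from scrapy.crawler import CrawlerProcess
    from douban.spiders.books import BooksSpider  # assumed module path

    process = CrawlerProcess(settings={
        "LOG_LEVEL": "DEBUG",     # log every request, response and filtered/offsite link
        "LOG_FILE": "crawl.log",  # attach this file to the question
    })
    process.crawl(BooksSpider)
    process.start()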

