东北大米Fun 2021-08-04 06:20

Scrapy: one set of results goes missing when passing values through meta. What causes this?

Problem:

Site being crawled: http://www.52jingsai.com/bisai/keji/index.php?jsstatus=2&jssort=0
Goal: crawl the information for each competition, tag by tag
Spider code:

import scrapy
from copy import deepcopy

class CsSpider(scrapy.Spider):
    name = 'cs'
    allowed_domains = ['52jingsai.com']
    start_urls = ['http://www.52jingsai.com/bisai/keji/index.php?jsstatus=2&jssort=0']
    # Get the competition-level tags
    def parse(self, response):
        li_lst = [i.xpath('.//a/@href').get() for i in response.xpath('//div[@class="js"]/div[2]/ul/li')[2::]]
        li_text = [i.xpath('.//a/text()').get().strip() for i in response.xpath('//div[@class="js"]/div[2]/ul/li')[2::]]
        for num in range(len(li_lst)):
            item = CompetitionsItem(competition_level=li_text[num])
            yield scrapy.Request(
                url=li_lst[num],
                callback=self.order_parse,
                meta={'item': deepcopy(item)}
            )
    # Get the competition sort-order tags
    def order_parse(self, response):
        item = response.meta.get('item')
        li_lst = [i.xpath('./@href').get() for i in response.xpath('//div[@class="js"]/div[3]/ul/li/a')]
        li_text = [i.xpath('./text()').get() for i in response.xpath('//div[@class="js"]/div[3]/ul/li/a')]
        for num in range(len(li_lst)):
            item['competitions_label'] = li_text[num]
            yield scrapy.Request(
                url=li_lst[num],
                callback=self.details,
                meta={'item': deepcopy(item)}
            )
    # Get the detailed information
    def details(self, response):
        print(response.meta['item'])

What is going on here? This is the first time I have seen this. Any help is appreciated!

1 answer

  • CSDN专家-HGJ 2021-08-04 14:32

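    I don't know how the CompetitionsItem class in your code is defined, so check that first. With CompetitionsItem written as below, the spider does retrieve the competition tag information.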

    import scrapy
    from copy import deepcopy


    class CompetitionsItem(scrapy.Item):
        competition_level = scrapy.Field()
        competitions_label = scrapy.Field()

    class CsSpider(scrapy.Spider):
        name = 'cs'
        allowed_domains = ['52jingsai.com']
        start_urls = [
            'http://www.52jingsai.com/bisai/keji/index.php?jsstatus=2&jssort=0']

        # Get the competition-level tags
        def parse(self, response):
            li_lst = [i.xpath('.//a/@href').get()
                      for i in response.xpath('//div[@class="js"]/div[2]/ul/li')[2::]]
            li_text = [i.xpath('.//a/text()').get().strip()
                       for i in response.xpath('//div[@class="js"]/div[2]/ul/li')[2::]]
            for num in range(len(li_lst)):
                item = CompetitionsItem(competition_level=li_text[num])
                yield scrapy.Request(
                    url=li_lst[num],
                    callback=self.order_parse,
                    meta={'item': deepcopy(item)}
                )

        # Get the competition sort-order tags
        def order_parse(self, response):
            item = response.meta.get('item')
            li_lst = [i.xpath('./@href').get()
                      for i in response.xpath('//div[@class="js"]/div[3]/ul/li/a')]
            li_text = [i.xpath('./text()').get()
                       for i in response.xpath('//div[@class="js"]/div[3]/ul/li/a')]
            for num in range(len(li_lst)):
                item['competitions_label'] = li_text[num]
                yield scrapy.Request(
                    url=li_lst[num],
                    callback=self.details,
                    meta={'item': deepcopy(item)}
                )

        # Get the detailed information
        def details(self, response):
            print(response.meta['item'])
    

    # Output:
    {'competition_level': '全国', 'competitions_label': '热门'}
    {'competition_level': '全国', 'competitions_label': '推荐'}
    {'competition_level': '国际', 'competitions_label': '热门'}
    {'competition_level': '全国', 'competitions_label': '最新'}
    {'competition_level': '国际', 'competitions_label': '推荐'}
    {'competition_level': '国际', 'competitions_label': '最新'}
    {'competition_level': '各省', 'competitions_label': '推荐'}
    {'competition_level': '各省', 'competitions_label': '热门'}
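
The output above lists eight level/label pairs. If a combination still goes missing, one thing worth checking is Scrapy's built-in duplicate filter, which silently drops a request whose URL has already been scheduled. Nothing in this thread confirms that this is the cause here; it is only a common one when a link on a page points back to a URL that was already requested. The sketch below is a plain-Python toy model of that behaviour, not Scrapy's own implementation, and the `example.com` URLs, the `schedule` helper, and the `level`/`sort` parameters are all made up for illustration.

```python
# Toy model of a crawler's duplicate filter (NOT Scrapy's implementation).
# Every URL and helper name below is hypothetical.

def schedule(requests, seen):
    """Return the requests that pass the filter.

    Each request is (url, label, dont_filter). A URL already in `seen`
    is dropped unless its dont_filter flag is set.
    """
    accepted = []
    for url, label, dont_filter in requests:
        if not dont_filter and url in seen:
            continue  # dropped without reaching any callback
        seen.add(url)
        accepted.append((url, label))
    return accepted


# Hypothetical level page that the first callback has already fetched.
level_url = "http://example.com/list?level=3&sort=0"
seen = {level_url}

# Hypothetical sort links found on that page; the first one points
# back at the page itself.
sort_links = [
    ("http://example.com/list?level=3&sort=0", "最新"),
    ("http://example.com/list?level=3&sort=1", "热门"),
    ("http://example.com/list?level=3&sort=2", "推荐"),
]

filtered = schedule([(u, l, False) for u, l in sort_links], set(seen))
unfiltered = schedule([(u, l, True) for u, l in sort_links], set(seen))

print([label for _, label in filtered])    # ['热门', '推荐']
print([label for _, label in unfiltered])  # ['最新', '热门', '推荐']
```

In a real spider the equivalent switch is the `dont_filter=True` argument of `scrapy.Request`. To see whether filtering is happening at all, look for the `dupefilter/filtered` counter in the stats printed when the crawl finishes, or set `DUPEFILTER_DEBUG = True` in the settings so that every dropped request is logged.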
