为什么运用scrapy之后，无法爬取到信息呢？

报错信息如下：图片说明
scrapy中的spider代码如下：

import scrapy
from scrapy import Request,Spider
from ticketCrawler.items import TicketCrawlerItem
from scrapy.selector import Selector
import sys
from lxml import etree
#from calculate import calculatePageNumber
import re

class ticketSpider(scrapy.Spider):
    #爬虫标识，用于区分不同的spider
    name="ticketCrawler"

    start_url = ['https://www.chinaticket.com/']
    urls = {
        'yanchanghui':'https://www.chinaticket.com/wenyi/yanchanghui/',
        'huaju':'https://www.chinaticket.com/wenyi/huaju/',
        'yinleju':'https://www.chinaticket.com/wenyi/yinyueju/',
        'xiqu':'https://www.chinaticket.com/wenyi/xiqu/',
        'baleiwu':'https://www.chinaticket.com/wenyi/baleiwu/',
        'qinzijiating':'https://www.chinaticket.com/wenyi/qinzijiating/',
        'zaji':'https://www.chinaticket.com/wenyi/zaji/',
        'xiangshengxiaopin':'https://www.chinaticket.com/wenyi/xiangshengxiaopin/'
    }
    def start_requests(self):
        try:
            for key,value in self.urls.items():
                yield Request(value.encode('utf-8'),meta={'type':key.encode('utf-8')},callback = self.parse)
        except Exception as err:
                print(err)  
    def get_next_url(self):
        try:
            pass
        except Exception as err:
            print(err)
    def parse(self,response):
        try:
            item = TicketCrawlerItem()
            meta = response.meta()  #概要 meta标签提供关于HTML文档的元数据
            result = response.text.encode("utf-8")
            if result==''or result=='None':
                print("can't get the sourceCode")
                sys.exit()
            tree = etree.HTML(result)
            data = []
            page = tree.xpath("//*[@class='s_num']/text()")[1].replace("\n","").replace('','').encode("utf-8")
            calculateNum = calculatePageNumber()
            pageNUM = calculateNum.calculate_page_number(page)
            count = (pageNUM/10)+1
            listDoms = tree.xpath("//*[@class='s_ticket_list']//ul")
            if(listDoms):
                for itemDom in listDoms:
                    item['type'] = meta['type'].encode('utf-8')
                    try:
                        titleDom = itemDom.xpath("li[@class='ticket_list_tufl']/a/text()")
                        if(titleDom[0]):
                            item['name'] = titleDom[0].encode("utf-8")
                    except Exception as err:
                        print(err)
                    try:
                        urlDom = itemDom.xpath("li[@class='ticket_list_tufl']/a/@href")
                        if(urlDom[0]):
                            item['url'] = urlDom[0].encode("utf-8")
                    except Exception as err:
                        print(err)
                    try:
                        timeDom = itemDom.xpath("li[@class='ticket_list_tufl']/span[1]/text()")
                        if(timeDom[0]):
                            item['time'] = timeDom[0].encode("utf-8").replace('时间：','')
                    except Exception as err:
                        print(err)
                    try:
                        addressDom = itemDom.xpath("li[@class='ticket_list_tufl']/span[2]/text()")
                        if(addressDom[0]):
                            item['address'] = addressDom[0].encode("utf-8").replace('地点：','')
                    except Exception as err:
                        print(err)
                    try:
                        priceDom = itemDom.xpath("li[@class='ticket_list_tufl']/span[3]/text()")
                        if(priceDom[0]):
                            item['time'] = priceDom[0].encode("utf-8").replace('票价：','')
                    except Exception as err:
                        print(err)
                    yield item
                for i in range(2,count+1):
                    next_page = "https://www.chinaticket.com/wenyi/"+str(meta['type'])+"/?o=2&page="+str(i)
                    if next_page is not None:
                        yield scrapy.Request(next_page,meta={"type":meta['type']},callback = self.parse)
        except Exception as err:
            print(err)


class calculatePageNumber():

    def calculate_page_number(self,page):
        try:
            result = re.findall(r"\d+\.?\d*",page)
            return int(result[0])
        except Exception as err:
            print(err)

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dabocaiqq 2020-05-12 09:32
关注
https://blog.csdn.net/weixin_41931602/article/details/80189953

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

为什么我的scrapy爬不到数据了 python
2020-09-05 13:48

回答 1 已采纳 small_link = 'http:'+li.xpath('./@href').extract_first() 这里错了 response.urljoin(li.xpath('./@href')
scrapy爬取图片，爬取不到 python 有问必答
2021-05-23 20:32

回答 2 已采纳你已经爬到图片连接了，这个看到的管道文件的代码怎样写，要对图片链接发送请求访问，然后保存才行
scrapy下爬虫爬取子页面详细信息部分代码出错 python 爬虫
2021-12-07 21:42

回答 1 已采纳流程通了细节没改 import copy from scrapy import Request from scrapy.spiders import Spider class AniRank(S
关于#pythonscrapy#的问题，如何解决？ python 开发语言爬虫
2023-04-02 16:26

回答 2 已采纳好问题！！抱歉我也不太懂，你问问chatGPT吧：https://new.quke123.com/ 或者其他Python群友：https://app.yinxiang.com
如何利用scrapy爬取带标签的网页内容并保存到自己的服务器上？ mysql python sql
2018-02-09 09:34

回答 3 已采纳 1. 把整个爬取到的网页内容直接存储到数据库肯定是可以的，你之所以没有成功，应该是因为你的数据库中的相应字段错了，整个网页内容都比较长，一般都是要用text字段，甚至是LongText)（最大长度42
scrapy-爬取京东笔记本电脑信息问题 chrome python selenium 开发语言
2020-09-01 19:12

回答 2 已采纳 ``` browser.quit() return HtmlResponse(url=request.url, body=browser.page_source, re
python爬虫框架scrapy实战之爬取京东商城进阶篇
2020-09-21 08:52

主要给大家介绍了利用python爬虫框架scrapy爬取京东商城的相关资料，文中给出了详细的代码介绍供大家参考学习，并在文末给出了完整的代码，需要的朋友们可以参考学习，下面来一起看看吧。
Scrapy框架时爬取网页时报错 python 有问必答
2021-05-26 16:56

回答 2 已采纳你的数据清洗方法用错了，参考一下：https://blog.csdn.net/qq_43004728/article/details/84586628，如有帮助，望采纳
关于#scrapy#的问题，如何解决？ python 爬虫
2023-03-07 18:36

回答 2 已采纳从代码看，你的爬虫似乎只是爬取了起始页面上第一个标题链接的数据。这可能是因为在parse函数中只获取了第一个数据块，而没有对其他数据块进行处理。你可以尝试使用循环迭代数据块，以便对每个数据块进行相同
scrapy 爬虫大量链接返回None不知道为啥 python
2020-05-29 14:50

回答 2 已采纳 200说明成功了，返回None是因为你返回值本来就设置成None，或者没设置返回值导致python默认返回None
Scrapy框架爬取Boss直聘网Python职位信息的
2020-09-19 14:24

今天小编就为大家分享一篇关于Scrapy框架爬取Boss直聘网Python职位信息的源码，小编觉得内容挺不错的，现在分享给大家，具有很好的参考价值，需要的朋友一起跟随小编来看看吧
使用python scrapy框架写爬虫如何爬取搜狐新闻的参与人数？ python 爬虫
2016-03-29 10:07

回答 2 已采纳这个是可能异步ajax返回的，所以需要用selenium等webdriver来处理
Python-基于Python的scrapy爬虫框架实现爬取招聘网站的信息到数据库
2019-08-10 07:39

基于Python的scrapy爬虫框架实现爬取招聘网站的信息到数据库
Python Scrapy多页数据爬取实现过程解析
2020-09-16 17:01

主要介绍了Python Scrapy多页数据爬取实现过程解析,文中通过示例代码介绍的非常详细，对大家的学习或者工作具有一定的参考学习价值,需要的朋友可以参考下
没有解决我的问题, 去提问

悬赏问题

¥15 ubuntu虚拟机打包apk错误
¥199 rust编程架构设计的方案有偿
¥15 回答4f系统的像差计算
¥15 java如何提取出pdf里的文字？
¥100 求三轴之间相互配合画圆以及直线的算法
¥100 c语言，请帮蒟蒻写一个题的范例作参考
¥15 名为“Product”的列已属于此 DataTable
¥15 安卓adb backup备份应用数据失败
¥15 eclipse运行项目时遇到的问题
¥15 关于#c##的问题：最近需要用CAT工具Trados进行一些开发

为什么运用scrapy之后，无法爬取到信息呢？

1条回答 默认 最新

悬赏问题

1条回答默认最新