我用 Python 爬虫抓取去哪儿网热门景点信息,结果只爬到了两页的内容,不知道是哪里出了问题,有大佬帮忙看看:
# -*- coding: utf-8 -*-
# created by: tianxing
# created date: 2017-11-01
import scrapy
import re
import datetime
from practice.items import QvnaItem
class QuNaSpider(scrapy.Spider):
    """Spider for hot scenic-spot listings on piao.qunar.com.

    ``parse`` walks the paginated list pages, yields one Request per spot
    detail page, and follows the "next page" link; ``parse_page`` scrapes a
    single detail page into a ``QvnaItem``.
    """
    name = 'qvnawang'
    # BUG FIX: the original URL contained '®ion=' — the '&reg' of '&region='
    # had been mangled into the HTML entity '®', so the query string was wrong.
    start_urls = ['http://piao.qunar.com/ticket/list.htm?keyword=%E7%83%AD%E9%97%A8%E6%99%AF%E7%82%B9&region=&from=mpl_search_suggest&subject=']

    # Whitespace-like characters stripped from every scraped field
    # (CR, LF, tab, space, NBSP, ideographic space).
    _STRIP_CHARS = ('\r', '\n', '\t', ' ', '\xa0', '\u3000')

    def parse(self, response):
        """Parse one list page: schedule every detail page, then the next list page."""
        # Base xpath of the pop-up card of each sight on the list page.
        detail_links = response.xpath(
            '//div[@class="sight_item_pop"]/table/tr[3]/td/a/@href')
        for link in detail_links:
            detail_url = 'http://piao.qunar.com' + link.extract()
            yield scrapy.Request(url=detail_url, callback=self.parse_page)

        # BUG FIX ("only two pages"): the original code inspected only the
        # FIRST <a> of the pager and required its class to be 'next'.  That
        # holds on page 1, but from page 2 onward the first anchor is the
        # '上一页' (previous) link, so the check failed and crawling stopped
        # after two pages.  Selecting the anchor whose class IS 'next' works
        # on every page, and its absence on the last page ends the recursion
        # naturally — no exit()/SystemExit hack needed.
        next_href = response.xpath(
            '//div[@class="pager"]/a[@class="next"]/@href').extract()
        if next_href:
            next_url = 'http://piao.qunar.com' + next_href[0]
            yield scrapy.Request(url=next_url, callback=self.parse)

    def parse_page(self, response):
        """Scrape one sight detail page into a QvnaItem."""
        # BUG FIX: the original created ONE QvnaItem in parse() and passed it
        # through meta to every request; concurrent callbacks then overwrote
        # each other's fields.  Build a fresh item per detail page instead.
        item = QvnaItem()
        info = response.xpath(
            '/html/body/div[2]/div[2]/div[@class="mp-description-detail"]')

        # Sight name.
        item['name'] = self._clean(self._first(info, 'div[1]/span[1]/text()'))

        # Sight rank; the original defaulted to 0 (not '') when missing.
        rank = self._first(info, 'div[1]/span[2]/text()', default=None)
        item['rank'] = self._clean(rank) if rank is not None else 0

        # Sight description ('decription' key typo kept: it is the field name
        # declared in practice/items.py).
        item['decription'] = self._clean(
            self._first(info, 'div[2]/text()').replace('/', ','))

        # Sight address: normalize separators/brackets before stripping.
        address = self._first(info, 'div[3]/span[3]/text()')
        for src, dst in (('/', ','), (u'、', ''),
                         (u'（', ','), ('(', ','),
                         (u'）', ''), (')', '')):
            address = address.replace(src, dst)
        item['address'] = self._clean(address)

        # User comments.
        item['comment'] = self._clean(
            self._first(info, 'div[4]/span[3]/span/text()').replace('/', ','))

        # Weather.
        item['weather'] = self._clean(
            self._first(info, 'div[5]/span[3]/text()').replace('/', ','))

        # Lowest ticket price.
        item['lowprice'] = self._clean(
            self._first(info, 'div[7]/span/em/text()').replace('/', ','))

        # Scrape date (today), formatted YYYY-MM-DD.
        item['date'] = datetime.datetime.now().strftime('%Y-%m-%d')
        yield item

    @staticmethod
    def _first(selector, xpath, default=''):
        """Return the first extracted value of *xpath* under *selector*, or *default*."""
        values = selector.xpath(xpath).extract()
        return values[0] if values else default

    def _clean(self, text):
        """Strip whitespace-ish characters (see _STRIP_CHARS) from *text*."""
        for ch in self._STRIP_CHARS:
            text = text.replace(ch, '')
        return text