采用scrapy框架爬取二手房数据，显示没有爬取到页面和项目，不清楚问题原因

1.item

import scrapy
class LianjiaItem(scrapy.Item):
    # define the fields for your item here like:
    # 房屋名称
    name = scrapy.Field()
    # 房屋户型
    type = scrapy.Field()
    # 建筑面积
    area = scrapy.Field()
    # 房屋朝向
    direction = scrapy.Field()
    # 装修情况
    fitment = scrapy.Field()
    # 有无电梯
    elevator = scrapy.Field()
    # 房屋总价
    total_price = scrapy.Field()
    # 房屋单价
    unit_price = scrapy.Field()
    # 房屋产权
    property = scrapy.Field()

2.settings

    BOT_NAME = 'lianjia'
    SPIDER_MODULES = ['lianjia.spiders']
    NEWSPIDER_MODULE = 'lianjia.spiders'
    USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
    ROBOTSTXT_OBEY = False
    ITEM_PIPELINES = {
   'lianjia.pipelines.FilterPipeline': 100,
   'lianjia.pipelines.CSVPipeline': 200,
}

3.pipelines

import re
from scrapy.exceptions import DropItem
class FilterPipeline(object):
    def process_item(self,item,spider):
        item['area'] = re.findall(r"\d+\.?\d*",item["area"])[0]
        if item["direction"] == '暂无数据':
            raise DropItem("房屋朝向无数据，抛弃此项目：%s"%item)
        return item
class CSVPipeline(object):
    index = 0
    file = None
    def open_spider(self,spider):
        self.file = open("home.csv","a")
    def process_item(self, item, spider):
        if self.index == 0:
            column_name = "name,type,area,direction,fitment,elevator,total_price,unit_price,property\n"
            self.file.write(column_name)
            self.index = 1
        home_str = item['name']+","+item['type']+","+item['area']+","+item['direction']+","+item['fitment']+","+item['elevator']+","+item['total_price']+","+item['unit_price']+","+item['property']+"\n"
        self.file.write(home_str)
        return item
    def close_spider(self,spider):
        self.file.close()

4.lianjia_spider

import scrapy
from scrapy import Request
from lianjia.items import LianjiaItem

class LianjiaSpiderSpider(scrapy.Spider):
    name = 'lianjia_spider'
    # 获取初始请求
    def start_requests(self):
        # 生成请求对象
        url = 'https://bj.lianjia.com/ershoufang/'
        yield Request(url)
    # 实现主页面解析函数
    def parse(self, response):
        # 使用xpath定位到二手房信息的div元素,保存到列表中
        list_selector = response.xpath("//li/div[@class = 'info clear']")
        # 依次遍历每个选择器,获取二手房的名称,户型,面积,朝向等信息
        for one_selector in list_selector:
            try:
                name = one_selector.xpath("div[@class = 'title']/a/text()").extract_first()
                other = one_selector.xpath("div[@class = 'address']/div[@class = 'houseInfo']/text()").extract_first()
                other_list = other.split("|")
                type = other_list[0].strip(" ")
                area = other_list[1].strip(" ")
                direction = other_list[2].strip(" ")
                fitment = other_list[3].strip(" ")
                total_price = one_selector.xpath("//div[@class = 'totalPrice']/span/text()").extract_first()
                unit_price = one_selector.xpath("//div[@class = 'unitPrice']/@data-price").extract_first()
                url = one_selector.xpath("div[@class = 'title']/a/@href").extract_first()
                yield Request(url,meta={"name":name,"type":type,"area":area,"direction":direction,"fitment":fitment,"total_price":total_price,"unit_price":unit_price},callback=self.otherinformation)
            except:
                pass
        current_page = response.xpath("//div[@class = 'page-box house-lst-page-box']/@page-data").extract_first().split(',')[1].split(':')[1]
        current_page = current_page.replace("}", "")
        current_page = int(current_page)
        if current_page < 100:
            current_page += 1
            next_url = "https://bj.lianjia.com/ershoufang/pg%d/" %(current_page)
            yield Request(next_url,callback=self.otherinformation)
    def otherinformation(self,response):
        elevator = response.xpath("//div[@class = 'base']/div[@class = 'content']/ul/li[12]/text()").extract_first()
        property = response.xpath("//div[@class = 'transaction']/div[@class = 'content']/ul/li[5]/span[2]/text()").extract_first()
        item = LianjiaItem()
        item["name"] = response.meta['name']
        item["type"] = response.meta['type']
        item["area"] = response.meta['area']
        item["direction"] = response.meta['direction']
        item["fitment"] = response.meta['fitment']
        item["total_price"] = response.meta['total_price']
        item["unit_price"] = response.meta['unit_price']
        item["property"] = property
        item["elevator"] = elevator
        yield item

提示错误：

de - interpreting them as being unequal
  if item["direction"] == '鏆傛棤鏁版嵁':

2019-11-25 10:53:35 [scrapy.core.scraper] ERROR: Error processing {'area': u'75.6',
 'direction': u'\u897f\u5357',
 'elevator': u'\u6709',
 'fitment': u'\u7b80\u88c5',
 'name': u'\u6b64\u6237\u578b\u517113\u5957 \u89c6\u91ce\u91c7\u5149\u597d \u65e0\u786c\u4f24 \u4e1a\u4e3b\u8bda\u610f\u51fa\u552e',
 'property': u'\u6ee1\u4e94\u5e74',
 'total_price': None,
 'type': u'2\u5ba41\u5385',
 'unit_price': None}
Traceback (most recent call last):
  File "f:\python_3.6\venv\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "F:\python_3.6\lianjia\lianjia\pipelines.py", line 25, in process_item
    home_str = item['name']+","+item['type']+","+item['area']+","+item['direction']+","+item['fitment']+","+item['elevator']+","+item['total_price']+","+item['unit_price']+
","+item['property']+"\n"
TypeError: coercing to Unicode: need string or buffer, NoneType found

写回答
好问题 0 提建议
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
蔡能教授，网站特聘专家 2019-11-25 12:48
关注
https://blog.csdn.net/weixin_41931602/article/details/80200695

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

基于Python Scrapy爬虫框架实现的链家二手房数据爬取系统的设计与实现毕业设计论文答辩用 1万+字共41页.docx
2022-02-13 17:30

### 基于Python Scrapy爬虫框架实现的链家二手房数据爬取系统的设计与实现 #### 一、设计背景及概括自21世纪以来，互联网技术的飞速发展使得人们的生活方式发生了翻天覆地的变化。在房地产领域，随着城镇化进程的...
利用Scrapy框架爬取房天下上全国新房、二手房数据.zip
2025-08-29 15:45

在这个项目中，开发者选择了Scrapy框架来实现数据的爬取，这是一个在Python编程语言下开发的强大的网络爬虫框架。Scrapy框架以其高效的性能、灵活的扩展和稳定的运行环境，成为了网络爬取领域中一个受欢迎的选择。房...
基于Python Scrapy框架的链家二手房爬虫设计源码
2024-10-01 12:15

基于Python Scrapy框架的链家二手房爬虫设计源码为用户提供了一套高效的数据抓取解决方案，它不仅可以应用于房地产行业，也可以扩展到其他需要大规模数据采集的领域。通过这个项目，我们不仅能够学习到如何使用...
Python爬虫-scrapy-城市二手房数据爬取与保存
2023-01-09 12:35

本项目专注于利用Scrapy框架来实现这一目标，Scrapy是一个强大的Python爬虫框架，它提供了丰富的功能，使得网络数据抓取变得更加便捷和模块化。首先，我们要了解Scrapy的基础架构。Scrapy由多个组件组成，如...
python Scrapy框架爬取58二手房信息（有遗留问题，有懂行大佬欢迎给建议）
2023-12-26 15:39

一个爱学习的菜鸡的博客爬取58信息，并进行入库操作。
基于Python Scrapy爬虫框架实现的二手房数据爬取系统的设计与实现毕业设计论文答辩用 1万+字共40页基于Scrapy框架的二手房数据爬取系统设计：分布式爬虫架构与可视化分析应用
2025-09-29 21:59

内容概要：本文设计并实现了一个基于Python Scrapy框架的二手房数据爬取系统，旨在解决当前二手房交易平台中存在的房源质量参差、用户匹配度低等问题，为后续的房源推荐系统提供高质量的数据支持。系统采用分布式...
python爬取二手房信息_使用Scrapy爬取链家二手房信息
2020-12-10 05:50

weixin_39669701的博客 Mysql数据库项目说明：本项目基于Python Scrapy爬虫框架对lianjia房产交易网站二手房小区、小区在售房屋数据进行爬取。数据爬取为三级页面递归爬取，整个爬取流程如下：1. 搜索小区名，在结果页面中找到小区名，...
源码：利用python的scrapy框架爬取安居客房价信息存入数据库并可视化
2022-01-23 21:50

在本项目中，我们主要探讨如何使用Python的Scrapy框架来爬取安居客网站上的房价信息，并将这些数据存储到数据库中，最后实现数据的可视化。以下是对整个过程的详细阐述： 1. **Scrapy框架**： Scrapy是一个用...
2024年Scrapy爬取二手房信息+可视化数据分析_scrapy 安居客 css
2024-05-01 15:21

2401_84563287的博客 Scrapy中的元数据field其实是继承了Python中的字典数据类型，使用起来很方便，博主直接定义了几个住房的信息，如下代码所示。当然还有高级的用法，配合itemloader加入processor，这里只使用简单的定义即可。pass爬虫...
没有解决我的问题, 去提问

采用scrapy框架爬取二手房数据，显示没有爬取到页面和项目，不清楚问题原因

2条回答 默认 最新

2条回答默认最新