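方兔叽 2020-03-04 16:49 Acceptance rate: 0%

Accepted

Scrapy items stored to MySQL, but a query returns no data

1. Problem description

I am trying to crawl a website with the Scrapy framework and store the scraped data in a MySQL database. The run finishes without reporting any error, but when I query the table it contains no data.
(The code structure follows this blog post, which I tried to run:
https://www.cnblogs.com/fromlantianwei/p/10607956.html)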

2. Screenshots

Scrapy project:

[image]

Database creation:

[image]

3. Relevant code

Scrapy code:

(1) tencent spider file

# -*- coding: utf-8 -*-
import re
from urllib import parse  # only needed by the disabled "next page" code below

import scrapy

from ScrapyPro3.items import ScrapyPro3Item


class tencentSpider(scrapy.Spider):
    name = 'tencent'

    allowed_domains = []
    start_urls = [
        'http://tieba.baidu.com/mo/q----,sz@320_240-1-3---2/m?kw=%E6%A1%82%E6%9E%97%E7%94%B5%E5%AD%90%E7%A7%91%E6%8A%80%E5%A4%A7%E5%AD%A6%E5%8C%97%E6%B5%B7%E6%A0%A1%E5%8C%BA&pn=26140',
        ]

    def parse(self, response):  # listing page
        all_elements = response.xpath(".//div[@class='i']")
        # print(all_elements)

        # Date pattern, and time pattern used for posts made today
        date_re = re.compile(r'\d?\d?-\d\d')
        time_re = re.compile(r'\d?\d?:\d\d')

        for all_element in all_elements:
            # Create a fresh item per row: the pipeline inserts asynchronously,
            # so one shared item could be overwritten before it is written.
            item = ScrapyPro3Item()

            # extract_first() returns None when nothing matches, so default to ''
            content = all_element.xpath("./a/text()").extract_first(default='')
            content = "".join(content.split())
            # Strip digits plus the character after them (unescaped '.' matches any character)
            content = re.sub(r'\d+.', '', content)
            item['comment'] = content

            person = all_element.xpath("./p/text()").extract_first(default='')
            person = "".join(person.split())
            # Remove the like count and the reply count
            person = re.sub(r'点\d+回\d+', '', person)
            date = date_re.findall(person)
            time = time_re.findall(person)

            person = date_re.sub('', person)
            person = time_re.sub('', person)

            # findall() returns a list; store a string so that the value
            # can be bound to a single SQL parameter in the pipeline.
            item['time'] = " ".join(time) if time else " ".join(date)

            item['name'] = person

            # Add a password and the active flag
            item['is_active'] = '1'
            item['password'] = '123456'

            print(item)
            yield item

        # Next page (disabled)
        """next_url = 'http://tieba.baidu.com/mo/q----,sz@320_240-1-3---2/' + parse.unquote(
            response.xpath(".//div[@class='bc p']/a/@href").extract_first())

        print(next_url)
        yield scrapy.Request(
            next_url,
            callback=self.parse,

        )"""
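
The name/time split done in `parse` can be tried outside Scrapy. Below is a minimal sketch, assuming the text of the `<p>` node looks like `点3 回5 张三 16:49` (a made-up sample; the real page layout may differ). `split_person_line` is a hypothetical helper that is not part of the original spider, and its patterns are slightly tighter than the spider's (they require at least one digit before `-` or `:`). It returns the time as a plain string, because `re.findall` returns a list and the pipeline binds the value to a single `%s` placeholder.

```python
import re

COUNTS_RE = re.compile(r'点\d+回\d+')    # like count and reply count
DATE_RE = re.compile(r'\d{1,2}-\d{2}')   # e.g. 3-04
TIME_RE = re.compile(r'\d{1,2}:\d{2}')   # e.g. 16:49


def split_person_line(text):
    """Return (name, when) for one listing line (hypothetical helper)."""
    text = "".join(text.split())          # drop all whitespace
    text = COUNTS_RE.sub('', text)
    times = TIME_RE.findall(text)
    dates = DATE_RE.findall(text)
    # Prefer the time (posts from today), fall back to the date
    when = times[0] if times else (dates[0] if dates else '')
    name = TIME_RE.sub('', DATE_RE.sub('', text))
    return name, when


print(split_person_line('点3 回5 张三 16:49'))
print(split_person_line('点12 回0 李四 3-04'))
```

Running the helper on a few lines copied from the real page is a quick way to see whether the spider produces the values it is expected to insert.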

(2) items file

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class ScrapyPro3Item(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    comment = scrapy.Field()
    time = scrapy.Field()
    name = scrapy.Field()
    password = scrapy.Field()
    is_active = scrapy.Field()

(3) pipelines file

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html

# class Scrapypro3Pipeline(object):
#     def process_item(self, item, spider):
#         return item
import pymysql
from twisted.enterprise import adbapi


class Scrapypro3Pipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):  # fixed name; Scrapy calls it and passes in the settings
        """
        Set up the database connection pool.
        :param settings: project settings
        :return: pipeline instance
        """
        adbparams = dict(
            host='localhost',
            db='mu_ke',
            user='root',
            password='root',
            cursorclass=pymysql.cursors.DictCursor  # cursor type
        )
        # ConnectionPool can use pymysql or MySQLdb as the driver
        dbpool = adbapi.ConnectionPool('pymysql', **adbparams)
        return cls(dbpool)

    def process_item(self, item, spider):
        """
        Use twisted to run the MySQL insert asynchronously through the connection pool.
        """
        query = self.dbpool.runInteraction(self.do_insert, item)  # function to run and its data
        # addErrback, not addCallback: handle_error must run when the insert
        # fails. With addCallback a failed insert is never printed.
        query.addErrback(self.handle_error)
        # Return the item so that any later pipeline still receives it
        return item

    def do_insert(self, cursor, item):
        # No explicit commit is needed; runInteraction commits when this function returns
        insert_sql = """
        insert into login_person(name,password,is_active,comment,time) VALUES(%s,%s,%s,%s,%s)
        """
        cursor.execute(insert_sql, (item['name'], item['password'], item['is_active'], item['comment'],
                                    item['time']))

    def handle_error(self, failure):
        # Print the error information
        print(failure)


(4) settings file

# -*- coding: utf-8 -*-

# Scrapy settings for ScrapyPro3 project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://doc.scrapy.org/en/latest/topics/settings.html
#     https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://doc.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'ScrapyPro3'

SPIDER_MODULES = ['ScrapyPro3.spiders']
NEWSPIDER_MODULE = 'ScrapyPro3.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'

MYSQL_HOST = 'localhost'
MYSQL_DBNAME = 'mu_ke'
MYSQL_USER = 'root'
MYSQL_PASSWD = 'root'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://doc.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'ScrapyPro3.middlewares.ScrapyPro3SpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'ScrapyPro3.middlewares.ScrapyPro3DownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://doc.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://doc.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'ScrapyPro3.pipelines.Scrapypro3Pipeline': 200,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
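
The `MYSQL_*` values in the settings file are never read, because `from_settings` in the pipeline hardcodes the same values. Below is a minimal sketch of building the connection arguments from the settings object instead, assuming only that it offers a dict-style `get(name, default)` method (Scrapy's `Settings` does). `mysql_params` is a hypothetical helper name, and the `charset` value is an assumption that does not appear in the original post.

```python
def mysql_params(settings):
    """Build keyword arguments for adbapi.ConnectionPool('pymysql', **params)."""
    return dict(
        host=settings.get('MYSQL_HOST', 'localhost'),
        db=settings.get('MYSQL_DBNAME'),
        user=settings.get('MYSQL_USER'),
        password=settings.get('MYSQL_PASSWD'),
        charset='utf8mb4',  # assumption: the scraped text is Chinese, so set a Unicode charset
    )


# A plain dict stands in for the Scrapy settings object in this demo.
demo_settings = {
    'MYSQL_HOST': 'localhost',
    'MYSQL_DBNAME': 'mu_ke',
    'MYSQL_USER': 'root',
    'MYSQL_PASSWD': 'root',
}
params = mysql_params(demo_settings)
print(params)
```

Inside `from_settings`, the result could be passed on as `adbapi.ConnectionPool('pymysql', **mysql_params(settings))`, which keeps the credentials in one place.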


(5) start file: runs the spider

from scrapy import cmdline
cmdline.execute(["scrapy","crawl","tencent"])



Database creation code:

create database mu_ke;
use mu_ke;
CREATE TABLE login_person (
id int(10) NOT NULL AUTO_INCREMENT,
name varchar(100) DEFAULT NULL,
password varchar(100) DEFAULT NULL,  -- was misspelled "passsword"; it must match the column named in the pipeline's insert
is_active varchar(100) DEFAULT NULL,
comment varchar(100) DEFAULT NULL,
time varchar(100) DEFAULT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=1181 DEFAULT CHARSET=utf8;
select count(name) from login_person;  -- the count returned is 0
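
A mismatch between the column names in `CREATE TABLE` and the ones named by the pipeline's `insert` (for example `passsword` against `password`) makes every insert fail, which would leave the table empty. The sketch below reproduces that effect with Python's built-in `sqlite3` as a stand-in for MySQL, so the exact error text differs from what MySQL reports. The table is trimmed to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Table with a misspelled column, standing in for login_person
conn.execute('CREATE TABLE login_person (name TEXT, passsword TEXT)')

error = None
try:
    # The insert names "password", which the table does not have
    conn.execute(
        'INSERT INTO login_person (name, password) VALUES (?, ?)',
        ('someone', '123456'),
    )
except sqlite3.OperationalError as exc:
    error = exc
    print('insert failed:', exc)

count = conn.execute('SELECT count(name) FROM login_person').fetchone()[0]
print('rows:', count)
conn.close()
```

The run ends normally and the count is still 0, which matches the symptom described in the question whenever the failure is not printed anywhere.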

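After running the code, the query shows a count of 0. What is wrong here?

(1)
The run itself completes normally

(2) Environment

PyCharm 2019.3

Python 3.8

MySQL 8.0 (Workbench 8.0)

(3) Data connection: none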


1 answer

放风喽 2020-03-05 10:44
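Print the item inside the pipelines file to see whether any data is actually being scraped.
After connecting to the database, print a row that is already stored in it to confirm that the connection really works.
Most likely you never got any data, so nothing was written.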
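
The first check suggested above can be sketched without a database. Scrapy calls `process_item(item, spider)` on each class listed in `ITEM_PIPELINES`, so logging inside that method shows whether items arrive at all. Below is a minimal sketch, assuming a hypothetical `DebugPipeline` that logs every item and any exception raised by an injected `insert` callable; `broken_insert` is a stand-in for a failing MySQL write and is not the asker's code.

```python
import logging

logger = logging.getLogger(__name__)


class DebugPipeline:
    """Hypothetical pipeline: logs every item and every insert failure."""

    def __init__(self, insert):
        self.insert = insert      # callable that writes one item
        self.seen = 0             # items that reached the pipeline
        self.failed = 0           # items whose insert raised

    def process_item(self, item, spider):
        self.seen += 1
        logger.info('item reached the pipeline: %r', dict(item))
        try:
            self.insert(item)
        except Exception:
            self.failed += 1
            logger.exception('insert failed')
        return item               # pass the item on to later pipelines


def broken_insert(item):
    # Stand-in for a database write that fails
    raise RuntimeError("Unknown column 'password'")


pipeline = DebugPipeline(broken_insert)
pipeline.process_item({'name': 'someone', 'password': '123456'}, spider=None)
print(pipeline.seen, pipeline.failed)
```

If `seen` stays at 0 the spider is not yielding items; if `failed` grows, the items arrive but the write is rejected. With `adbapi.runInteraction` the failure arrives through the returned Deferred rather than as a raised exception, so the equivalent hook there is `addErrback`.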

This answer was selected by the asker as the best answer.



