hyggest 2020-01-11 23:17
Accepted

Why does only the last page's data get saved when I try to store data scraped from multiple pages?

import pandas as pd
import re
import requests
from requests import RequestException
from bs4 import BeautifulSoup

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except RequestException:
        return ""

for i in range(2, 5):
    url = "https://bj.lianjia.com/xiaoqu/pg" + str(i) + "/?from=rec"
    print(url)
    html = getHTMLText(url)
    # NOTE: the regex pattern was garbled when the post was rendered; the
    # HTML tags inside it were stripped, leaving only the wildcards.
    pattern = re.compile('.*?(.*?).*?(.*?)', re.S)
    items = re.findall(pattern, html)
    print(items)
    name = []
    price = []
    info = []
    for item in items:
        print(item)
        name.append(item[0])
        price.append(item[1])
    info = list(zip(name, price))
    headers = ['小区', '价格']
    file_name = r'C:\Users\86157\Desktop\1.csv'  # raw string so \U is not treated as an escape
    data3 = pd.DataFrame(columns=headers, data=info)
    data3.to_csv(file_name, encoding='utf_8_sig')
    pd.read_csv(file_name)
The code above is what I wrote.
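A minimal, self-contained sketch of the failure mode in the question, using hypothetical stand-in data instead of scraped pages: when the accumulator lists are re-created inside the page loop, each iteration throws away everything collected so far, so only the last page survives.

```python
# Stand-in for the items scraped from three pages (no network access).
pages = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]

# Buggy pattern: the list is re-initialized on every loop pass,
# discarding the previous pages' results.
for items in pages:
    name = []                # re-created each iteration
    for item in items:
        name.append(item)
print(name)                  # only the last page's items remain

# Fixed pattern: initialize once, before the loop, and only append inside it.
names = []
for items in pages:
    for item in items:
        names.append(item)
print(names)                 # items from all three pages
```

The same reasoning applies to the `to_csv` call: writing the file inside the loop overwrites it on every iteration, so the save should also move after the loop.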


1 answer

7*24 工作者 2020-01-12 18:54

I made some simple changes to your crawler code. The regex matching you wrote has problems, so you should work on your regex skills later; I parsed the page source with lxml instead. Since your code runs synchronously, I kept mine synchronous as well and did not convert it to async, although async is the best approach for a crawler.

    #-*- coding:utf-8 -*-
    
    import pandas as pd
    import requests
    from lxml import etree
    
    def getHTMLText(url):
        Headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
        try:
            r = requests.get(url, timeout=30,headers=Headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.content.decode('utf-8')
        except:
            return ""
    
    if __name__ == '__main__':
        names = []
        prices = []
        info = []
        for i in range(2,5):
            url = "https://bj.lianjia.com/xiaoqu/pg" + str(i) +"/?from=rec"
            print(url)
            html = getHTMLText(url)
            if html:
                datas = etree.HTML(html)
                name = datas.xpath("//div[@class='info']/div[@class='title']/a/text()")
                price = datas.xpath("//div[@class='totalPrice']/span/text()")
                names.extend(name)
                prices.extend(price)
    
        info = list(zip(names,prices))
        headers = ['小区', '价格']
    file_name = '1.csv'
    data3 = pd.DataFrame(columns=headers, data=info)
    data3.to_csv(file_name, encoding='utf-8')
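To see what the two XPath expressions above extract, here is an offline illustration run against a hand-written HTML snippet that mimics the structure the answer assumes for the listing page (no request to lianjia.com is made; the snippet is hypothetical).

```python
from lxml import etree

# Hypothetical fragment shaped like the markup the XPath selectors target.
html = """
<div class="info"><div class="title"><a>小区A</a></div></div>
<div class="totalPrice"><span>52000</span></div>
<div class="info"><div class="title"><a>小区B</a></div></div>
<div class="totalPrice"><span>61000</span></div>
"""

doc = etree.HTML(html)
# Same selectors as in the answer: community names and total prices.
names = doc.xpath("//div[@class='info']/div[@class='title']/a/text()")
prices = doc.xpath("//div[@class='totalPrice']/span/text()")
print(list(zip(names, prices)))
```

Because the two selectors return parallel lists, `zip` pairs each community with its price, which is exactly what gets handed to `pd.DataFrame` above.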
    
    
    This answer was selected by the asker as the best answer.
