温橙与粥 2022-08-09 13:39 Acceptance rate: 100%
Views: 171
Closed

Why does my Python scraper only show the last item? If possible, please include corrected code!

Why does my program only show the last item of the data? Also, as soon as I run it, a large number of files open. How can I fix this?
import requests
import xlwings as xw
import json
from lxml import etree

if __name__ == '__main__':
    url = 'https://etaps.org/2017/tacas'

    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
    }
    meet_text = requests.get(url=url, headers=headers).text
    tree = etree.HTML(meet_text)
    div_list = tree.xpath("//div[@class='module-inner2']")

    for div in div_list:

        huiyi_list = div.xpath('./p/a')
        for huiyi in huiyi_list:
            huiyi_name = '23rd International Conference on Tools and Algorithms for the Construction and Analysis of Systems '
            theme = 'POST 2017'
            time = '22-29 April 2017'
            huiyi_S = ''
            huiyi_Z = huiyi.xpath("//div[@class='module-inner2']/p[1]/a/text()")
            huiyi_P = huiyi.xpath("//div[@class='module-inner2']/p[position()>2]/a/text()")
            # for i in huiyi_P:
            #     print(i)
            # print(i)
            huiyi_school = huiyi.xpath("//div[@class='module-inner2']/p/text()")

            for teama, teamb in zip(huiyi_P, huiyi_school):

                # for o in huiyi_school:
                #     pass
                # print(o)
                # url_list = url
                # print(i,o)
                # print(huiyi_Z)
                # print(huiyi_school)
                # header = ['Conference name','Conference theme','Conference dates','Keynote speaker','Conference chair','Participants','People list','Institution']
                # data = [huiyi_name,theme,time,huiyi_S,huiyi_Z,huiyi_P,huiyi_school,url_list]
                wb = xw.Book()
                sht = wb.sheets('sheet1')
                sht.range("A1").value = "Conference name"
                sht.range("B1").value = "Conference theme"
                sht.range("C1").value = "Conference dates"
                sht.range("D1").value = "Keynote speaker"
                sht.range("E1").value = "Conference chair"
                sht.range("F1").value = "Participant"
                sht.range("G1").value = "Institution"
                sht.range("H1").value = "list_url"

                for i in range(8):
                    huiyi_name = json.dumps('6th International Conference on Principles of Security and Trust')
                    sht.range(f'A{i + 2}').value = huiyi_name
                    theme = json.dumps('POST 2017')
                    sht.range(f'B{i + 2}').value = theme
                    time = json.dumps("22-29 April 2017")
                    sht.range(f'C{i + 2}').value = time
                    huiyi_S = ''
                    sht.range(f'D{i + 2}').value = huiyi_S
                    huiyi_Z = ''
                    sht.range(f'E{i + 2}').value = huiyi_Z
                    huiyi_P = json.dumps(teama)
                    sht.range(f'F{i + 2}').value = teama
                    huiyi_school = json.dumps(teamb)
                    sht.range(f'G{i + 2}').value = teamb
                    url = json.dumps('https://etaps.org/2017/post')
                    sht.range(f'H{i + 2}').value = url

                    print("Conference name:" + huiyi_name, "Conference theme:" + theme, "Conference dates:" + time, "Keynote speaker:" + huiyi_S, "Conference chair:" + huiyi_Z,
                          "Participant:" + huiyi_P, "Institution:" + huiyi_S, "list_url:" + url)

3 answers

  • 脚踏南山 2022-08-09 15:42

    [image]

    The asker selected this answer as the best answer.
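
The accepted answer appears here only as an image, so its text is not reproduced. Independently of it, one likely reading of the question's code is this: `xw.Book()` is called inside the innermost `zip` loop, so every (participant, institution) pair opens a new workbook, and the `for i in range(8)` loop then writes that same pair into rows 2 to 9. Each workbook therefore holds a single pair, and the one left in front is the last pair. The XPath expressions that start with `//` also search from the document root, so the two outer loops recompute identical lists on every pass. The following is a minimal sketch of a fix, not the accepted answer's code. It assumes the participant list and the institution list line up one to one, which the page may not guarantee; `build_rows` and `write_rows` are hypothetical helper names.

```python
# Column titles, written once as the first row.
HEADER = ["Conference name", "Conference theme", "Conference dates",
          "Keynote speaker", "Conference chair", "Participant",
          "Institution", "list_url"]


def build_rows(participants, institutions, name, theme, dates, list_url):
    """Return the header plus ONE row per (participant, institution) pair."""
    rows = [HEADER]
    for person, school in zip(participants, institutions):
        # Keynote speaker and chair stay empty, as in the question's code.
        rows.append([name, theme, dates, "", "",
                     person.strip(), school.strip(), list_url])
    return rows


def write_rows(rows, path):
    """Write every row into a single workbook (requires a local Excel install)."""
    import xlwings as xw  # local import so build_rows works without Excel

    wb = xw.Book()  # created once, outside any loop
    # Assigning a nested list to A1 fills the block of cells starting there.
    wb.sheets[0].range("A1").value = rows
    wb.save(path)
    wb.close()
```

With the lists from the question, the call would look like `write_rows(build_rows(huiyi_P, huiyi_school, huiyi_name, theme, time, url), "tacas2017.xlsx")`, where the file name is only a placeholder. The two XPath queries would then run once, before this call, rather than inside the nested loops.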


Question events

  • Closed by the system on August 17
  • Answer accepted on August 9
  • Question edited on August 9
  • Question edited on August 9
