H_H_fighting 2021-04-25 20:59 Acceptance rate: 100%

Web scraper raises IndexError: list index out of range

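The error probably occurs because some users have no birthday line on their profile page. I don't know how to fix it, so please take a look. Thanks.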


# Code adapted from 龙王山小青椒: https://www.bilibili.com/video/BV1M64y1u7wE
import requests
from lxml import etree
from collections import OrderedDict
from urllib.parse import quote
import csv
import traceback
import random
import re
from time import sleep
import os
from datetime import datetime, timedelta
import sys
import numpy as np
import pandas as pd
import time

header = {'Content-Type':'xx','User-Agent':'xx'}
Cookie = {'Cookie':'xxx'}

# Load the user IDs
weibo_comment_df = pd.read_csv('weibo_comment.csv')
weibo_comments = weibo_comment_df.values.tolist()
print(len(weibo_comments))

for i in range(len(weibo_comments)):
    url_base_1 = "https://weibo.cn/"
    url_base_2 = "/info"
    url = url_base_1 + str(weibo_comments[i][0]) + url_base_2
    print(i)
    print(url)
    html = requests.get(url, headers=header, cookies=Cookie)
    html.encoding='utf-8'  
    nickname = re.findall(r'<div class="c">昵称:(.*?)<br/>', html.text)
    sex = re.findall(r'<br/>性别:(.*?)<br/>', html.text)
    location = re.findall(r'<br/>地区:(.*?)<br/>', html.text)
    birthday = re.findall(r'<br/>生日:(.*?)<br/>', html.text)
    if birthday == []:
        data1 = [(nickname[0], sex[0], location[0], ' ')]
    else:
        data1 = [(nickname[0], sex[0], location[0], birthday[0])]
    data2 = pd.DataFrame(data1)
    print(data2)
    print(type(data2))
    data2.to_csv('id_2011.csv')
    time.sleep(1)

2 answers

  • 桔子code 2021-04-25 22:02
        nickname = re.findall(r'<div class="c">昵称:(.*?)<br/>', html.text)
        sex = re.findall(r'<br/>性别:(.*?)<br/>', html.text)
        location = re.findall(r'<br/>地区:(.*?)<br/>', html.text)
        birthday = re.findall(r'<br/>生日:(.*?)<br/>', html.text)

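    All four of these variables are assigned from findall(), and any of them can come back as an empty list. When that happens, nickname[0], sex[0], or location[0] raises the exception: list index out of range. The fix is to check that each list is non-empty before indexing into it, for example: if nickname and sex and location: do something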
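
The check described above could be sketched as follows. This is only one possible approach: it assumes a missing field should be stored as an empty string rather than causing the user to be skipped. The names `PATTERNS`, `first_or_default`, and `parse_profile` are hypothetical and do not appear in the original code. The regexes are copied from the question.

```python
import re

# Regexes copied from the question; the dictionary name is hypothetical.
PATTERNS = {
    "nickname": r'<div class="c">昵称:(.*?)<br/>',
    "sex": r'<br/>性别:(.*?)<br/>',
    "location": r'<br/>地区:(.*?)<br/>',
    "birthday": r'<br/>生日:(.*?)<br/>',
}


def first_or_default(matches, default=""):
    """Return the first element of `matches`, or `default` if the list is empty."""
    return matches[0] if matches else default


def parse_profile(html_text):
    """Extract the four profile fields; a field that is not found becomes ''."""
    return {
        field: first_or_default(re.findall(pattern, html_text))
        for field, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    # Hand-written sample with no birthday line (not real Weibo output).
    sample = '<div class="c">昵称:Alice<br/>性别:女<br/>地区:北京<br/></div>'
    print(parse_profile(sample))
```

Inside the loop, `parse_profile(html.text)` would replace the four `findall` calls and the `if birthday == []` branch, so no `[0]` lookup happens on an empty list. Separately, calling `data2.to_csv('id_2011.csv')` on every iteration rewrites the file each time, because `to_csv` opens it in write mode by default. Collecting the rows in a list and writing once after the loop is one way around that.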

    This answer was selected as the best answer by the asker.
