Problems when scraping flight ticket prices with a Python crawler

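I'm a student and have taught myself a bit of Python. I wanted to use a crawler to fetch flight prices so that I could keep track of discounted fares more easily, so I found some scraping code online and modified it myself. It now roughly does what I want: it runs on a server and sends a message to my mailbox as soon as a discounted fare shows up. But it keeps running into problems. The first is that the following error appears at runtime (it looks like a list index out of range):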
Exception in thread Thread-24:
Traceback (most recent call last):
  File "/usr/local/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/python27/lib/python2.7/threading.py", line 755, in run
    self.function(*self.args, **self.kwargs)
  File "SpecialFlightPrice.py", line 72, in task_query_flight
    flights=getdate(city, today, enddate)
  File "SpecialFlightPrice.py", line 27, in getdate
    json_data = re.findall(pattern, price_html)[0]
IndexError: list index out of range
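
The last frame of the traceback shows `re.findall(pattern, price_html)[0]` failing, which means `findall` returned an empty list: the response body did not contain the `jsonp2362( ... )` wrapper that the pattern expects. Below is a minimal sketch of a more defensive parser, assuming the body has already been read into a text string. `parse_flights` and `JSONP_PATTERN` are hypothetical names that are not part of the original script, and the sketch is written so that it should also run on Python 3.

```python
import json
import re

# Same callback name as in the request URL. The match is greedy so that a ")"
# inside the JSON does not cut the payload short; DOTALL lets it span lines.
JSONP_PATTERN = re.compile(r'jsonp2362\(\s*(.+)\)', re.DOTALL)


def parse_flights(price_html):
    """Return the flight list, or [] when the body is not the expected JSONP."""
    match = JSONP_PATTERN.search(price_html)
    if match is None:
        # Empty body, error page, or anything else without the callback wrapper.
        return []
    try:
        payload = json.loads(match.group(1))
    except ValueError:
        # The wrapper was present but its content was not valid JSON.
        return []
    if not isinstance(payload, dict):
        return []
    data = payload.get('data') or {}
    return data.get('flights') or []
```

With something like this, `getdate` could return `parse_flights(price_html)`, and a bad response would show up as "no flights this round" instead of killing the thread.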

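The second problem is that I want to empty the fare record file at a fixed time every day, but the code I wrote does not achieve this. I would appreciate it if someone could fix that for me as well.
Thanks in advance!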
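
For the daily clearing, one possible approach is to remember the date on which the file was last emptied instead of testing for one exact minute, because a loop that also spends time on network requests can easily skip that minute. The sketch below assumes the record file is `store.txt`, the name the script writes fares to. `clear_if_due` and its parameters are hypothetical names introduced for illustration.

```python
import datetime


def clear_if_due(path, last_cleared, now, clear_hour=7):
    """Empty `path` at most once per day, at or after `clear_hour`.

    Returns the date of the most recent clear so the caller can pass it back in.
    """
    if now.hour >= clear_hour and last_cleared != now.date():
        # Opening in 'w' mode already truncates the file to zero length.
        with open(path, 'w'):
            pass
        return now.date()
    return last_cleared
```

The polling loop would keep a variable such as `last_cleared = None` and call `last_cleared = clear_if_due('store.txt', last_cleared, datetime.datetime.now())` once per iteration. Note that with `last_cleared = None` the file is also emptied on the first iteration after the script starts, if it starts after the clearing hour.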

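Below is the source code (I replaced my two email addresses with xxxxxxxx; to run it, change xxxxxxxx to your own two addresses. Because it runs on a server, it also needs a few command-line arguments: the departure city, the date given as year, month and day, and how many days after that date to search):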

# -*- coding: utf-8 -*-

import datetime
import time
import json
import urllib
import re
import sys
import threading
from email.mime.text import MIMEText
import smtplib
from time import sleep
from threading import Timer

default_encoding = 'utf-8'
reload(sys)
sys.setdefaultencoding(default_encoding)

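def getdate(city,startdate,enddate):
    # Query the cheap-flight endpoint; the response is JSON wrapped in a jsonp2362(...) callback
    url = 'https://sjipiao.alitrip.com/search/cheapFlight.htm?startDate=%s&endDate=%s&' \
          'routes=%s-&_ksTS=1469412627640_2361&callback=jsonp2362&ruleId=99&flag=1' % (startdate, enddate,city)
    price_html = urllib.urlopen(url).read().strip()

    # Capture the text between "jsonp2362(" and the first ")"
    pattern = r'jsonp2362\(\s+(.+?)\)'
    re_rule = re.compile(pattern)

    # findall() returns an empty list when the response does not match the
    # pattern, in which case [0] raises the IndexError shown in the traceback
    json_data = re.findall(pattern, price_html)[0]
    price_json = json.loads(json_data)

    flights = price_json['data']['flights']  # flights Info

    return flights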

def sendmail(a,b,c,d):
    # Sender account, its password, and the recipient address
    _user = "xxxxxxxxxxx@163.com"
    _pwd = "xxxxxxxxxxx"
    _to = "xxxxxxxxxxxxx@qq.com"
    msg = MIMEText('%s%s%s%s'%(a,b,c,d),'plain','utf-8')
    msg["Subject"] = "Discounted fare available!"
    msg["From"] = _user
    msg["To"] = _to
    try:
        s = smtplib.SMTP_SSL("smtp.163.com", 465)
        s.login(_user, _pwd)
        s.sendmail(_user, _to, msg.as_string())
        s.quit()
        print "Success!"

    except smtplib.SMTPException:
        print "Failed"

def task_query_flight():
    # Command-line arguments: city code, year, month, day, number of days to search ahead
    city=str(sys.argv[1])
    year=int(sys.argv[2])
    month=int(sys.argv[3])
    day=int(sys.argv[4])
    delay=int(sys.argv[5])

    # Map two-letter shorthands to the three-letter codes used in the query
    if city=='DL':
        city='DLC'
    elif city=='NJ':
        city='NKG'
    elif city=='BJ':
        city='BJS'
    today = datetime.date(year,month,day)
    enddate = today + datetime.timedelta(delay)
    print 'Cheapest fares from %s to %s:' % (today,enddate)

    flights=getdate(city, today, enddate)

    for f in flights:
        if f['discount'] <=2  :
            source = 'From:%s-' % f['depName']
            dest = 'To:%s\t' % f['arrName']
            price = '\tPrice:%s%s (discount:%s)\t' % ((f['price']), f['priceDesc'], f['discount'])
            depart_date = '\tDate:%s' % f['depDate']
            print source+dest+price+depart_date

            # Append a space so that store.txt exists before it is read
            with open('store.txt','a') as store:
                store.write(' ')

            # Compare the fare against every line already recorded
            with open('store.txt','r') as store:
                for line in store.readlines():
                    if '%s%s%s%s'%(source,dest,price,depart_date) in line:
                        Timer(60,task_query_flight).start()
                    else:
                        sendmail(source, dest, price, depart_date)
                        with open('store.txt', 'a') as out:
                            out.write('%s%s%s%s'%(source,dest,price,depart_date))
                        Timer(60,task_query_flight).start()

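'''
Two problems:
1. List index out of range
2. The timer only runs once, and I don't know why.

If no fare with discount<2 is found,
   keep looping and searching,
   and set a timer to empty the file at a certain time
'''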

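# Poll forever; when the clock reads 07:00, empty the record file
while True:
    task_query_flight()
    current_time = time.localtime(time.time())
    if((current_time.tm_hour == 7) and (current_time.tm_min == 0)):
        # NOTE: this empties store1.txt, whereas task_query_flight() records fares in store.txt
        with open('store1.txt','w') as f:
            f.truncate()
    time.sleep(60)

# Not reached while the loop above keeps running
if __name__ == '__main__':
    task_query_flight()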
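
The inner loop of `task_query_flight` compares each fare with every line of `store.txt` and sends a mail for each line that does not match, and the records are written without a trailing newline, so they all end up on one line. A simpler way to get "mail each fare only once" could look like the sketch below. It assumes one fare per line in the record file; `load_seen` and `record_if_new` are hypothetical helper names, not functions from the original script.

```python
import os


def load_seen(path):
    """Return the set of fare lines already recorded in `path` (empty if missing)."""
    if not os.path.exists(path):
        return set()
    with open(path) as store:
        return set(line.rstrip('\n') for line in store)


def record_if_new(path, fare_line):
    """Append `fare_line` and return True only when it was not recorded before."""
    if fare_line in load_seen(path):
        return False
    with open(path, 'a') as store:
        store.write(fare_line + '\n')
    return True
```

The caller would then send the mail only when `record_if_new('store.txt', source + dest + price + depart_date)` returns True, and rescheduling could be left to the single polling loop rather than starting a new `Timer` for every line.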

1 answer

weixin_37149578 Is this caused by anti-crawler measures? Is that why it reports the list index out of range error?
More than 3 years ago
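
Whether the site applies anti-crawler measures cannot be told from the traceback alone; the error only says that the response did not match the expected pattern. One way to find out is to print what actually came back and retry a few times with growing delays. The sketch below is a generic retry helper under those assumptions: `fetch_with_retry`, `fetch` and `is_valid` are hypothetical names, and the caller supplies the function that performs the request.

```python
import time


def fetch_with_retry(fetch, is_valid, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Return the first body accepted by is_valid(), or None after all attempts."""
    for attempt in range(attempts):
        body = fetch()
        if is_valid(body):
            return body
        # Show the start of the rejected body so the cause can be inspected.
        print('unexpected response: %r' % body[:200])
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

If the printed body turns out to be an error page or a verification prompt, that would point towards rate limiting or blocking; an empty body or a timeout would suggest a network problem instead.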