weixin_37149578 2016-12-23 12:20 采纳率: 0%
浏览 2632

python爬虫抓取机票时出现的问题

我是在校学生,自学了点python,想用爬虫抓取机票价格可以更方便的了解特价票信息,所以在网上找了抓取的一些代码然后自己又改了一些,初步有自己想要的功能:挂在服务器上运行,一旦有特价票,向我的邮箱发信息。但是一直有问题,第一个是运行的时候会出下面这个错误(好像是列表越界):
Exception in thread Thread-24:
Traceback (most recent call last):
File "/usr/local/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/local/python27/lib/python2.7/threading.py", line 755, in run
self.function(*self.args, **self.kwargs)
File "SpecialFlightPrice.py", line 72, in task_query_flight
flights=getdate(city, today, enddate)
File "SpecialFlightPrice.py", line 27, in getdate
json_data = re.findall(pattern, price_html)[0]
IndexError: list index out of range

还有一个问题就是我想每天定时对机票信息文件进行清空,但是写的代码却实现不了这个功能,请大神顺便帮我改一改。
先感谢!

下面是源码(我把个人的2个邮箱改成了xxxxxxxx,如果想运行需要把xxxxxxxx改成自己的2个邮箱,还有因为是挂在服务器上运行的,所以需要输入几个参数:出发地点,日期,日期后几天的机票):

-*- coding: utf-8 -*-

import datetime
import time
import json
import urllib
import re
import sys
import threading
from email.mime.text import MIMEText
import smtplib
from time import sleep
from threading import Timer
from _ast import While

default_encoding = 'utf-8'
reload(sys)
sys.setdefaultencoding(default_encoding)

def getdate(city,startdate,enddate):
url = 'https://sjipiao.alitrip.com/search/cheapFlight.htm?startDate=%s&endDate=%s&' \
'routes=%s-&_ksTS=1469412627640_2361&callback=jsonp2362&ruleId=99&flag=1' % (startdate, enddate,city)
price_html = urllib.urlopen(url).read().strip()

pattern = r'jsonp2362\(\s+(.+?)\)'
re_rule = re.compile(pattern)

json_data = re.findall(pattern, price_html)[0]
price_json = json.loads(json_data)

flights = price_json['data']['flights']  # flights Info


return flights

def sendmail(a,b,c,d):
_user = "xxxxxxxxxxx@163.com"
_pwd = "xxxxxxxxxxx"
_to = "xxxxxxxxxxxxx@qq.com"
msg = MIMEText('%s%s%s%s'%(a,b,c,d),'plain','utf-8')
msg["Subject"] = "有特价票啦~"
msg["From"] = _user
msg["To"] = _to
try:
s = smtplib.SMTP_SSL("smtp.163.com", 465)
s.login(_user, _pwd)
s.sendmail(_user, _to, msg.as_string())
s.quit()
print "Success!"

except smtplib.SMTPException:
print "Falied"

def task_query_flight():
city=str(sys.argv[1])
year=int(sys.argv[2])
month=int(sys.argv[3])
day=int(sys.argv[4])
delay=int(sys.argv[5])

if city=='DL':
city='DLC'
elif city=='NJ':
city='NKG'
elif city=='BJ':
city='BJS'
today = datetime.date(year,month,day)
enddate = today + datetime.timedelta(delay)
print'从%s到%s的最便宜的机票价格是' % (today,enddate)

flights=getdate(city, today, enddate)


for f in flights:
    if f['discount'] <=2  :
        source = '从:%s-' % f['depName']
        dest = '到:%s\t' % f['arrName']
        price = '\t价格:%s%s(折扣:%s)\t' % ((f['price']), f['priceDesc'], f['discount'])
        depart_date = '\t日期:%s' % f['depDate']
        print source+dest+price+depart_date

        with open('store.txt','a') as f:
            f.write(' ')

        with open('store.txt','r') as f:
            for line in f.readlines():
                if '%s%s%s%s'%(source,dest,price,depart_date) in line:
                    Timer(60,task_query_flight).start()
                else:
                    sendmail(source, dest, price, depart_date)
                    with open('store.txt', 'a') as f:
                        f.write('%s%s%s%s'%(source,dest,price,depart_date))
                    Timer(60,task_query_flight).start() 

'''
两个问题:
1、列表越界  list out of range
2、定时器只会运行一次  不知什么原因。




if 没找到discount<2的,
   则  循环一直找
  并且设定时器到某一时间即清空文件内容
'''          

while True:
task_query_flight()
current_time = time.localtime(time.time())
if((current_time.tm_hour == 7) and (current_time.tm_min == 0)):
with open('store1.txt','w') as f:
f.truncate()
time.sleep(60)

if name == '__main__':
task_query_flight()

  • 写回答

1条回答

  • zqbnqsdsmd 2016-12-24 15:49
    关注
    评论

报告相同问题?

悬赏问题

  • ¥15 求差集那个函数有问题,有无佬可以解决
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名
  • ¥65 汇编语言除法溢出问题