Blueberryjam7
2021-11-05 22:54
采纳率: 100%
浏览 35
已结题

在用python进行上交所爬虫时遇到这样的问题,请问需要怎么解决

以下是代码:

import json
import requests
import re
import datetime
import csv 
import time
f=open('C:\\Users\\liu\\Desktop\\python\\年报爬取\\连续天数'+'stkcd.csv',mode='w',encoding='gbk',newline='')
writer=csv.writer(f)
head=['stkcd']
writer.writerow(head)
begin=datetime.date(2021,4,1)
end=datetime.date(2021,4,30)
for i in range((end-begin).days+1):
    time.sleep(1)
    searchDate=str(begin + datetime.timedelta(days=i))
    responsel=requests.get(
        'http://query.sse.com.cn/commonQuery.do?jsonCallBack=jsonpCallback87383849&isPagination=true&pageHelp.pageSize=25&pageHelp.cacheSize=1&type=inParams&sqlId=COMMON_PL_SSGSXX_ZXGG_L&START_DATE=2021-04-01&END_DATE=2021-04-30&SECURITY_CODE=&TITLE=%E5%B9%B4%E6%8A%A5&BULLETIN_TYPE=0101&pageHelp.pageNo=1&pageHelp.beginPage=1&pageHelp.endPage=1&_=1635924801654'
        ,
        headers={'Referer':'http://www.sse.com.cn/disclosure/listedinfo/announcement/'}
    )
    json_str1 = responsel.text[19:-1]
    data1=json.loads(json_str1)
    max_page=data1['pageHelp']['pageCount']+1
    for j in range(1,max_page):
        response=requests.get(
            'http://query.sse.com.cn/commonQuery.do?jsonCallBack=jsonpCallback87383849&isPagination=true&pageHelp.pageSize=25&pageHelp.cacheSize=1&type=inParams&sqlId=COMMON_PL_SSGSXX_ZXGG_L&START_DATE=2021-04-01&END_DATE=2021-04-30&SECURITY_CODE=&TITLE=%E5%B9%B4%E6%8A%A5&BULLETIN_TYPE=0101&pageHelp.pageNo=1&pageHelp.beginPage=1&pageHelp.endPage=1&_=1635924801654'
        ,
        headers={'Referer':'http://www.sse.com.cn/disclosure/listedinfo/announcement/'}
        )
        json_str=response.text[19:-1]
        data=json.loads(json_str,strict=False)
        for report in data['result']:
            download_url='http://www.sse.com.cn/'+report['URL']
            if re.search('年度报告',report['TITLE'],re.S):
                if re.search('摘要',report['TITLE'],re.S):
                    pass
                else:
                    filename=report['SECURITY_CODE']+report['TITLE']+searchDate+'.pdf'
                    print(filename)
                    writer.writerow([report['SECURITY_CODE']])
                    if re.search('ST',report['TITLE'],re.S):
                        filename=report['SECURITY_CODE']+'-ST'+searchDate+'.pdf'
                        download_url='http://www.sse.com.cn/'+report['URL']
                        resource=requests.get(download_url,stream=True)
                        with open('C:\\Users\\liu\\Desktop\\python\\年报爬取\\连续天数'+filename,'wb') as fd:
                            for y in resource.iter_content(102400):
                                fd.write(y)
                            print(filename,'完成下载')
                    else: 
                        download_url='http://www.sse.com.cn/'+report['URL']
                        resource=requests.get(download_url,stream=True)
                        with open('C:\\Users\\liu\\Desktop\\python\\年报爬取\\连续天数'+filename,'wb') as fd:
                            for y in resource.iter_content(102400):
                                fd.write(y)
                            print(filename,'完成下载')
f.close()              



以下是运行后报错的内容:

img

  • 写回答
  • 好问题 提建议
  • 追加酬金
  • 关注问题
  • 收藏
  • 邀请回答

2条回答 默认 最新

  • CSDN专家-HGJ 2021-11-05 23:02
    已采纳

    检查代码中第30行json_str,存在json无法解析的数据,参考一下json.loads的数据结构类似于:jsonData = '{"a":1,"b":2,"c":3,"d":4,"e":5}'

    评论
    解决 无用
    打赏 举报
查看更多回答(1条)

相关推荐 更多相似问题