"""
http://www.ccgp-hunan.gov.cn/page/notice/more.jsp
https://hunan.zcygov.cn/luban/announcement/list?utm=a0017.b0064.3.5.f7fcb4c03c7411ed84984b6678c33275
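Requirements
1. Crawler software for bidding and tendering websites
2. Must have a graphical user interface
3. Lets the user choose which announcement files to download and save
4. File format: PDF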
"""
import csv
import os
import PyPDF2
import requests
from lxml import etree
import json
from bs4 import BeautifulSoup
from pprint import pprint
url = 'http://www.ccgp-hunan.gov.cn/mvc/getNoticeList4Web.do'
headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    # 'Cookie': 'JSESSIONID=BD97B12D61360D93BEC5912F62B0F8BC',
    'Origin': 'http://www.ccgp-hunan.gov.cn',
    'Referer': 'http://www.ccgp-hunan.gov.cn/page/notice/more.jsp',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}
data = {
    'pType': '',
    'prcmPrjName': '',
    'prcmItemCode': '',
    'prcmOrgName': '',
    'startDate': '2023-01-01',
    'endDate': '2023-03-11',
    'prcmPlanNo': '',
    'page': '1',
    'pageSize': '18',
}
cookies = {
    'JSESSIONID': 'BD97B12D61360D93BEC5912F62B0F8BC',
}
# Send the POST request; the endpoint returns JSON
resp = requests.post(url, headers=headers, data=data, cookies=cookies)
r = resp.text
response = json.loads(resp.text)
# "rows" holds the list of notice records
lis = response["rows"]
# soup = BeautifulSoup(r, 'lxml')
# list = soup.find_all('tr')
print(lis)
data = []  # note: rebinds the name used for the request payload above
for i in lis:
    # item = {}
    # # print(i)
    # item['name'] = i['ORG_NAME']
    # item['announcement'] = i['NOTICE_TITLE']
    # item['time'] = i['NEWWORK_DATE']
    # # item['link'] = i['href']
    # data.append(item)
    title = i['ORG_NAME']
    announcement = i['NOTICE_TITLE']
    time = i['NEWWORK_DATE']
# with open('kaohe.csv', 'w', encoding='utf-8-sig', newline='') as f:
#     # writer = csv.DictWriter(f, fieldnames=['name', 'announcement', 'time'])
#     # writer.writeheader()
#     # writer.writerows(data)
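The commented-out CSV export at the end of the script can be sketched roughly as below. This is only an illustration: the helper name `rows_to_csv`, the `FIELDS` mapping, and the English column labels are hypothetical, and it assumes each row is a dict with the `ORG_NAME`, `NOTICE_TITLE`, and `NEWWORK_DATE` keys used in the loop. The file is opened once, outside the loop, so that rows are not overwritten on each iteration.

```python
import csv
import io

# Hypothetical mapping from the JSON keys in the response to CSV column labels.
FIELDS = {
    "ORG_NAME": "name",
    "NOTICE_TITLE": "announcement",
    "NEWWORK_DATE": "time",
}


def rows_to_csv(rows, out):
    """Write the selected fields of each row to the text stream `out` as CSV."""
    writer = csv.DictWriter(out, fieldnames=list(FIELDS.values()))
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells instead of raising KeyError.
        writer.writerow({label: row.get(key, "") for key, label in FIELDS.items()})


# Demo with a made-up row, written to an in-memory buffer.
sample = [{"ORG_NAME": "Example Org", "NOTICE_TITLE": "Example notice", "NEWWORK_DATE": "2023-03-11"}]
buffer = io.StringIO()
rows_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, one option is `with open('kaohe.csv', 'w', encoding='utf-8-sig', newline='') as f: rows_to_csv(lis, f)`, which keeps the encoding and `newline` settings from the commented code.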
How can I save the scraped data to a PDF file?
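- 嗷呜大嘴狼 2023-03-12 16:25
Install the PyPDF2 library by running the following command in a terminal:
pip install pypdf2
Then add the following to your code to write each record as one page of a single PDF file (data.pdf):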
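# Create the PDF writer object (PyPDF2 1.x name; PdfWriter in PyPDF2 3.x)
pdf = PyPDF2.PdfFileWriter()
# Iterate over the scraped rows
for i in lis:
    # Extract the fields of one record
    title = i['ORG_NAME']
    announcement = i['NOTICE_TITLE']
    time = i['NEWWORK_DATE']
    # Create a blank page, 11 x 8.5 inches expressed in points
    page = PyPDF2.pdf.PageObject.createBlankPage(None, 72*11, 72*8.5)
    # Add the data to the page
    # Caution: PyPDF2 has no PageObject.createTextObject, so the three calls
    # below raise AttributeError as written. PyPDF2 edits and merges existing
    # pages; it does not draw text.
    page.mergePage(PyPDF2.pdf.PageObject.createTextObject(None, title))
    page.mergePage(PyPDF2.pdf.PageObject.createTextObject(None, announcement))
    page.mergePage(PyPDF2.pdf.PageObject.createTextObject(None, time))
    # Add the page to the PDF
    pdf.addPage(page)
# Save the PDF file
with open('data.pdf', 'wb') as f:
    pdf.write(f)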
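PyPDF2 is meant for reading, merging, and modifying existing PDFs, and `createTextObject` is not part of its API, so text has to be drawn with a PDF generation library. Below is a minimal sketch of one alternative, assuming the separate reportlab library is installed (`pip install reportlab`). This swaps in a different library from the one named above. The function names `notice_lines` and `write_notices_pdf`, the English labels, and the layout values are hypothetical; the row keys are the ones used in the question.

```python
from typing import Iterable, List, Mapping


def notice_lines(row: Mapping[str, object]) -> List[str]:
    """Format one notice row (keys as in the question's JSON) as text lines."""
    return [
        f"Organization: {row.get('ORG_NAME', '')}",
        f"Notice: {row.get('NOTICE_TITLE', '')}",
        f"Date: {row.get('NEWWORK_DATE', '')}",
    ]


def write_notices_pdf(rows: Iterable[Mapping[str, object]], path: str = "data.pdf") -> None:
    """Write one page per notice to `path` using reportlab (assumed installed)."""
    # Imported here so notice_lines() stays usable without reportlab.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.cidfonts import UnicodeCIDFont
    from reportlab.pdfgen import canvas

    font_name = "STSong-Light"  # predefined CID font for Simplified Chinese
    pdfmetrics.registerFont(UnicodeCIDFont(font_name))

    _, page_height = A4
    pdf = canvas.Canvas(path, pagesize=A4)
    for row in rows:
        pdf.setFont(font_name, 12)  # font state resets after each showPage()
        y = page_height - 72        # start one inch below the top edge
        for line in notice_lines(row):
            pdf.drawString(72, y, line)  # one-inch left margin
            y -= 18                      # move down one text line
        pdf.showPage()  # finish this page and start the next
    pdf.save()
```

Calling `write_notices_pdf(lis)` after the request in the question should then produce `data.pdf` with one page per notice. `drawString` does not wrap text, so very long titles would run past the right margin, and `STSong-Light` is referenced rather than embedded, so how it renders depends on the PDF viewer.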
This answer was selected as the best answer by the asker.