闲愁。 2020-04-03 01:31 Acceptance rate: 0%
Views: 117

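It used to run. After I changed a number it stopped printing results; I changed the number back and it still prints nothing, and there is no error either. I'm stumped. Could someone take a look? Thanks.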

import requests
from lxml import etree
import csv

# Browser-like User-Agent sent with every request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) "
                  "AppleWebKit/537.36"
                  " (KHTML, like Gecko) "
                  "Chrome/65.0.3325.181 Safari/537.36"
}
proxy = {'http': '58.220.95.90'}

def get_detail_url(url):
    # Fetch one search-result page and return the links to the job detail pages.
    response = requests.get(url, headers=headers, proxies=proxy)
    text = response.text
    html = etree.HTML(text)
    detail_urls = html.xpath('//div[@class="job-content"]//h3//a/@href')
    # print(detail_urls)
    return detail_urls

def parse_detail_url(url):
    # Fetch one job detail page and collect its fields into a dict.
    carrers = {}
    response = requests.get(url, headers=headers, proxies=proxy)
    text = response.text  # text = response.content.decode('gbk')
    html = etree.HTML(text)
    title = html.xpath('//div[@class="title-info"]//h1/text()')[0]
    carrers['职位'] = title
    company = html.xpath("//div[@class='title-info']//h3/a/text()")[0]
    carrers['用人公司'] = company
    salary = html.xpath('//div[@class="job-title-left"]//p[@class="job-item-title"]/text()')[0]
    carrers['工资'] = salary
    address = html.xpath('//div[@class="job-title-left"]//p[@class="basic-infor"]/text()')[0]
    carrers['地址'] = address
    describe = html.xpath("//div[@class='content content-word']/text()")[0]
    carrers['描述'] = describe
    return carrers

def spider():
    # Search URL; the trailing curPage={} placeholder is filled with the page number.
    base_url = 'https://www.liepin.com/zhaopin/?init=' \
               '-1&headckid=8abc72d8e99a221e&dqs=&' \
               'fromSearchBtn=2&imscid=R000000035&ckid=' \
               '3b940b77623f2371&degradeFlag=0&key=' \
               'Python&siTag=p_XzVCa5J0EfySMbVjghcw' \
               '~fA9rXquZc5IkJpXC-Ycixw&d_sfrom=search_unknown&d_' \
               'ckId=d19babb6ec73fe05edac4894af70bb1e&d_' \
               'curPage=1&d_pageSize=40&d_headId=ebc370a67b2c5d86e' \
               '72b13b82b6e90c7&curPage={}'
    zhaopin = []
    for x in range(1,10):
        url=base_url.format(x)
        # print(url)
        detail_urls = get_detail_url(url)
        for detail_url in detail_urls:
            carrers = parse_detail_url(detail_url)
            zhaopin.append(carrers)
    print(zhaopin)
    filednames=['职位','用人公司','工资','地址','描述']
    # newline='' keeps the csv module from writing blank rows on Windows.
    with open('result.csv','w',encoding='utf-8',newline='') as f:
        writer=csv.DictWriter(f,filednames)
        writer.writeheader()
        writer.writerows(zhaopin)

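for x in range(1,10): this is the line in question. With 9 here the results printed; after I changed it to 10 nothing printed, and changing it back to 9 still prints nothing.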

if name == '__main__':
    spider()

1 answer

  • 德玛洗牙 2020-04-03 09:55
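    First, I don't quite see what this for statement is for: is it meant to run the crawler 10 times?
    I copied the program and ran it. I changed one thing, name to __name__, and it runs; it works whether the value is 9 or 10.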
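
    To make the two points above concrete, here is a minimal sketch that needs no network access. The helper names page_numbers, broken_guard and fixed_guard are made up for this illustration and are not part of the posted script. It shows which page numbers the range loop produces, and how a check against a bare name behaves compared with a check against __name__. Whether the underscores were missing in the asker's actual file or only in the posted text is an assumption either way.

```python
def page_numbers(start=1, stop=10):
    """Pages that `for x in range(start, stop)` would visit (stop is excluded)."""
    return list(range(start, stop))


def broken_guard():
    # As posted: `name` is never defined, so this comparison raises NameError.
    if name == "__main__":  # noqa: F821
        return "spider() would run"
    return "spider() skipped"


def fixed_guard(module_name):
    # Corrected check: a script run directly has __name__ == "__main__".
    if module_name == "__main__":
        return "spider() would run"
    return "spider() skipped"


if __name__ == "__main__":
    print(page_numbers())        # values substituted into curPage={}
    print(fixed_guard(__name__))
    try:
        broken_guard()
    except NameError as exc:
        print("NameError:", exc)
```

    So the loop is not running the crawler ten times; range(1, 10) walks result pages 1 through 9, and range(1, 11) would be needed to include page 10.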
