我是用python+xpath进行网络爬虫获取51job.com的信息,然后要爬取5页,我单页爬取是可以的,但是加上在网上搜的网页循环后就不行了,求大佬们帮助,后天就得交作业了,十万火急!!!谢谢~~
#1)这段是可以单独运行成功的
import requests
from lxml import etree
html = etree.HTML(r.content, etree.HTMLParser(encoding='GBK'))
for i in range(1, 5):
url = 'https://search.51job.com/list/030200,000000,0000,00,1,99,%25E6%2595%25B0%25E6%258D%25AE%25E5%2588%2586%25E6%259E%2590,2,[i].html?lang=c&postchannel=0000&workyear=99&cotype=99°reefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
rq = requests.get(url)
html = rq.text
#2)这段也是单页爬取是可以运行成功,但是一起运行就不可以
#import requests #里面表示就是一页爬取信息
#url = 'https://search.51job.com/list/030200,000000,0000,00,1,99,%25E6%2595%25B0%25E6%258D%25AE%25E5%2588%2586%25E6%259E%2590,2,1.html?lang=c&postchannel=0000&workyear=99&cotype=99°reefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
#r = requests.get(url)
#r.encoding = 'GBK'
#print (r.text)
#职位名
position= [html.xpath('normalize-space(//*[@id="resultList"]/div['+str(p)+']/p/span/a/text())') for p in range(4,54)]
#详情链接
links = [html.xpath('//*[@id="resultList"]/div['+str(p)+']/p/span/a/@href/text()') for p in range(4,54)]
#公司名
company= [html.xpath('//*[@id="resultList"]/div['+str(p)+']/span[1]/a/text()') for p in range(4,54)]
#工作地点
adress= [html.xpath('//*[@id="resultList"]/div['+str(p)+']/span[2]/text()') for p in range(4,54)]
#+str(i)+
#薪资
wage= [html.xpath('//*[@id="resultList"]/div['+str(p)+']/span[3]/text()') for p in range(4,54)]
#发布时间
time= [html.xpath('//*[@id="resultList"]/div['+str(p)+']/span[4]/text()') for p in range(4,54)]
链接也是可以运行,但是打印出来是空白的
在线急!!!