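Exercise: use the CSS-to-SVG mapping to recover the service rating and the phone number, and print the complete result.
Question: the SVG coordinates and the CSS coordinates are negatives of each other, so why is a calculation still needed to find the SVG coordinate?
Problems:
1. The required data cannot be printed.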
import re
import requests
from parsel import Selector
url = 'http://www.porters.vip/confusion/food.html'
svg_url = 'http://www.porters.vip/confusion/font/food.svg'
css_url = 'http://www.porters.vip/confusion/css/food.css'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'Host': 'www.porters.vip'
}
# Send the same headers with all three requests
res = requests.get(url, headers=headers)
svg_text = requests.get(svg_url, headers=headers).text
css_text = requests.get(css_url, headers=headers).text
# Save the downloaded text locally
# with open('柳州.text','w',encoding='utf-8') as f:
#     f.write(res.content.decode())
# with open('柳州_svg.svg','w',encoding='utf-8') as f:
#     f.write(svg_text)
# with open('柳州_css.css','w',encoding='utf-8') as f:
#     f.write(css_text)
# Look up the CSS rule for the target class
css_class_name = 'vhkqsc'
# Newlines and spaces are stripped from the CSS below, so the pattern contains none;
# the rule body is delimited by a brace, which must be escaped
pile = r'\.%s\{background:-(\d+)px-(\d+)px;' % css_class_name
pattern = re.compile(pile)  # compile first
css = css_text.replace('\n','').replace(' ','')
print(css)
coord = pattern.findall(css)
print(coord)
if coord:
    x,y = coord[0]
    x,y = int(x),int(y)
    print(x,y)
    # The SVG has four <text> elements; find the one that matches the CSS rule
    svg_data = Selector(svg_text)
    texts = svg_data.xpath("//text")
    print(texts)
    # Pick the row by y: the first <text> whose y is not less than the CSS y offset
    axiy = [i.attrib.get('y') for i in texts if y<=int(i.attrib.get('y'))][0]
    print(axiy)
    # Extract the text of the <text> element with that y
    svg_text_ = svg_data.xpath('//text[@y="%s"]/text()'%axiy).extract_first()
    print(svg_text_)
    # Extract the font size
    fout_size = re.search(r'font-size:(\d+)px;',svg_text).group(1)
    print(fout_size)
    # CSS x offset // font size = index of the character in the SVG row
    position = x//int(fout_size)
    number = svg_text_[position]
    print(number)
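print(coord) returns [] (so x and y are never assigned) whenever the pattern does not match the whitespace-stripped CSS.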
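The empty result can be reproduced offline. The sketch below is only an illustration: the CSS rule and its offsets (-274px, -141px) are made up in the same shape as a rule in food.css, not copied from it. It compares a pattern that contains spaces and unescaped parentheses with one written for the stripped text.

```python
import re

# Hypothetical rule shaped like one in food.css; the offsets are made up.
sample_css = ".vhkqsc {\n    background: -274px -141px;\n}"

# Same clean-up as in the script: drop newlines and spaces.
css = sample_css.replace('\n', '').replace(' ', '')
# css is now ".vhkqsc{background:-274px-141px;}"

# Spaces in the pattern, and parentheses (a regex group) where the CSS has a brace:
# this cannot match the stripped text.
bad = re.compile(r'.%s(background: -(\d+)px -(\d+)px;)' % 'vhkqsc')

# Escaped dot and brace, no spaces: matches the stripped text.
good = re.compile(r'\.%s\{background:-(\d+)px-(\d+)px;' % 'vhkqsc')

bad_result = bad.findall(css)
good_result = good.findall(css)
print(bad_result)   # []
print(good_result)  # [('274', '141')]
```

Either strip the whitespace and write the pattern without spaces, or keep the CSS untouched and use \s* in the pattern; mixing the two gives the empty list.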
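1. Determine the page URL
   Determine the SVG and CSS URLs
2. Send the requests
3. Analyze the CSS to get the coordinates of each mapped digit
4. Use the CSS coordinates to find the matching text row in the SVG, and use the font size to compute the position within that row
5. Use a regular expression to substitute the corresponding SVG values into the page, obtain the complete information, and save it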
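Steps 3 to 5 can be sketched without any network access. Everything below is an assumption made for illustration: the class names, offsets, SVG rows, and the `<d class="..."></d>` tag shape are invented sample data, and `build_mapping` and `decode` are hypothetical helper names. It uses only the standard library rather than parsel. It also hints at why a calculation is needed: the CSS offset is a pixel distance, while the character is picked by its index in the row, so the x offset is floor-divided by the font size.

```python
import re

# Hypothetical stand-ins for food.css, food.svg and a fragment of the page.
SAMPLE_CSS = """
.vhkaa { background: -14px -15px; }
.vhkbb { background: -7px -59px; }
.vhkcc { background: -42px -59px; }
"""
SAMPLE_SVG = """
<svg><style>text {font-size:14px;}</style>
<text x="0" y="38">154669136</text>
<text x="0" y="83">560862462</text>
</svg>
"""
SAMPLE_HTML = ('<div>tel: <d class="vhkaa"></d>'
               '<d class="vhkbb"></d><d class="vhkcc"></d></div>')


def build_mapping(css_text, svg_text):
    """Return {class_name: character} for every background rule in css_text."""
    font_size = int(re.search(r'font-size:\s*(\d+)px', svg_text).group(1))
    # Rows of the SVG as (y, text), sorted by y.
    rows = sorted(
        (int(y), text)
        for y, text in re.findall(
            r'<text[^>]*\by="(\d+)"[^>]*>([^<]*)</text>', svg_text)
    )
    rule = re.compile(r'\.(\w+)\s*\{\s*background:\s*-(\d+)px\s+-(\d+)px')
    mapping = {}
    for name, x, y in rule.findall(css_text):
        x, y = int(x), int(y)
        # First row whose y is not less than the CSS y offset.
        row_text = next(text for row_y, text in rows if y <= row_y)
        # Pixel offset -> character index within the row.
        mapping[name] = row_text[x // font_size]
    return mapping


def decode(html, mapping):
    """Replace each <d class="..."></d> tag with its mapped character."""
    return re.sub(r'<d class="(\w+)"></d>',
                  lambda m: mapping.get(m.group(1), m.group(0)),
                  html)


mapping = build_mapping(SAMPLE_CSS, SAMPLE_SVG)
decoded = decode(SAMPLE_HTML, mapping)
print(mapping)  # {'vhkaa': '5', 'vhkbb': '5', 'vhkcc': '8'}
print(decoded)  # <div>tel: 558</div>
```

Building the whole mapping once and substituting afterwards avoids repeating the lookup for every tag. Saving the decoded text to a file is left out of the sketch.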