dshm8998473 2016-03-29 09:35
Viewed 122 times

BeautifulSoup can't extract data wrapped in a span tag

In the code below I can get all of the data from the scrape apart from the "Going Allowance" in the resultsBlockFooter. In the source most of the data is in list items (li), but the going allowance is wrapped in a span. I have tried different variations but just can't seem to extract it. Any suggestions appreciated.

    import csv
    from bs4 import BeautifulSoup
    import requests

    html = requests.get("http://www.sportinglife.com=156432").text

    soup = BeautifulSoup(html, 'lxml')

    rows = []
    for header in soup.find_all("div", class_="resultsBlockHeader"):
        track = header.find("div", class_="track").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        date = header.find("div", class_="date").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        datetime = header.find("div", class_="datetime").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        grade = header.find("div", class_="grade").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        distance = header.find("div", class_="distance").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        prizes = header.find("div", class_="prizes").get_text(strip=True).encode('ascii', 'ignore').strip("|")

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line1")
        details = []
        for result in results:
            fin = result.find("li", class_="fin").get_text(strip=True)
            greyhound = result.find("li", class_="greyhound").get_text(strip=True)
            trap = result.find("li", class_="trap").get_text(strip=True)
            sp = result.find("li", class_="sp").get_text(strip=True)
            timeSec = result.find("li", class_="timeSec").get_text(strip=True)
            timeDistance = result.find("li", class_="timeDistance").get_text(strip=True)

            details.append({"greyhound": greyhound, "sp": sp, "fin": fin, "timeSec": timeSec, "timeDistance": timeDistance, "trap": trap})

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line2")
        for index, result in enumerate(results):
            trainer = result.find("li", class_="trainer").get_text(strip=True)
            details[index]["trainer"] = trainer

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line3")
        for index, result in enumerate(results):
            comment = result.find("li", class_="comment").get_text(strip=True)
            details[index]["comment"] = comment

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line2")
        for index, result in enumerate(results):
            firstessential = result.find("li", class_="first essential").get_text(strip=True)
            details[index]["first essential"] = firstessential

        results = header.find_next_sibling("div", class_="resultsBlockFooter").find_all("ul", class_="line3")
        for index, result in enumerate(results):
            goingAllowance = result.find("div", class_="Going Allowance").get_text(strip=True)
            details[index]["Going Allowance"] = goingAllowance

        for detail in details:
            detail.update({"track": track, "date": date, "datetime": datetime, "grade": grade, "prizes": prizes})
            rows.append(detail)

    with open("abc.csv", "a") as f:
        writer = csv.DictWriter(f, ["track", "date", "trap", "fin", "greyhound", "datetime", "sp", "grade", "distance", "prizes", "timeSec", "timeDistance", "trainer", "comment", "first essential", "Going Allowance"])

        for row in rows:
            writer.writerow(row)

1 answer

  • douquanjie9326 2016-03-29 17:02

    For future reference, instead of posting all your code, just include the relevant parts. Also include the HTML, or the section of the website you are having trouble capturing. I looked at the website and I think you mean this?

    test = soup.find("div", {"class":"resultsBlockFooter"})
    '<div class="resultsBlockFooter">
    <div><span>Going Allowance:</span> -10</div>
    <div><span>Forecast:</span> (3-4) £20.36 | <span>Tricast:</span> (3-4-2) £61.61</div>
    </div>'
    

    And you want the <div><span>Going Allowance:</span> -10</div>?

    allowance = test.contents[1].text  # .contents can be a helpful list of the child nodes
    "Going Allowance: -10"
    forecast, tricast = test.contents[3].text.split("|")  # the rest of the useful text
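    The indexing above depends on the whitespace text nodes that bs4 keeps in `.contents`. As an alternative, a minimal sketch against the footer markup quoted above: locate the `<span>` by its exact text and read the value from the text node that follows it.

    ```python
    from bs4 import BeautifulSoup

    # Footer markup as quoted in this answer
    html = """<div class="resultsBlockFooter">
    <div><span>Going Allowance:</span> -10</div>
    <div><span>Forecast:</span> (3-4) £20.36 | <span>Tricast:</span> (3-4-2) £61.61</div>
    </div>"""

    footer = BeautifulSoup(html, "html.parser").find("div", class_="resultsBlockFooter")

    # Find the <span> by its text, then take the sibling text node holding the value
    span = footer.find("span", string="Going Allowance:")
    going_allowance = span.next_sibling.strip()
    print(going_allowance)  # -10
    ```

    This way the lookup does not break if the site adds or removes whitespace between the footer's divs.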
    
