dshm8998473 2016-03-29 09:35
Viewed 122 times

BeautifulSoup can't extract data wrapped in a span tag

In the code below I can get all of the data from the scrape apart from the "Going Allowance" in the resultsBlockFooter. In the source most of the data is in list items (li), but the going allowance is wrapped in a span. I have tried different variations but just can't seem to extract it. Any suggestions appreciated.

    import csv
    from bs4 import BeautifulSoup
    import requests

    html = requests.get("http://www.sportinglife.com=156432").text

    soup = BeautifulSoup(html, 'lxml')

    rows = []
    for header in soup.find_all("div", class_="resultsBlockHeader"):
        track = header.find("div", class_="track").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        date = header.find("div", class_="date").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        datetime = header.find("div", class_="datetime").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        grade = header.find("div", class_="grade").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        distance = header.find("div", class_="distance").get_text(strip=True).encode('ascii', 'ignore').strip("|")
        prizes = header.find("div", class_="prizes").get_text(strip=True).encode('ascii', 'ignore').strip("|")

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line1")
        details = []
        for result in results:
            fin = result.find("li", class_="fin").get_text(strip=True)
            greyhound = result.find("li", class_="greyhound").get_text(strip=True)
            trap = result.find("li", class_="trap").get_text(strip=True)
            sp = result.find("li", class_="sp").get_text(strip=True)
            timeSec = result.find("li", class_="timeSec").get_text(strip=True)
            timeDistance = result.find("li", class_="timeDistance").get_text(strip=True)

            details.append({"greyhound": greyhound, "sp": sp, "fin": fin, "timeSec": timeSec, "timeDistance": timeDistance, "trap": trap})

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line2")
        for index, result in enumerate(results):
            trainer = result.find("li", class_="trainer").get_text(strip=True)
            details[index]["trainer"] = trainer

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line3")
        for index, result in enumerate(results):
            comment = result.find("li", class_="comment").get_text(strip=True)
            details[index]["comment"] = comment

        results = header.find_next_sibling("div", class_="resultsBlock").find_all("ul", class_="line2")
        for index, result in enumerate(results):
            firstessential = result.find("li", class_="first essential").get_text(strip=True)
            details[index]["first essential"] = firstessential

        results = header.find_next_sibling("div", class_="resultsBlockFooter").find_all("ul", class_="line3")
        for index, result in enumerate(results):
            goingAllowance = result.find("div", class_="Going Allowance").get_text(strip=True)
            details[index]["Going Allowance"] = goingAllowance

        for detail in details:
            detail.update({"track": track, "date": date, "datetime": datetime, "grade": grade, "prizes": prizes})
            rows.append(detail)

    with open("abc.csv", "a") as f:
        writer = csv.DictWriter(f, ["track", "date", "trap", "fin", "greyhound", "datetime", "sp", "grade", "distance", "prizes", "timeSec", "timeDistance", "trainer", "comment", "first essential", "Going Allowance"])

        for row in rows:
            writer.writerow(row)

1 answer

  • douquanjie9326 2016-03-29 17:02

    For future reference, instead of posting all your code, just include the relevant parts. Also include the HTML, or the section of the website you are having trouble capturing. I looked at the website and I think you mean this?

    test = soup.find("div", {"class":"resultsBlockFooter"})
    '<div class="resultsBlockFooter">
    <div><span>Going Allowance:</span> -10</div>
    <div><span>Forecast:</span> (3-4) £20.36 | <span>Tricast:</span> (3-4-2) £61.61</div>
    </div>'
    

    And you want the <div><span>Going Allowance:</span> -10</div>?

    allowance = test.contents[1].text  # .contents can be a helpful list of the child nodes
    "Going Allowance: -10"
    forecast, tricast = test.contents[3].text.split("|")  # the rest of the useful text
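    The indexing above depends on the whitespace text nodes that bs4 keeps in `.contents`. As an alternative, a minimal sketch against the footer markup quoted above: locate the `<span>` by its exact text and read the value from the text node that follows it.

    ```python
    from bs4 import BeautifulSoup

    # Footer markup as quoted in this answer
    html = """<div class="resultsBlockFooter">
    <div><span>Going Allowance:</span> -10</div>
    <div><span>Forecast:</span> (3-4) £20.36 | <span>Tricast:</span> (3-4-2) £61.61</div>
    </div>"""

    footer = BeautifulSoup(html, "html.parser").find("div", class_="resultsBlockFooter")

    # Find the <span> by its text, then take the sibling text node holding the value
    span = footer.find("span", string="Going Allowance:")
    going_allowance = span.next_sibling.strip()
    print(going_allowance)  # -10
    ```

    This way the lookup does not break if the site adds or removes whitespace between the footer's divs.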
    
