weixin_33726318 2018-05-29 07:35 采纳率: 0%
浏览 48

在Python中抓取网址

I'm trying to get the adidas shoe link from a search page, can't figure it out what I'm doing wrong.

I tried tags = soup.find("section", {"class": "productList"}).findAll("a") Doesnt work :(

I also tried to print all href and the desired link is not in there :(

So I'm expecting to print this :

https://www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138


from bs4 import BeautifulSoup
import requests

url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find("section", {"class": "productList"}).findAll("a")

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

Here's the html code for that link

<section class="productList"> <article class="productListing"> <a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138" title="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue" onmousedown="return nxt_repo.product_x('38698770','1');"> <span class="sale">SALE</span> <span class="image"> <img src="//www.tennisexpress.com/prodimages/78091-DEFAULT-m.jpg" alt="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue"> </span> <span class="brand"> Adidas </span> <span class="name"> Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue </span> <span class="pricing"> <strong class="listPrice">$140.00</strong> <strong class="percentOff">0% OFF</strong> <strong class="salePrice">$139.95</strong> </span> <br> </a> </article> </section>
  • 写回答

3条回答 默认 最新

  • weixin_33719619 2018-05-29 07:38
    关注
    soup = BeautifulSoup(data, "html.parser")    
    markup = soup.find_all("section", class_=["productList"])
    markupContent = markup.get_text()
    

    So your code goes like

    import urllib
    from bs4 import BeautifulSoup
    import requests
    
    url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"
    
    r = urllib.urlopen(url).read()
    soup = BeautifulSoup(r, "html.parser")
    productMarkup = soup.find_all("section", class_=["productList"])
    product = productMarkup.get_text()
    
    评论

报告相同问题?

悬赏问题

  • ¥15 metadata提取的PDF元数据,如何转换为一个Excel
  • ¥15 关于arduino编程toCharArray()函数的使用
  • ¥100 vc++混合CEF采用CLR方式编译报错
  • ¥15 coze 的插件输入飞书多维表格 app_token 后一直显示错误,如何解决?
  • ¥15 vite+vue3+plyr播放本地public文件夹下视频无法加载
  • ¥15 c#逐行读取txt文本,但是每一行里面数据之间空格数量不同
  • ¥50 如何openEuler 22.03上安装配置drbd
  • ¥20 ING91680C BLE5.3 芯片怎么实现串口收发数据
  • ¥15 无线连接树莓派,无法执行update,如何解决?(相关搜索:软件下载)
  • ¥15 Windows11, backspace, enter, space键失灵