Scraping a URL in Python

I'm trying to get the Adidas shoe link from a search results page, but I can't figure out what I'm doing wrong.

I tried tags = soup.find("section", {"class": "productList"}).findAll("a"), but it doesn't work :(

I also tried printing every href on the page, and the link I want is not among them :(

This is what I expect to be printed:

https://www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138


from bs4 import BeautifulSoup
import requests

url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"

# Getting the webpage, creating a Response object.
response = requests.get(url)

# Extracting the source code of the page.
data = response.text

# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')

# Extracting all the <a> tags into a list.
tags = soup.find("section", {"class": "productList"}).findAll("a")

# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

Here's the HTML for that link:

<section class="productList">
  <article class="productListing">
    <a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138" title="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue" onmousedown="return nxt_repo.product_x('38698770','1');">
      <span class="sale">SALE</span>
      <span class="image">
        <img src="//www.tennisexpress.com/prodimages/78091-DEFAULT-m.jpg" alt="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue">
      </span>
      <span class="brand"> Adidas </span>
      <span class="name"> Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue </span>
      <span class="pricing">
        <strong class="listPrice">$140.00</strong>
        <strong class="percentOff">0% OFF</strong>
        <strong class="salePrice">$139.95</strong>
      </span>
      <br>
    </a>
  </article>
</section>


3 answers

weixin_33719619 2018-05-29 07:38


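soup = BeautifulSoup(data, "html.parser")
# find() returns a single Tag (or None if nothing matches);
# find_all() returns a list-like ResultSet, which has no get_text() method.
markup = soup.find("section", class_="productList")
markupContent = markup.get_text()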

So your code would look like this:

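# Python 3: urlopen lives in urllib.request (urllib.urlopen is Python 2 only).
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"

# Download the raw page bytes and parse them.
r = urlopen(url).read()
soup = BeautifulSoup(r, "html.parser")

# find() returns one Tag, or None when the section is missing from the response.
productMarkup = soup.find("section", class_="productList")
product = productMarkup.get_text() if productMarkup is not None else ""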
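
Since the question asks for the link rather than the text, here is a minimal sketch of pulling the href out of the markup quoted in the question. It runs against a trimmed copy of that HTML instead of the live site, so it only shows the parsing step. The names SAMPLE_HTML and extract_product_links are made up for this example. It also assumes the server response actually contains the productList section; if the page builds that section with JavaScript, requests will never see it and the function simply returns an empty list.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in the question (assumption: the real
# response has the same structure).
SAMPLE_HTML = """
<section class="productList">
  <article class="productListing">
    <a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138">
      <span class="brand"> Adidas </span>
    </a>
  </article>
</section>
"""


def extract_product_links(html, base_url):
    """Hypothetical helper: absolute URLs of a.product links in section.productList."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # CSS selector: <a class="product" href=...> nested inside the product list.
    for a in soup.select("section.productList a.product[href]"):
        # The href is protocol-relative ("//host/path"); urljoin fills in
        # the scheme from the page URL.
        links.append(urljoin(base_url, a["href"]))
    return links


base = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"
links = extract_product_links(SAMPLE_HTML, base)
for link in links:
    print(link)
```

With the live page, you would pass response.text (or the bytes from urlopen) in place of SAMPLE_HTML. An empty result there would suggest the product list is not present in the downloaded HTML.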
