2301_76523335 2023-02-24 11:31

Python scraper runs successfully but no data is output

I wrote a scraper to crawl PubMed for article titles and links.
It runs without errors, but it fetches no data: when I tested it, it reported that no articles were found. Yet an older scraper I have used before can pull articles from the same site. Why is that?
This is the code that runs but outputs no data:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://pubmed.ncbi.nlm.nih.gov/?term=cervical%20cancer%20treatment&filter=years.2020-2023'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.50'
}

def get_articles(url):
    # Send the HTTP request and fetch the page
    response = requests.get(url, headers=headers)
    html = response.text

    # Parse the HTML
    soup = BeautifulSoup(html, 'html.parser')

    # Find the tags that contain the article information
    article_tags = soup.select('.docsum-content')

    # Extract the title and link of each article
    results = []
    for tag in article_tags:
        title_tags = tag.select('.docsum-title > a')
        if title_tags:
            title = title_tags[0].get_text().strip()
            link = 'https://pubmed.ncbi.nlm.nih.gov' + title_tags[0]['href']
            results.append((title, link))

    return results

if __name__ == '__main__':
    for page in range(1, 6):
        page_url = f'{url}&page={page}'
        articles = get_articles(page_url)
        print(f'Page {page}: {page_url} ({len(articles)} articles found)')
        for article in articles:
            print(article[0])
            print(article[1])
            print('---')
```
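
A quick way to narrow a "no articles found" symptom down is to check whether the request itself succeeds and how many nodes each selector actually matches. This is a minimal debugging sketch, not part of the original post; the selectors are the ones used in the code above:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://pubmed.ncbi.nlm.nih.gov/?term=cervical%20cancer%20treatment&filter=years.2020-2023'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.50'
}

response = requests.get(url, headers=headers)
print('HTTP status:', response.status_code)  # 200 means the request itself is fine

soup = BeautifulSoup(response.text, 'html.parser')
# Count what each selector matches to see where the pipeline goes empty
print('.docsum-content matches:  ', len(soup.select('.docsum-content')))
print('.docsum-title > a matches:', len(soup.select('.docsum-title > a')))
print('any a inside a result:    ', len(soup.select('.docsum-content a')))
```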
And this is the code that does work:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://pubmed.ncbi.nlm.nih.gov/?term=cervical%20cancer%20treatment&filter=years.2020-2023"
num_pages = 10

data = []

for i in range(num_pages):
    # Construct the URL for the current page
    page_url = f"{url}&page={i+1}"
    
    # Make a request to the page and parse the HTML using Beautiful Soup
    response = requests.get(page_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    # Find all the articles on the current page
    articles = soup.find_all("div", class_="docsum-content")
    
    # Extract the title and link for each article and append to the data list
    for article in articles:
        title = article.find("a", class_="docsum-title").text.strip()
        link = article.find("a", class_="docsum-title")["href"]
        data.append([title, link])

df = pd.DataFrame(data, columns=["Title", "Link"])
df.to_excel("cervical_cancer_treatment.xlsx", index=False)
```
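
Two small notes on the working version: `df.to_excel` writes an `.xlsx` file, which requires the openpyxl package to be installed, and the loop fires ten requests back to back without a User-Agent header. A hedged variant of the same loop (my own sketch, not from the thread) that adds the header used earlier, pauses between pages, and stores absolute links the way the first script does:

```python
import time

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://pubmed.ncbi.nlm.nih.gov/?term=cervical%20cancer%20treatment&filter=years.2020-2023"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.50"
}
data = []

for i in range(10):
    # Same page construction as above, but with the User-Agent header attached
    response = requests.get(f"{url}&page={i + 1}", headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    for article in soup.find_all("div", class_="docsum-content"):
        a = article.find("a", class_="docsum-title")
        if a:  # skip malformed entries instead of raising AttributeError
            # Prepend the base URL so the stored link is absolute
            data.append([a.text.strip(), "https://pubmed.ncbi.nlm.nih.gov" + a["href"]])
    time.sleep(1)  # be polite: pause between page requests

pd.DataFrame(data, columns=["Title", "Link"]).to_excel(
    "cervical_cancer_treatment.xlsx", index=False  # writing .xlsx needs openpyxl
)
```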


8 answers

  • 程序猿_Mr. Guo 2023-02-24 12:23


    The a-tag selector is wrong. It should be `title_tags = tag.select('a')`, which selects each a tag directly: since `article_tags = soup.select('.docsum-content')` has already narrowed things down to the individual result div, you only need to iterate over the a tags inside it.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://pubmed.ncbi.nlm.nih.gov/?term=cervical%20cancer%20treatment&filter=years.2020-2023'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 Edg/110.0.1587.50'
}


def get_articles(url):
    # Send the HTTP request and fetch the page
    response = requests.get(url, headers=headers)
    html = response.text

    # Parse the HTML
    soup = BeautifulSoup(html, 'html.parser')

    # Find the tags that contain the article information
    article_tags = soup.select('.docsum-content')

    # Extract the title and link of each article
    results = []
    for tag in article_tags:
        title_tags = tag.select('a')  # the fix: select the a tags directly
        if title_tags:
            title = title_tags[0].get_text().strip()
            link = 'https://pubmed.ncbi.nlm.nih.gov' + title_tags[0]['href']
            results.append((title, link))

    return results


if __name__ == '__main__':
    for page in range(1, 6):
        page_url = f'{url}&page={page}'
        articles = get_articles(page_url)
        print(f'Page {page}: {page_url} ({len(articles)} articles found)')
        for article in articles:
            print(article[0])
            print(article[1])
            print('---')
```
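
A side note on the selector itself (my reading of PubMed's result markup, not stated in the answer): the `docsum-title` class appears to sit on the `<a>` element itself rather than on a parent, which is why the child selector `.docsum-title > a` matches nothing. If you want something more precise than selecting every `a` tag, this should also work:

```python
# Assumes PubMed renders each result title as <a class="docsum-title" href="/PMID/">,
# i.e. the class is on the anchor itself, so select it directly:
title_tags = tag.select('a.docsum-title')  # instead of '.docsum-title > a'
```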
    
    This answer was accepted by the asker as the best answer.


Question events

  • Question closed by the system on Mar 4
  • Answer accepted on Feb 24
  • Question created on Feb 24
