m0_62918237 2023-03-31 01:35 Acceptance rate: 100%
Views: 15
Closed

My code for scraping articles raises an error when it runs. Please help me fix it.



import requests
from bs4 import BeautifulSoup

def search_pubmed(query):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
    search_url = base_url + "esearch.fcgi?db=pubmed&term=" + query
    response = requests.get(search_url)
    soup = BeautifulSoup(response.text, '')
    id_list = [id.text for id in soup.find_all('Id')]
    return id_list

def fetch_details(pubmed_id):
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
    fetch_url = base_url + "efetch.fcgi?db=pubmed&id=" + pubmed_id + "&retmode=xml"
    response = requests.get(fetch_url)
    soup = BeautifulSoup(r.text, 'html.parser')
    soup = BeautifulSoup(response.text, 'xml')
    try:
        title = soup.find('ArticleTitle').text
    except AttributeError:
        title = None
    try:
        abstract = soup.find('AbstractText').text
    except AttributeError:
        abstract = None
    try:
        journal = soup.find('JournalTitle').text
    except AttributeError:
        journal = None
    try:
        doi = soup.find('ArticleId', {'IdType': 'doi'}).text
    except AttributeError:
        doi = None
    return {'title': title, 'abstract': abstract, 'journal': journal, 'doi': doi}

# Example usage
ids = search_pubmed('human')
for id in ids:
    details = fetch_details(id)
    print(details)
The error is as follows:
Traceback (most recent call last):
  File "F:/桌面/抓2.py", line 38, in <module>
    ids = search_pubmed('human')
  File "F:/桌面/抓2.py", line 9, in search_pubmed
    soup = BeautifulSoup(response.text, '')
  File "C:\Users\HUAWEI\AppData\Local\Programs\Python\Python311\Lib\site-packages\bs4\__init__.py", line 249, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: . Do you need to install a parser library?

I have already installed lxml, but the error still occurs. Any help is appreciated.

1 answer

  • Roc-xb (Top Creator, Backend) 2023-03-31 07:03

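    There were problems in your code and I have fixed them: the parser name passed to BeautifulSoup in search_pubmed was an empty string, which I changed to 'xml', and I removed the stray BeautifulSoup(r.text, 'html.parser') line in fetch_details, which referenced an undefined variable r. The code below runs correctly.
    If this helps, please accept the answer. Thanks!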

    
    # !/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    # @author: yjp
    # @software: PyCharm
    # @file: main.py
    # @time: 2022-08-08 16:49
    import requests
    from bs4 import BeautifulSoup
    
    
    def search_pubmed(query):
        """Return the list of PubMed IDs that match the query."""
        base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        search_url = base_url + "esearch.fcgi?db=pubmed&term=" + query
        print(search_url)
        response = requests.get(search_url)
        # 'xml' selects the XML parser, which requires lxml to be installed
        soup = BeautifulSoup(response.text, 'xml')
        id_list = [tag.text for tag in soup.find_all('Id')]
        print(id_list)
        return id_list
    
    
    def fetch_details(pubmed_id):
        base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        fetch_url = base_url + "efetch.fcgi?db=pubmed&id=" + pubmed_id + "&retmode=xml"
        print(fetch_url)
        response = requests.get(fetch_url)
        soup = BeautifulSoup(response.text, 'xml')
        try:
            title = soup.find('ArticleTitle').text
        except AttributeError:
            title = None
        try:
            abstract = soup.find('AbstractText').text
        except AttributeError:
            abstract = None
        try:
            journal = soup.find('JournalTitle').text
        except AttributeError:
            journal = None
        try:
            doi = soup.find('ArticleId', {'IdType': 'doi'}).text
        except AttributeError:
            doi = None
        return {'title': title, 'abstract': abstract, 'journal': journal, 'doi': doi}
    
    
    if __name__ == '__main__':
        # Example usage
        ids = search_pubmed('human')
        for id in ids:
            details = fetch_details(id)
            print(details)
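
As a side note on search_pubmed: the code above appends the query to the URL as-is, which would break for terms containing spaces or other special characters. A minimal sketch of one way around that, using only the standard library, is shown here. The names build_search_url and parse_id_list, the retmax value, and the sample XML are my own illustrative assumptions, not part of the original answer, and ElementTree is swapped in for BeautifulSoup only so that the example needs no extra parser library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_search_url(query, retmax=20):
    """Build an esearch URL with the query term URL-encoded (hypothetical helper)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def parse_id_list(xml_text):
    """Collect the text of every <Id> element in an esearch response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("Id")]


# Made-up sample shaped like an esearch response, used so the sketch runs offline.
SAMPLE_SEARCH = (
    "<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
    "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
)

print(build_search_url("human genome"))
print(parse_id_list(SAMPLE_SEARCH))
```

In real use, the text passed to parse_id_list would be response.text from requests.get(build_search_url(query)).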
    
    


    This answer was selected as the best answer by the asker.
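
The four try/except blocks in fetch_details all do the same thing: look up one element and fall back to None. A rough sketch of how that could be condensed into a single helper follows. It again uses the standard library's ElementTree rather than BeautifulSoup, and text_or_none, parse_details, and the sample XML are hypothetical names and data of mine. The element layout in the sample, in particular Journal/Title for the journal name and multiple AbstractText parts for a structured abstract, is my assumption about what efetch returns and should be checked against a real response.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the layout is an assumption about the efetch XML format.
SAMPLE_ARTICLE = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">111</ArticleId>
        <ArticleId IdType="doi">10.0000/example</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>"""


def text_or_none(root, path):
    """Return the stripped text of the first match for path, or None if absent."""
    node = root.find(path)
    if node is None:
        return None
    return "".join(node.itertext()).strip()


def parse_details(xml_text):
    """Extract title, abstract, journal, and DOI from one efetch-style document."""
    root = ET.fromstring(xml_text)
    # Join every AbstractText part so structured abstracts are not cut short.
    parts = ["".join(n.itertext()).strip() for n in root.iter("AbstractText")]
    return {
        "title": text_or_none(root, ".//ArticleTitle"),
        "abstract": " ".join(parts) if parts else None,
        "journal": text_or_none(root, ".//Journal/Title"),
        "doi": text_or_none(root, ".//ArticleId[@IdType='doi']"),
    }


print(parse_details(SAMPLE_ARTICLE))
```

Missing elements simply come back as None, which matches the behavior of the original try/except version.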

Question events

  • Closed by the system on April 8
  • Answer accepted on March 31
  • Question edited on March 31
  • Question created on March 31
