weixin_46887967 2020-10-27 20:01 Acceptance rate: 67.9%

Why does this code output only one record?

    # -*- coding: utf-8 -*-
    """
    Created on Tue Oct 27 19:25:44 2020

    @author: lenovo
    """

    import requests
    from bs4 import BeautifulSoup
    import time

    headers = {
        'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.6.4302.400'
    }

    def get_info(url):
        wb_data = requests.get(url,headers = headers)
        soup = BeautifulSoup(wb_data.text,'lxml')

        ranks = soup.select('#rankWrap > div.pc_temp_songlist > ul > li:nth-child(1) > span.pc_temp_num > strong')
        tittles = soup.select('#rankWrap > div.pc_temp_songlist > ul > li:nth-child(1) > a')
        times = soup.select('#rankWrap > div.pc_temp_songlist > ul > li:nth-child(1) > span.pc_temp_tips_r > span')

        for rank,tittle,time in zip(ranks,tittles,times):
            data = {
                'rank':rank.get_text().strip(),
                'singer':tittle.get_text().split('-')[0],
                'song':tittle.get_text().split('-')[1],
                'time':time.get_text().strip()
                }
            print(data)

    if __name__ == '__main__':
        urls = ['https://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range (1,24)]
        for url in urls:
            get_info(url)
            time.sleep(1)


![Image description](https://img-ask.csdn.net/upload/202010/27/1603800054_296603.png)

2 answers

  • 7*24 工作者 2020-10-28 09:32

    I'm not very familiar with the bs4 library and usually use etree for scraping, so I reworked your code with the lxml library.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import requests
    from lxml import etree

    headers = {
        'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3823.400 QQBrowser/10.6.4302.400'
    }

    def get_info(url):
        wb_data = requests.get(url, headers=headers)
        html = etree.HTML(wb_data.content.decode('utf-8'))
        # Select every <li> in the ranking list, not only the first one.
        songs_list = html.xpath('//div[@id="rankWrap"]//div[@class="pc_temp_songlist "]/ul/li')

        for item in songs_list:
            # The title attribute of each <li> holds "singer - song".
            title = item.xpath('./@title')[0]
            try:
                # Try the rank wrapped in <strong> first.
                rank = item.xpath('.//span[@class="pc_temp_num"]/strong/text()')[0].strip()
            except IndexError:
                # Fall back to plain text directly inside the span.
                rank = item.xpath('.//span[@class="pc_temp_num"]/text()')[0].strip()

            # Split on the first separator only, in case a song name contains " - ".
            singer, song = title.split(' - ', 1)
            time_song = item.xpath('.//span[@class="pc_temp_tips_r"]//span[@class="pc_temp_time"]/text()')[0]
            data = {
                "rank": rank,
                "singer": singer.strip(),
                "song": song.strip(),
                "time": time_song.strip()
            }
            print(data)

    if __name__ == '__main__':
        urls = ['https://www.kugou.com/yy/rank/home/{}-8888.html'.format(i) for i in range(1, 24)]
        for url in urls:
            get_info(url)
    
    
    Accepted by the asker as the best answer.
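
On the question itself: every selector in the original code contains `li:nth-child(1)`, which matches only the first `<li>` of the list, so `zip` gets at most one row per page. Below is a minimal BeautifulSoup sketch of a fix, assuming the page structure implied by the selectors in the question and in the answer above. `parse_rank` and `length` are names made up for this example, `'html.parser'` is chosen only to avoid an extra dependency (`'lxml'` as in the question should work too), and the rank is read from `span.pc_temp_num` itself because the answer's fallback suggests that not every row wraps it in `<strong>`.

```python
from bs4 import BeautifulSoup


def parse_rank(html):
    """Return one dict per <li> of the ranking list (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    # No :nth-child(1) here, so every <li> in the list is matched.
    for li in soup.select('#rankWrap > div.pc_temp_songlist > ul > li'):
        rank = li.select_one('span.pc_temp_num')
        title = li.select_one('a')
        length = li.select_one('span.pc_temp_tips_r > span')
        if rank is None or title is None or length is None:
            continue  # skip rows that do not have the assumed layout
        # Split "singer - song" at the first hyphen only.
        singer, _, song = title.get_text().partition('-')
        rows.append({
            'rank': rank.get_text().strip(),
            'singer': singer.strip(),
            'song': song.strip(),
            'time': length.get_text().strip(),
        })
    return rows
```

In the question's `get_info`, the three `soup.select` calls and the `zip` loop could then be replaced by `for data in parse_rank(wb_data.text): print(data)`. Naming the loop variable `length` rather than `time` also avoids shadowing the imported `time` module inside the function.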