Ffxxxxxx 2022-06-16 22:19 Acceptance rate: 66.7%
Views: 102
Closed

I really can't figure this out. Can anyone help?

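1. Prepare a text document of your own, run a word-frequency count on it, and plot the 5 most frequent words as a line chart, a clustered column chart, and a pie chart. Place all three charts in a single plotting area.
2. Pick any web page that contains a table, scrape its data, and save the result to an Excel file.
Note: submit the source code and screenshots of the output. The screenshots must be clear.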

2 answers

  • Hann Yang (top creator, full-stack category) 2022-06-17 07:29

    [Question 1]

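    import matplotlib.pyplot as plt

    # The two settings below matter only when labels contain Chinese text
    plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
    plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly

    # State the encoding explicitly instead of relying on the platform default
    with open('zen.txt', 'r', encoding='utf-8') as f:
        data = f.read()

    # Replace punctuation with spaces so neighbouring words are not glued together.
    # Newlines need no special handling: split() treats them as separators.
    for ch in ',.-*!':
        data = data.replace(ch, ' ')

    lst = [word.lower() for word in data.split()]
    # Count every word, then sort the (word, count) pairs by descending count
    dct = {word: lst.count(word) for word in lst}
    dct = sorted(dct.items(), key=lambda x: -x[1])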
    
    X,Y = [d[0] for d in dct[:5]],[d[1] for d in dct[:5]]
    
    plt.figure('Word frequency', figsize=(12, 5))
    plt.subplot(1, 3, 1)
    plt.title("Line chart")
    plt.ylim(0, max(Y) + 2)  # leave headroom above the highest count
    plt.plot(X, Y, color="red", label="Legend 1")
    plt.legend()
    
    plt.subplot(1, 3, 2)
    plt.title("Bar chart")
    plt.ylim(0, max(Y) + 2)
    plt.bar(X, Y, label="Legend 2")
    plt.legend()
    
    plt.subplot(1, 3, 3)
    plt.title("Pie chart")
    exp = (0.02, 0.03, 0.04, 0.05, 0.1)  # offset of each wedge from the centre
    plt.pie(Y, labels=X, explode=exp, autopct="(%1.1f%%)")
    plt.legend(loc="lower left")  # place the legend at the lower left
    
    plt.show()
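
The counting step can also be sketched with `collections.Counter` and a regular expression, which avoids listing punctuation characters one by one. The function name `top_words` and the token pattern are assumptions made for illustration and are not part of the answer above.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase first, then treat runs of letters and apostrophes as words,
    # so "aren't" stays one token and "one--" becomes "one".
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() sorts by descending count; ties keep first-seen order.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "Now is better than never. Although never is often better than right now."
    print(top_words(sample, 3))
```

The returned pairs can be unpacked into the `X` and `Y` lists used for plotting, for example with `X, Y = zip(*top_words(data))`.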
    


    Contents of the test file zen.txt:
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than right now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

    The Zen of Python text above is saved to a file named "zen.txt".
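
One possible way to produce that file without copying the text by hand is sketched below. It assumes CPython's standard `this` module, which keeps the poem ROT13-encoded in the attribute `this.s`; that attribute is an implementation detail rather than a documented API, so treat this as a convenience, not a guarantee.

```python
import codecs
import contextlib
import io

# Importing `this` prints the poem as a side effect; silence that output.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Assumption: the module stores the text ROT13-encoded in `this.s`.
zen = codecs.decode(this.s, "rot13")

# Write the decoded text to the file name used by the answer's script.
with open("zen.txt", "w", encoding="utf-8") as f:
    f.write(zen)
```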

    [Question 2] Scrape a university ranking and write it to an xlsx file.

    from bs4 import BeautifulSoup as bs
    from requests import get
    import pandas as pd
    
    Agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'
    url = 'http://www.gaosan.com/gaokao/311315.html'
    data = get(url, headers={'User-Agent': Agent}, timeout=10)
    data.raise_for_status()  # stop early on an HTTP error
    data.encoding = 'utf-8'
    
    soup = bs(data.text, 'html.parser')
    table = soup.find('table')
    
    # Collect the cell text row by row instead of assuming 4 columns;
    # rows without <td> cells are skipped
    row = []
    for tr in table.find_all('tr'):
        rol = [td.get_text(strip=True) for td in tr.find_all('td')]
        if rol:
            row.append(rol)
    
    # ExcelWriter.save() no longer exists in pandas 2.x, so write the file directly
    # (writing .xlsx requires the openpyxl package)
    text = pd.DataFrame(row)
    text.to_excel('college.xlsx', header=False, index=False)
    print('done!')
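
The row-extraction logic can be checked offline, without network access or third-party packages. The sketch below uses the standard library's `html.parser` in place of BeautifulSoup; the class name `TableRows`, the helper `parse_table`, and the sample HTML are all made up for illustration and do not come from the page scraped above.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the current row, None outside <tr>
        self._cell = None  # text pieces of the current cell, None outside a cell

    def _end_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()


def parse_table(html):
    """Return the table cells in html as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Made-up sample data, not taken from the page above
    sample = ("<table><tr><th>Rank</th><th>Name</th></tr>"
              "<tr><td>1</td><td> A </td></tr></table>")
    print(parse_table(sample))
```

The resulting list of rows has the same shape as `row` in the answer's script, so it can be passed to `pd.DataFrame` in the same way. Nested tables are not handled by this sketch.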
    
    This answer was accepted by the asker as the best answer.

Question events

  • Closed by the system on June 25
  • Answer accepted on June 17
  • Question created on June 16
