Extracting the text inside a div with Python

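import requests
from bs4 import BeautifulSoup
import pprint
import json

url = "http://www.miaomu.com/qyml/default.asp"
r = requests.get(url)
# Decode the raw bytes as GBK and drop any bytes that cannot be decoded.
html = r.content.decode('gbk', 'ignore')
soup = BeautifulSoup(html, "html.parser")
# find_all returns a list of Tag objects, not the text inside them.
articles = soup.find_all("div", {"class": "gyjtnr"})
articles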


3 answers

7*24 工作者 2020-01-14 13:30


You can use the following as a reference:

# -*- coding:utf-8 -*-

import requests
from lxml import etree
from pprint import pprint


def get_html(url):
    """Download the page and return it as decoded text."""
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; AOL 9.0; Windows NT 6.0; Trident/5.0)",
    }
    r = requests.get(url, headers=headers, timeout=10)
    return r.content.decode('gb2312', 'ignore')


def parse_html(text):
    """Map each name (the <b> text in the i-th <p>) to the text of the i-th gyjtnr div."""
    infos = {}
    html = etree.HTML(text)
    datas = html.xpath("//div[@class='gynr']/div[@class='gyjtnr']")
    for index, data in enumerate(datas, 1):
        names = html.xpath("//div[@class='gynr']/p[%s]//b/text()" % index)
        if not names:
            # No matching <p>/<b> for this div: skip it instead of raising IndexError.
            continue
        name = names[0]
        # Join only the text nodes that are direct children of the div.
        content = ''.join(data.xpath("./text()"))
        if name not in infos:
            infos[name] = content
    return infos


if __name__ == '__main__':
    url = "http://www.miaomu.com/qyml/default.asp"
    html = get_html(url=url)
    if html:
        infos = parse_html(text=html)
        pprint(infos)
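
For readers who would rather stay with BeautifulSoup, the library used in the question, here is a minimal sketch of the same idea. `find_all` returns Tag objects, and calling `get_text()` on each one yields its text. The sample markup and the helper name `extract_div_texts` are made up for illustration; the real page's structure is only assumed from the class names that appear above.

```python
from bs4 import BeautifulSoup

# Made-up sample markup. The real page's structure is only assumed here,
# based on the class names used in the question and the answer.
SAMPLE_HTML = """
<div class="gynr">
  <p><a href="#"><b>Company A</b></a></p>
  <div class="gyjtnr">Address: 1 Example Road<br>Phone: 000-0000</div>
  <p><a href="#"><b>Company B</b></a></p>
  <div class="gyjtnr">Address: 2 Example Road<br>Phone: 111-1111</div>
</div>
"""


def extract_div_texts(html, css_class="gyjtnr"):
    """Return the text of every <div> that carries the given class."""
    soup = BeautifulSoup(html, "html.parser")
    divs = soup.find_all("div", class_=css_class)
    # get_text() flattens each Tag to a string. The separator keeps the
    # text on either side of a <br> from running together.
    return [div.get_text(separator=" ", strip=True) for div in divs]


texts = extract_div_texts(SAMPLE_HTML)
for text in texts:
    print(text)
```

With a live page, the decoded response text would be passed in place of `SAMPLE_HTML`.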

The asker selected this answer as the best answer.
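
One detail worth noting: the question decodes the response as 'gbk', while the answer decodes it as 'gb2312'. GBK is a superset of GB2312, so if the page happens to contain characters outside GB2312, decoding with 'gb2312' and 'ignore' loses them without raising an error. The sketch below uses a made-up sample string to show the effect; which encoding the site actually serves is an assumption and is not verified here.

```python
# "镕" is a character that exists in GBK but not in GB2312.
# The sample string is made up purely for illustration.
original = "A镕B"
raw = original.encode("gbk")

# Decoding with the same codec round-trips the text.
as_gbk = raw.decode("gbk", "ignore")

# Decoding with the narrower codec cannot produce the character,
# and 'ignore' hides the failure instead of raising an error.
as_gb2312 = raw.decode("gb2312", "ignore")

print(as_gbk == original)
print("镕" in as_gb2312)
```

If some text comes out missing or garbled, trying 'gbk' (or 'gb18030', a further superset) in place of 'gb2312' is a cheap thing to check.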


Extracting the text inside a div with Python (tag: python)
2020-01-14 10:58
