朝,夕 2021-04-06 16:22

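How can a Python crawler scrape dynamic content?

When scraping the full list of Anhui attractions on Mafengwo (http://www.mafengwo.cn/jd/12719/gonglve.html), I cannot get the li tags.

Parsing the page with BeautifulSoup returns an empty list.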

soup = BeautifulSoup(html, 'html.parser')  
print(soup.select('html body div#container div.row-allScenic div.wrapper div.bd ul.scenic-list '))

The result is:

[<ul class="scenic-list clearfix">
</ul>]

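The content of the ul in the rendered page is shown below (it is presumably generated dynamically; in the raw page source the ul is simply empty).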

    <li>
        <a href="/poi/9602.html" target="_blank" title="黄山风景区">
            <div class="img"><img src="http://b1-q.mafengwo.net/s13/M00/6E/FE/wKgEaVyFR3SAKchQAAJXQXSOpZc87.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>黄山风景区</h3>
        </a>

    </li>
    <li>
        <a href="/poi/7730080.html" target="_blank" title="宏村">
            <div class="img"><img src="http://p1-q.mafengwo.net/s15/M00/E6/DF/CoUBGV5HaamAcXx3AAGlNmbI4_U76.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>宏村</h3>
        </a>

    </li>
    <li>
        <a href="/poi/9684.html" target="_blank" title="西海大峡谷">
            <div class="img"><img src="http://b1-q.mafengwo.net/s14/M00/13/97/wKgE2l1ipPeAO6aYAATuez1Jq3U09.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>西海大峡谷</h3>
        </a>

    </li>
    <li>
        <a href="/poi/6328735.html" target="_blank" title="西递">
            <div class="img"><img src="http://b1-q.mafengwo.net/s15/M00/3B/4B/CoUBGV2kNdqADjy0AAPBiWhgJBo736.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>西递</h3>
        </a>

    </li>
    <li>
        <a href="/poi/9720.html" target="_blank" title="屯溪老街">
            <div class="img"><img src="http://b1-q.mafengwo.net/s13/M00/B3/0D/wKgEaV2bMp6AMMdwAAQqIROm1GA735.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>屯溪老街</h3>
        </a>

    </li>
    <li>
        <a href="/poi/5426908.html" target="_blank" title="徽州古城">
            <div class="img"><img src="http://n1-q.mafengwo.net/s10/M00/4F/A7/wKgBZ1jrgESAHGHQAAHt-nVAMu051.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>徽州古城</h3>
        </a>

    </li>
    <li>
        <a href="/poi/5426501.html" target="_blank" title="黄山翡翠谷景区">
            <div class="img"><img src="http://b1-q.mafengwo.net/s12/M00/60/C4/wKgED1xIMMeAL4quAAqdKm2SP-Q74.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>黄山翡翠谷景区</h3>
        </a>

    </li>
    <li>
        <a href="/poi/9605.html" target="_blank" title="光明顶">
            <div class="img"><img src="http://p1-q.mafengwo.net/s12/M00/58/27/wKgED1vkGQOAI7zOAAYh6jFZne054.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>光明顶</h3>
        </a>

    </li>
    <li>
        <a href="/poi/1548.html" target="_blank" title="月沼湖">
            <div class="img"><img src="http://p1-q.mafengwo.net/s17/M00/92/D4/CoUBXl-Np1iEZLaDAAAAADwBCO0947.jpg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>月沼湖</h3>
        </a>

    </li>
    <li>
        <a href="/poi/9724.html" target="_blank" title="南湖">
            <div class="img"><img src="http://b1-q.mafengwo.net/s10/M00/2F/F2/wKgBZ1nty7uAPRz6AAT5d2JPkUw44.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>南湖</h3>
        </a>

    </li>
    <li>
        <a href="/poi/6328738.html" target="_blank" title="木坑竹海">
            <div class="img"><img src="http://b1-q.mafengwo.net/s12/M00/89/29/wKgED1wPqW2AQO81AA5hRvN8lqU60.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>木坑竹海</h3>
        </a>

    </li>
    <li>
        <a href="/poi/5429154.html" target="_blank" title="查济古镇">
            <div class="img"><img src="http://n1-q.mafengwo.net/s12/M00/73/A6/wKgED1uTK3iAJGhOAEVmsM4Yp5c20.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>查济古镇</h3>
        </a>

    </li>
    <li>
        <a href="/poi/6625188.html" target="_blank" title="徽杭古道">
            <div class="img"><img src="http://n1-q.mafengwo.net/s12/M00/C1/45/wKgED1veKgeAWJimAB4yzt6mrKE05.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>徽杭古道</h3>
        </a>

    </li>
    <li>
        <a href="/poi/5426678.html" target="_blank" title="三河古镇">
            <div class="img"><img src="http://p1-q.mafengwo.net/s12/M00/55/06/wKgED1xD5QKAAeOgAAyamhBQPlM35.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>三河古镇</h3>
        </a>

    </li>
    <li>
        <a href="/poi/5426350.html" target="_blank" title="呈坎">
            <div class="img"><img src="http://n1-q.mafengwo.net/s10/M00/E0/CB/wKgBZ1t-zXeAEEM6AG0HFweCAxw84.jpeg?imageMogr2%2Fthumbnail%2F%21192x130r%2Fgravity%2FCenter%2Fcrop%2F%21192x130%2Fquality%2F100" width="192" height="130"></div>
            <h3>呈坎</h3>
        </a>

    </li>

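With webdriver I can get the text, but I do not know how to get the attribute values of the tags (this is the problem I currently need to solve).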

    text_class = browser.find_element_by_css_selector('.scenic-list.clearfix')
    text = text_class.text  # get the visible text
    print(text)
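
For the attribute values asked about here, one possible approach is `WebElement.get_attribute`, called on each `a` and `img` inside the list. The sketch below is only an illustration: `collect_scenic_items` is a hypothetical helper name, and it assumes the rendered markup matches the `li` snippet above. It uses the generic `find_elements(by, value)` call because the `find_element_by_*` helpers were removed in Selenium 4.3.

```python
def collect_scenic_items(list_element):
    """Return title, href and image URL for each <li> link under the list.

    `list_element` is assumed to be the WebElement for ul.scenic-list
    after the page's JavaScript has filled it in.
    """
    items = []
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    for link in list_element.find_elements("css selector", "li > a"):
        image = link.find_element("css selector", "img")
        items.append({
            "title": link.get_attribute("title"),
            "href": link.get_attribute("href"),
            "img": image.get_attribute("src"),
        })
    return items
```

With the `browser` object from the snippet above, the call would look like `collect_scenic_items(browser.find_element("css selector", ".scenic-list.clearfix"))`.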

Locating the element with XPath does not give me the information either.

print(browser.find_element_by_xpath('//div[@class="row row-allScenic"]//div[@class="wrapper"]//div[@class="bd"]//ul[@class="scenic-list clearfix"]//li[1]'))

The returned result is:

<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="c4104180-ab74-44de-a274-620ffff68289", element="382f1abb-1795-48db-987d-80e5985cdef5")>
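
The output above is the `repr` of a WebElement, which means the XPath did match an element; `print` only shows the handle, not its content. A small sketch of pulling readable data out of such an element, assuming the standard `text` property and `get_attribute` method of Selenium's WebElement (`describe_element` is a hypothetical helper name):

```python
def describe_element(element):
    """Return the visible text and the full markup of a Selenium WebElement."""
    return {
        # Visible text of the element and its children, e.g. the <h3> content.
        "text": element.text,
        # The element's own HTML, including its child tags, as a string.
        "outer_html": element.get_attribute("outerHTML"),
    }
```

Passing the element returned by the XPath lookup to `describe_element` and printing the result should show the `li` content instead of the session and element ids.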
6 answers

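    You can first use webdriver to get the HTML after it has been dynamically updated, then hand it to BeautifulSoup for processing.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    from time import sleep
    
    browser = webdriver.Chrome()
    browser.get('http://www.mafengwo.cn/jd/12719/gonglve.html')
    sleep(3)  # give the page's JavaScript time to fill in the list
    # find_element(By.TAG_NAME, ...) replaces find_element_by_tag_name, which was removed in Selenium 4.3
    html = browser.find_element(By.TAG_NAME, "html").get_attribute("outerHTML")
    browser.quit()
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.select('html body div#container div.row-allScenic div.wrapper div.bd ul.scenic-list'))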
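
The fixed `sleep(3)` may be too short on a slow connection and wasteful on a fast one. As an alternative sketch, an explicit wait with Selenium's `WebDriverWait` can block until the first `li` appears, after which BeautifulSoup extracts the attributes. The helper names `parse_scenic_list` and `fetch_rendered_html`, the 10-second timeout and the shortened selector `ul.scenic-list li` are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup


def parse_scenic_list(html):
    """Extract title and href from every link inside ul.scenic-list."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": a.get("title"), "href": a.get("href")}
        for a in soup.select("ul.scenic-list li a")
    ]


def fetch_rendered_html(url, timeout=10):
    """Load the page in Chrome and return its HTML once the list has items."""
    # Imported here so parse_scenic_list can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Wait until at least one <li> exists instead of sleeping a fixed time;
        # a TimeoutException is raised if none appears within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "ul.scenic-list li"))
        )
        return browser.find_element(By.TAG_NAME, "html").get_attribute("outerHTML")
    finally:
        browser.quit()  # close the browser even if the wait times out
```

A call such as `parse_scenic_list(fetch_rendered_html('http://www.mafengwo.cn/jd/12719/gonglve.html'))` would then return one dictionary per attraction. The `href` values come back as written in the markup (for example `/poi/9602.html`), so they still need to be joined with the site's base URL.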
    
    This answer was selected by the asker as the best answer.