江jh 2020-02-27 20:30

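I just want to know how to use a Python crawler (any other Python approach is fine too) to fetch the HTML of these two websites (the HyperText Markup Language source, i.e. <html>...<head>...</html>). I tried the usual approach and it didn't work.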

1 answer

  • threenewbee 2020-02-27 21:02

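    These two sites are nothing special; the only catch is that they are hosted overseas. I suggest first setting up a connection that can reach them (for example, through a proxy), then fetching them.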

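    import requests

    # Fetch the PyPI search page; the timeout keeps a blocked connection from hanging forever
    res = requests.get('https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3', timeout=10)
    res.raise_for_status()  # raise an exception on an HTTP error status
    res.encoding = 'utf-8'
    print(res.text)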
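
If the connection goes through a local proxy, the request can be pointed at it explicitly with the `proxies` argument of `requests.get`. The following is only a sketch under assumptions: the proxy address `http://127.0.0.1:7890` and the helper names `build_proxies` and `fetch_html` are hypothetical and not part of the original answer, so adjust them to your own setup.

```python
import requests

# Hypothetical local proxy address; replace it with whatever your own proxy exposes.
PROXY_URL = "http://127.0.0.1:7890"


def build_proxies(proxy_url):
    """Return the mapping that requests expects for its `proxies` argument."""
    if not proxy_url:
        return None  # no explicit proxy: requests falls back to environment settings
    return {"http": proxy_url, "https": proxy_url}


def fetch_html(url, proxy_url=None, timeout=10):
    """Fetch a page and return its HTML as text, optionally through a proxy."""
    response = requests.get(url, proxies=build_proxies(proxy_url), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    response.encoding = "utf-8"  # the PyPI pages declare <meta charset="utf-8">
    return response.text
```

Calling `fetch_html('https://pypi.org/project/qface-qtcpp/', proxy_url=PROXY_URL)` would then return the page source as a string. The timeout is there so that a blocked connection fails quickly instead of hanging.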
    

    The source is too long, so I'm pasting only the first 100 lines of each page here.
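
One possible way to trim a page to its first 100 lines before printing is sketched below. The helper name `first_lines` and the sample string are made up for illustration; the answer does not say how its output was actually cut.

```python
def first_lines(text, limit=100):
    """Return at most `limit` leading lines of `text`, joined back with newlines."""
    return "\n".join(text.splitlines()[:limit])


# Small stand-in for a downloaded page; in practice this would be res.text.
sample = "<!DOCTYPE html>\n<html>\n<head>\n</head>\n</html>"
print(first_lines(sample, limit=2))
```

With a real response this would be `print(first_lines(res.text))`.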

    
    
    
    
    
    
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">
    
        <meta name="defaultLanguage" content="en">
        <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de">
    
    
    
        <title>Search results · PyPI</title>
        <meta name="description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language.">
    
        <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css">
        <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css">
        <link rel="stylesheet" href="/static/css/regular.8819f1a9.css">
        <link rel="stylesheet" href="/static/css/solid.002489ee.css">
        <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css">
        <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500">
        <noscript>
          <link rel="stylesheet" href="/static/css/noscript.69d08c82.css">
        </noscript>
    
    
    
        <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon">
    
        <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml">
        <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml">
    
    
        <meta property="og:url" content="https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3">
        <meta property="og:site_name" content="PyPI">
        <meta property="og:type" content="website">
        <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg">
        <meta property="og:title" content="Search results">
        <meta property="og:description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language.">
    
        <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml">
    
    
        <script
          src="https://cdn.ravenjs.com/3.26.2/raven.min.js"
          integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA"
          crossorigin="anonymous">
        </script>
    
        <script async
                data-ga-id="UA-55961911-1"
                data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155"
                src="/static/js/warehouse.092c6255.js">
        </script>
    
    
        <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script>
        <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script>
      </head>
    
      <body data-controller="viewport-toggle">
    
    
        <!-- Accessibility: this link should always be the first piece of content inside the body-->
        <a href="#content" class="skip-to-content">Skip to main content</a>
    
        <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile">
          Switch to mobile version
        </button>
    
        <div id="sticky-notifications" class="stick-to-top js-stick-to-top">
          <!-- Add browser warning. Will show for ie9 and below -->
          <!--[if IE]>
          <div class="notification-bar notification-bar--warning" role="status">
            <span class="notification-bar__icon">
              <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
              <span class="sr-only">Warning</span>
            </span>
            <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span>
          </div>
          <![endif]-->
    
          <noscript>
          <div class="notification-bar notification-bar--warning" role="status">
    
            <span class="notification-bar__icon">
              <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
              <span class="sr-only">Warning</span>
            </span>
            <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span>
          </div>
          </noscript>
        </div>
    
    
          <div data-html-include="/_includes/flash-messages/">
          </div>
    
    
    
    
    
    
    
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">
    
        <meta name="defaultLanguage" content="en">
        <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de">
    
    
    
        <title>qface-qtcpp · PyPI</title>
        <meta name="description" content="Qt CPP generator based on the QFace library">
    
        <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css">
        <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css">
        <link rel="stylesheet" href="/static/css/regular.8819f1a9.css">
        <link rel="stylesheet" href="/static/css/solid.002489ee.css">
        <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css">
        <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500">
        <noscript>
          <link rel="stylesheet" href="/static/css/noscript.69d08c82.css">
        </noscript>
    
    
    
        <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon">
    
        <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml">
        <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml">
    
        <link rel="canonical" href="https://pypi.org/project/qface-qtcpp/">
    
    
        <meta property="og:url" content="https://pypi.org/project/qface-qtcpp/">
        <meta property="og:site_name" content="PyPI">
        <meta property="og:type" content="website">
        <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg">
        <meta property="og:title" content="qface-qtcpp">
        <meta property="og:description" content="Qt CPP generator based on the QFace library">
    
        <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml">
    
    
        <script
          src="https://cdn.ravenjs.com/3.26.2/raven.min.js"
          integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA"
          crossorigin="anonymous">
        </script>
    
        <script async
                data-ga-id="UA-55961911-1"
                data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155"
                src="/static/js/warehouse.092c6255.js">
        </script>
    
    
        <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script>
        <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script>
      </head>
    
      <body data-controller="viewport-toggle">
    
    
        <!-- Accessibility: this link should always be the first piece of content inside the body-->
        <a href="#content" class="skip-to-content">Skip to main content</a>
    
        <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile">
          Switch to mobile version
        </button>
    
        <div id="sticky-notifications" class="stick-to-top js-stick-to-top">
          <!-- Add browser warning. Will show for ie9 and below -->
          <!--[if IE]>
          <div class="notification-bar notification-bar--warning" role="status">
            <span class="notification-bar__icon">
              <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
              <span class="sr-only">Warning</span>
            </span>
            <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span>
          </div>
          <![endif]-->
    
          <noscript>
          <div class="notification-bar notification-bar--warning" role="status">
    
            <span class="notification-bar__icon">
              <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
              <span class="sr-only">Warning</span>
            </span>
            <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span>
          </div>
          </noscript>
        </div>
    
    
    This answer was selected by the asker as the best answer.