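Just a quick question: how do I use a Python crawler (or any other Python approach) to get the HTML (HyperText Markup Language, i.e. `<html>...<head>...</html>`) of these two websites? I tried the usual methods and they didn't work.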
- threenewbee 2020-02-27 21:02
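There is nothing special about these two sites, except that they are hosted outside mainland China. I suggest you first set up a proxy/VPN connection that can reach them, and then request them as usual.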
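If your proxy client does not already set a system-wide proxy (both `urllib` and `requests` pick up the `HTTP_PROXY`/`HTTPS_PROXY` environment variables by default), you can point Python at it explicitly. A minimal sketch using only the standard library's `urllib.request`; the address `http://127.0.0.1:7890` is a hypothetical placeholder for whatever host and port your own proxy listens on. With `requests`, the equivalent is passing `proxies={"http": ..., "https": ...}` to `requests.get`.

```python
import urllib.request

# Hypothetical address: replace with the host:port your own proxy client listens on.
PROXY_URL = "http://127.0.0.1:7890"


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends both http:// and https:// traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_html(url: str, proxy_url: str = PROXY_URL, timeout: float = 10.0) -> str:
    """Download a page through the proxy and decode it as UTF-8."""
    opener = make_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Usage (needs network access and a running proxy):
# print(fetch_html("https://pypi.org/")[:300])
```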
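```python
import requests

# PyPI search results filtered by the classifier "Programming Language :: Python :: 3"
res = requests.get('https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3')
res.encoding = 'utf-8'  # decode the response body as UTF-8
print(res.text)
```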
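If a plain request still fails once the connection itself works, two common culprits are a site rejecting the default `Python-urllib`/`python-requests` User-Agent and text being decoded with the wrong encoding. A sketch with the standard library's `urllib.request`; the browser-like User-Agent string is an arbitrary example, not something these sites are known to require.

```python
import urllib.request
from typing import Optional

# Arbitrary browser-like User-Agent; some sites may reject the default "Python-urllib/3.x".
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url: str) -> urllib.request.Request:
    """Attach browser-like headers to the request."""
    return urllib.request.Request(url, headers=HEADERS)


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Use the charset the server declared, falling back to UTF-8 (like res.encoding = 'utf-8')."""
    return raw.decode(charset or "utf-8", errors="replace")


def get_html(url: str, timeout: float = 10.0) -> str:
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, so an error page is not silently returned
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get_content_charset())

# Usage (needs network access):
# print(get_html("https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3"))
```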
The page source is too long, so I'm only pasting the first 100 lines of each page:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="defaultLanguage" content="en">
    <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de">
    <title>Search results · PyPI</title>
    <meta name="description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language.">
    <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css">
    <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css">
    <link rel="stylesheet" href="/static/css/regular.8819f1a9.css">
    <link rel="stylesheet" href="/static/css/solid.002489ee.css">
    <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500">
    <noscript>
      <link rel="stylesheet" href="/static/css/noscript.69d08c82.css">
    </noscript>
    <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon">
    <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml">
    <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml">
    <meta property="og:url" content="https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3">
    <meta property="og:site_name" content="PyPI">
    <meta property="og:type" content="website">
    <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg">
    <meta property="og:title" content="Search results">
    <meta property="og:description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language.">
    <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml">
    <script src="https://cdn.ravenjs.com/3.26.2/raven.min.js" integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA" crossorigin="anonymous">
    </script>
    <script async data-ga-id="UA-55961911-1" data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155" src="/static/js/warehouse.092c6255.js">
    </script>
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script>
    <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script>
  </head>
  <body data-controller="viewport-toggle">
    <!-- Accessibility: this link should always be the first piece of content inside the body-->
    <a href="#content" class="skip-to-content">Skip to main content</a>
    <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile"> Switch to mobile version </button>
    <div id="sticky-notifications" class="stick-to-top js-stick-to-top">
      <!-- Add browser warning. Will show for ie9 and below -->
      <!--[if IE]> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span> </div> <![endif]-->
      <noscript>
        <div class="notification-bar notification-bar--warning" role="status">
          <span class="notification-bar__icon">
            <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
            <span class="sr-only">Warning</span>
          </span>
          <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span>
        </div>
      </noscript>
    </div>
    <div data-html-include="/_includes/flash-messages/"> </div>
```
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="defaultLanguage" content="en">
    <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de">
    <title>qface-qtcpp · PyPI</title>
    <meta name="description" content="Qt CPP generator based on the QFace library">
    <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css">
    <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css">
    <link rel="stylesheet" href="/static/css/regular.8819f1a9.css">
    <link rel="stylesheet" href="/static/css/solid.002489ee.css">
    <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css">
    <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500">
    <noscript>
      <link rel="stylesheet" href="/static/css/noscript.69d08c82.css">
    </noscript>
    <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon">
    <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml">
    <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml">
    <link rel="canonical" href="https://pypi.org/project/qface-qtcpp/">
    <meta property="og:url" content="https://pypi.org/project/qface-qtcpp/">
    <meta property="og:site_name" content="PyPI">
    <meta property="og:type" content="website">
    <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg">
    <meta property="og:title" content="qface-qtcpp">
    <meta property="og:description" content="Qt CPP generator based on the QFace library">
    <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml">
    <script src="https://cdn.ravenjs.com/3.26.2/raven.min.js" integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA" crossorigin="anonymous">
    </script>
    <script async data-ga-id="UA-55961911-1" data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155" src="/static/js/warehouse.092c6255.js">
    </script>
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script>
    <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script>
  </head>
  <body data-controller="viewport-toggle">
    <!-- Accessibility: this link should always be the first piece of content inside the body-->
    <a href="#content" class="skip-to-content">Skip to main content</a>
    <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile"> Switch to mobile version </button>
    <div id="sticky-notifications" class="stick-to-top js-stick-to-top">
      <!-- Add browser warning. Will show for ie9 and below -->
      <!--[if IE]> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span> </div> <![endif]-->
      <noscript>
        <div class="notification-bar notification-bar--warning" role="status">
          <span class="notification-bar__icon">
            <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
            <span class="sr-only">Warning</span>
          </span>
          <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span>
        </div>
      </noscript>
    </div>
```
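Rather than dumping a whole page to the console, you could save the full source to a file, print only the first 100 lines, and read the `<title>` to confirm which page actually came back (the two dumps above have the titles `Search results · PyPI` and `qface-qtcpp · PyPI`). A standard-library-only sketch; the function names and the `page.html` filename are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Optional


def save_html(html: str, path: str) -> None:
    """Write the full page source to disk as UTF-8."""
    Path(path).write_text(html, encoding="utf-8")


def first_lines(html: str, n: int = 100) -> str:
    """Return only the first n lines of the source."""
    return "\n".join(html.splitlines()[:n])


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None

# Usage, with html being the downloaded source (e.g. res.text from the requests code):
# save_html(html, "page.html")
# print(first_lines(html))
# print(extract_title(html))
```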
This answer was selected by the asker as the best answer.