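I just want to ask how to fetch the HTML of these two websites with a Python crawler (any other Python approach is fine too). By HTML I mean the HyperText Markup Language source, i.e. <html>...<head>...</html>. I tried the usual approach and it did not work.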
1 answer
- threenewbee 2020-02-27 21:02
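There is nothing special about these two sites. The only catch is that they are hosted overseas, so I suggest setting up a working connection to them first (for example through a proxy) and then making the request.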
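import requests

# Fetch the PyPI search page (Python 3 classifier filter).
res = requests.get('https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3')
# The page declares charset utf-8, so decode the body accordingly.
res.encoding = 'utf-8'
print(res.text)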
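Since the question allows other Python methods, the advice above (get a working route first, then fetch) can also be sketched with only the standard library's `urllib.request`. This is a minimal sketch, not the answerer's code: the proxy address `http://127.0.0.1:7890`, the User-Agent string, and the helper names `proxy_map`, `make_opener`, and `fetch_html` are all assumptions made for illustration.

```python
import urllib.request

# Assumption: a local HTTP proxy is listening here; replace with your own address.
EXAMPLE_PROXY = "http://127.0.0.1:7890"


def proxy_map(proxy_url: str) -> dict:
    """Route both http and https traffic through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends every request through the given proxy."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    opener = urllib.request.build_opener(handler)
    # Hypothetical User-Agent; some sites reject the default Python one.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; example-fetcher)")]
    return opener


def fetch_html(url: str, proxy_url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded as text."""
    opener = make_opener(proxy_url)
    # A timeout keeps the call from hanging if the route is unreachable.
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html(url, EXAMPLE_PROXY)` with the search URL from the snippet above would return the page source as a string, or raise `urllib.error.URLError` if the proxy or the site cannot be reached. With requests, the same mapping can be passed through the `proxies=` argument of `requests.get`, together with `timeout=`.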
The page source is too long, so I am pasting only the first 100 lines of each page here.
<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="defaultLanguage" content="en"> <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de"> <title>Search results · PyPI</title> <meta name="description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language."> <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css"> <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css"> <link rel="stylesheet" href="/static/css/regular.8819f1a9.css"> <link rel="stylesheet" href="/static/css/solid.002489ee.css"> <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css"> <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500"> <noscript> <link rel="stylesheet" href="/static/css/noscript.69d08c82.css"> </noscript> <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon"> <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml"> <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml"> <meta property="og:url" content="https://pypi.org/search/?c=Programming+Language+%3A%3A+Python+%3A%3A+3"> <meta property="og:site_name" content="PyPI"> <meta property="og:type" content="website"> <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg"> <meta property="og:title" content="Search results"> <meta property="og:description" content="The Python Package Index (PyPI) is a repository of software for the Python programming language."> <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml"> <script src="https://cdn.ravenjs.com/3.26.2/raven.min.js" 
integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA" crossorigin="anonymous"> </script> <script async data-ga-id="UA-55961911-1" data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155" src="/static/js/warehouse.092c6255.js"> </script> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script> <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script> </head> <body data-controller="viewport-toggle"> <!-- Accessibility: this link should always be the first piece of content inside the body--> <a href="#content" class="skip-to-content">Skip to main content</a> <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile"> Switch to mobile version </button> <div id="sticky-notifications" class="stick-to-top js-stick-to-top"> <!-- Add browser warning. Will show for ie9 and below --> <!--[if IE]> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span> </div> <![endif]--> <noscript> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span> </div> </noscript> </div> <div data-html-include="/_includes/flash-messages/"> </div>
<!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="defaultLanguage" content="en"> <meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de"> <title>qface-qtcpp · PyPI</title> <meta name="description" content="Qt CPP generator based on the QFace library"> <link rel="stylesheet" href="/static/css/warehouse.26fd4b09.css"> <link rel="stylesheet" href="/static/css/fontawesome.91df071f.css"> <link rel="stylesheet" href="/static/css/regular.8819f1a9.css"> <link rel="stylesheet" href="/static/css/solid.002489ee.css"> <link rel="stylesheet" href="/static/css/brands.0c9eb08b.css"> <link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:400,400italic,600,600italic,700,700italic%7CSource+Code+Pro:500"> <noscript> <link rel="stylesheet" href="/static/css/noscript.69d08c82.css"> </noscript> <link rel="icon" href="/static/images/favicon.6a76275d.ico" type="image/x-icon"> <link rel="alternate" type="application/rss+xml" title="RSS: 40 latest updates" href="/rss/updates.xml"> <link rel="alternate" type="application/rss+xml" title="RSS: 40 newest packages" href="/rss/packages.xml"> <link rel="canonical" href="https://pypi.org/project/qface-qtcpp/"> <meta property="og:url" content="https://pypi.org/project/qface-qtcpp/"> <meta property="og:site_name" content="PyPI"> <meta property="og:type" content="website"> <meta property="og:image" content="https://pypi.org/static/images/twitter.c0030826.jpg"> <meta property="og:title" content="qface-qtcpp"> <meta property="og:description" content="Qt CPP generator based on the QFace library"> <link rel="search" type="application/opensearchdescription+xml" title="PyPI" href="/opensearch.xml"> <script src="https://cdn.ravenjs.com/3.26.2/raven.min.js" integrity="sha384-D6LXy67EIC102DTuqypxwQsTHgiatlbvg7q/1YAWFb6lRyZ1lIZ6bGDsX7jxHNKA" crossorigin="anonymous"> 
</script> <script async data-ga-id="UA-55961911-1" data-sentry-frontend-dsn="https://3a67b35c9dc248a191d761410b095861@sentry.io/1231155" src="/static/js/warehouse.092c6255.js"> </script> <script async src="https://www.googletagmanager.com/gtag/js?id=UA-55961911-1"></script> <script defer src="https://www.fastly-insights.com/insights.js?k=6a52360a-f306-421e-8ed5-7417d0d4a4e9&dnt=true"></script> </head> <body data-controller="viewport-toggle"> <!-- Accessibility: this link should always be the first piece of content inside the body--> <a href="#content" class="skip-to-content">Skip to main content</a> <button type="button" class="button button--primary button--switch-to-mobile hidden" data-target="viewport-toggle.switchToMobile" data-action="viewport-toggle#switchToMobile"> Switch to mobile version </button> <div id="sticky-notifications" class="stick-to-top js-stick-to-top"> <!-- Add browser warning. Will show for ie9 and below --> <!--[if IE]> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">You are using an unsupported browser, upgrade to a newer version.</span> </div> <![endif]--> <noscript> <div class="notification-bar notification-bar--warning" role="status"> <span class="notification-bar__icon"> <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> <span class="sr-only">Warning</span> </span> <span class="notification-bar__message">Some features may not work without JavaScript. Please try enabling it if you encounter problems.</span> </div> </noscript> </div>
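To reproduce truncated output like the two dumps above and to check which page each one came from, something along these lines may help. It is a sketch using only the standard library; `first_lines`, `TitleParser`, and `page_title` are hypothetical helper names, and the input is assumed to be HTML text that has already been downloaded.

```python
from html.parser import HTMLParser


def first_lines(text: str, limit: int = 100) -> str:
    """Return at most `limit` lines of `text`, joined by newlines."""
    return "\n".join(text.splitlines()[:limit])


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text of an HTML document ('' if absent)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

Given the text of a downloaded page, `print(first_lines(text))` prints its first 100 lines, and `page_title(text)` gives a quick check such as "Search results · PyPI" or "qface-qtcpp · PyPI" for the two pages shown above.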
This answer was selected by the asker as the best answer.