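I've just started learning web scraping, and all of my code follows a video tutorial. I'm trying to fetch the source of a web page, but the response I get back says 网络不给力 ("the network is not working well"). Help!
The code is as follows: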
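# 1. Import urllib.request and urllib.parse
import urllib.request
import urllib.parse
# 2. Prepare the base URL https://www.baidu.com/s?
base_url = 'https://www.baidu.com/s?'
# 3. Prepare the query parameters, stored in a dictionary
data = {
    'wd': '明星',        # "celebrity"
    'sex': '女',         # "female"
    'location': '新疆'   # "Xinjiang"
}
# 4. Use urlencode to join the dictionary's key/value pairs and percent-encode the Chinese text
new_data = urllib.parse.urlencode(data)
# 5. Concatenate the base URL and the encoded query string
url = base_url + new_data
print(url)
# 6. Anti-scraping countermeasure: send a browser-like User-Agent
headers = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
}
# 7. Build a customized request object
request = urllib.request.Request(url=url, headers=headers)
# 8. Send the request to the server, imitating a browser
response = urllib.request.urlopen(request)
# 9. Read the page source from the response
content = response.read().decode('utf-8')
print("打印结果:", content)  # "打印结果" means "printed result"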
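Steps 3 to 5 can be checked on their own, without any network access. The following is a minimal sketch using the same `data` dictionary as the script; `build_search_url` is a hypothetical helper name introduced only for illustration. If the encoding step works, the printed URL should match the one shown in the run output of this post.

```python
import urllib.parse


def build_search_url(base_url: str, params: dict) -> str:
    """Append percent-encoded query parameters to base_url (hypothetical helper)."""
    # urlencode joins key=value pairs with '&' and percent-encodes
    # non-ASCII text as UTF-8 by default.
    return base_url + urllib.parse.urlencode(params)


data = {'wd': '明星', 'sex': '女', 'location': '新疆'}
url = build_search_url('https://www.baidu.com/s?', data)
print(url)
```

Since this part produces the expected URL, the problem is more likely in how the server treats the request than in the URL construction.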
The result looks like this:
"F:\Program files\pycharm\python310\python.exe" "E:/Python/python exercise/06-urllib_get的urlencode方法.py"
https://www.baidu.com/s?wd=%E6%98%8E%E6%98%9F&sex=%E5%A5%B3&location=%E6%96%B0%E7%96%86
打印结果: <!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
<title>百度安全验证</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black">
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
<meta name="format-detection" content="telephone=no, email=no">
<link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">
<link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
<link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_aac6df1.css" />
</head>
<body>
<div class="timeout hide-callback">
<div class="timeout-img"></div>
<div class="timeout-title">网络不给力,请稍后重试</div>
<button type="button" class="timeout-button">返回首页</button>
</div>
<div class="timeout-feedback hide-callback">
<div class="timeout-feedback-icon"></div>
<p class="timeout-feedback-title">问题反馈</p>
</div>
<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_v2_21d1ae1.js"></script>
</body>
</html>
Process finished with exit code 0
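The HTML above is not a search results page: its `<title>` is 百度安全验证 ("Baidu security verification") and the body only contains the 网络不给力 message. That suggests the request was treated as automated even though a User-Agent header was sent. Below is a minimal sketch of how the script could detect this case instead of printing the page blindly. `extract_title` and `is_verification_page` are hypothetical helpers, and matching on the title text is an assumption based only on the output shown here, not on any documented Baidu behavior.

```python
import re


def extract_title(html: str) -> str:
    """Return the text inside the first <title> tag, or '' if there is none."""
    match = re.search(r'<title>(.*?)</title>', html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ''


def is_verification_page(html: str) -> bool:
    """Guess whether html is the verification page (assumption: title match)."""
    return '百度安全验证' in extract_title(html)


# Stand-in for the `content` string returned in step 9 of the script.
sample = '<html><head><title>百度安全验证</title></head><body></body></html>'
print(is_verification_page(sample))
```

In the original script this check could run on `content` right after step 9. A commonly suggested next step is to copy more of a real browser's request headers (for example `Cookie` and `Accept`) into `headers`; whether that is enough cannot be confirmed from the output above.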