1. The last line of the code raises an error. How should I fix it?
Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\5.10.py", line 34, in
fp.write('\r\n')
TypeError: a bytes-like object is required, not 'str'
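2. After the script runs, the file contains only the content of the last page. Is each write overwriting what was written before? I changed with open(fname, 'wb') as fp: to
with open(fname, 'a') as fp: and got this error:
TypeError: write() argument must be str, not bytes
How can I keep the earlier content from being overwritten?
3. The written content has no line breaks. Is the line from question 1, fp.write('\r\n'), the one that is supposed to add them?
4. Only the body text is retrieved and none of the titles. How can I get the titles as well?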
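The two TypeError messages in questions 1 and 2 appear to come from the same rule: a file opened in binary mode ('wb', 'ab') accepts only bytes, while a file opened in text mode ('w', 'a') accepts only str. The following is a minimal sketch of both cases, not the original script; the scratch file demo.txt and the sample strings are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Binary mode: every write must be bytes, including the line break.
with open(path, "wb") as fp:
    fp.write("first paragraph".encode("utf-8"))
    fp.write(b"\r\n")  # a bytes literal; the str '\r\n' would raise TypeError

# Text append mode: every write must be str, and 'a' keeps what is already there.
# newline="" turns off newline translation, so "\r\n" is written unchanged.
with open(path, "a", encoding="utf-8", newline="") as fp:
    fp.write("second paragraph")  # no .encode() here
    fp.write("\r\n")

with open(path, "rb") as fp:
    data = fp.read()
```

So for question 1, either write b'\r\n' in binary mode, or switch to text mode and drop the .encode('utf-8') call; mixing the two styles is what triggers the errors.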
#coding: utf-8
import requests
from lxml import etree
url = 'http://www.moe.gov.cn/jyb_xxgk/moe_1777/moe_1778/'
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; WOW64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/52.0.2743.116 Safari/537.36'),
    'Accept-Language': 'zh-CN,zh;q=0.8'
}
response = requests.get(url, headers=headers).text
html = etree.HTML(response)
result1 = html.xpath('//ul[@id="list"]//li//a/@href')
for site in result1:
    xurl = "http://www.moe.gov.cn/jyb_xxgk/moe_1777/moe_1778/" + site
    req = requests.get(xurl, headers=headers)
    html2 = etree.HTML(req.content)
    result2 = html2.xpath('//p/text()')
    fname = r"C:\Users\Administrator\Desktop\1234.docx"
    with open(fname, 'wb') as fp:
        for i in result2:
            fp.write(i.encode('utf-8'))
            fp.write('\r\n')
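One possible way to address questions 2 to 4 together is sketched below. If the with open(fname, 'wb') block runs once per article inside the for site in result1 loop, the file is truncated on every pass, which would explain why only the last article survives; opening the file once, outside that loop, avoids this. The helper names extract_article, format_article, save_articles and crawl are hypothetical, and the XPath //title/text() is only an assumption that each article page keeps its title in the <title> element; the real pages may need a different expression. The sketch also swaps the string concatenation of URLs for urllib.parse.urljoin, which handles relative links.

```python
from urllib.parse import urljoin


def extract_article(page_bytes):
    """Return (title, paragraphs) from the raw HTML bytes of one article page."""
    from lxml import etree  # third-party, imported lazily; same parser as the original code
    doc = etree.HTML(page_bytes)
    # Assumption: the article title is the text of the page's <title> element.
    title = "".join(doc.xpath('//title/text()')).strip()
    paragraphs = [t.strip() for t in doc.xpath('//p/text()') if t.strip()]
    return title, paragraphs


def format_article(title, paragraphs):
    """Put the title and each paragraph on its own line, then a blank line."""
    return "\n".join([title] + list(paragraphs)) + "\n\n"


def save_articles(fname, articles):
    """Write all (title, paragraphs) pairs, opening the file only once."""
    # Text mode with an explicit encoding: write str, no .encode() needed.
    with open(fname, "w", encoding="utf-8") as fp:
        for title, paragraphs in articles:
            fp.write(format_article(title, paragraphs))


def crawl(list_url, headers, fname):
    """Fetch every article linked from the list page and save them to fname."""
    import requests  # third-party, imported lazily so the helpers above run without it
    from lxml import etree
    listing = etree.HTML(requests.get(list_url, headers=headers).content)
    articles = []
    for href in listing.xpath('//ul[@id="list"]//li//a/@href'):
        page = requests.get(urljoin(list_url, href), headers=headers)
        articles.append(extract_article(page.content))
    save_articles(fname, articles)
```

With the url and headers from the script above, a call would look like crawl(url, headers, r"C:\Users\Administrator\Desktop\1234.txt"). The .txt name is deliberate: plain text saved under a .docx name is not a valid Word document, and producing a real one would need a library such as python-docx.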