qq_39842914
TJ Zhang
2019-03-26 15:07

Scraper won't run; could someone experienced please take a look?

  • python
  • html5
  • regular expressions

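I want to scrape the link, title, author, and publication date of each book on Douban Books (豆瓣读书), but the script never produces any output and the computer just sits there. Could someone help me take a look?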

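import re

import requests

# Browser-like User-Agent so the request is not rejected as an obvious bot
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}
# A timeout makes a stalled connection raise an error instead of blocking forever
content = requests.get('https://book.douban.com/', headers=headers, timeout=10).text
# print(content)
print("-----------")
# Capture link, title, author, and publication date from each <li> entry
pattern = re.compile(r'<li.*?cover.*?href="(.*?)".*?title="(.*?)".*?more-meta.*?author">(.*?)</span>.*?year">(.*?)</span>.*?</li>', re.S)
results = pattern.findall(content)
print("-----------")
print(results)
for result in results:
    url, name, author, date = result
    # Remove all whitespace from the author and date fields
    author = re.sub(r'\s', '', author)
    date = re.sub(r'\s', '', date)
    print(url, name, author, date)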
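
One possible reason for the apparent hang is that a single pattern chaining many `.*?` groups under `re.S` can backtrack heavily whenever an `<li>` is missing one of the expected parts. The sketch below is not a confirmed fix. It only illustrates a two-step alternative: cut the page into `<li>...</li>` blocks first, then look for each field inside one block at a time. The sample HTML, the helper name `parse_books`, and the `class="author"` / `class="year"` attributes are assumptions inferred from the pattern above and have not been checked against the live Douban page.

```python
import re

# Hypothetical sample that mimics the structure the original pattern expects;
# the real Douban markup may differ.
SAMPLE_HTML = '''
<ul>
  <li class="">
    <div class="cover">
      <a href="https://book.douban.com/subject/1/" title="Book One"><img src="a.jpg"></a>
    </div>
    <div class="more-meta">
      <span class="author">
        Author A
      </span>
      <span class="year">
        2019-1
      </span>
    </div>
  </li>
  <li class="">
    <div class="cover">
      <a href="https://book.douban.com/subject/2/" title="Book Two"><img src="b.jpg"></a>
    </div>
  </li>
</ul>
'''

# Step 1: isolate each <li>...</li> block.
ITEM_RE = re.compile(r'<li\b.*?</li>', re.S)

# Step 2: one small pattern per field, searched only inside a single block.
FIELD_RES = {
    'url': re.compile(r'cover.*?href="([^"]*)"', re.S),
    'name': re.compile(r'title="([^"]*)"'),
    'author': re.compile(r'class="author">(.*?)</span>', re.S),
    'date': re.compile(r'class="year">(.*?)</span>', re.S),
}


def parse_books(html):
    """Return (url, name, author, date) tuples for blocks that have all four fields."""
    books = []
    for item in ITEM_RE.findall(html):
        fields = {}
        for key, regex in FIELD_RES.items():
            match = regex.search(item)
            if match is None:
                break  # incomplete block: skip it
            # strip() trims surrounding whitespace but keeps spaces inside a value
            fields[key] = match.group(1).strip()
        else:
            # Runs only when no field was missing.
            books.append((fields['url'], fields['name'],
                          fields['author'], fields['date']))
    return books


for book in parse_books(SAMPLE_HTML):
    print(*book)
```

To try it on the real page, the `content` string fetched by the script above could be passed in as `parse_books(content)`. Printing `len(content)` right after the request also helps tell whether the stall happens while downloading or while matching.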

1 answer