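I commented out the code step by step and used print to find where it goes wrong, but I still don't understand why it fails. The script does run, but it only writes about 50 chapters to the txt file and then raises an error. Working backwards, I traced the problem to this line: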
chapter_content = re.findall(r'style5\(\);</script>(.*?);<script type="text/', chapter_html)
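Written like this, it extracts the text of every chapter, more than a thousand in total. However, re.findall returns the result as a list ([]), and the next step needs replace() to strip out unwanted characters. replace() cannot be called on a list, so the content has to be taken out of the list first. The tutorials online simply append [0] to the end of the line above. Without [0] I can get all of the content; with [0] I can call replace(), but it only gets through 50 chapters before it fails.

Error message: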
```
Traceback (most recent call last):
  File "D:/Python/untitled/爬虫3.py", line 31, in <module>
    chapter_content = re.findall(r'style5\(\);</script>(.*?);<script type="text/', chapter_html)[0]
IndexError: list index out of range
```
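For reference, here is a minimal sketch of how this IndexError can arise. `re.findall` returns an empty list when the pattern matches nothing, and indexing an empty list with `[0]` raises exactly this error. The three page fragments below are made up for illustration and are not taken from the real site. Whether the failing chapter page lacks the marker entirely or simply contains a line break inside the text is an assumption that would need to be checked against the actual HTML.

```python
import re

# Same pattern as in the failing line.
PATTERN = r'style5\(\);</script>(.*?);<script type="text/'

# Hypothetical page fragments, invented for this illustration only.
matching_page = 'style5();</script>chapter text;<script type="text/javascript">'
multiline_page = 'style5();</script>line one\nline two;<script type="text/javascript">'
other_page = '<html><body>a page without the marker</body></html>'

# A page that matches gives a one-element list, so [0] works.
print(re.findall(PATTERN, matching_page))

# A page without the marker gives an empty list.
print(re.findall(PATTERN, other_page))

# Without re.S, "." does not match a newline, so text that spans
# several lines also gives an empty list.
print(re.findall(PATTERN, multiline_page))
print(re.findall(PATTERN, multiline_page, re.S))

# Indexing an empty list is what raises the error in the traceback.
try:
    re.findall(PATTERN, other_page)[0]
except IndexError as exc:
    print('IndexError:', exc)
```

Printing `chapter_url` and `len(chapter_content)` for each chapter would show which page first produces an empty list.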
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re

import requests

url = 'http://www.quanshuwang.com/book/106/106281'
# Send the HTTP request
response = requests.get(url)
response.encoding = 'gbk'
html = response.text
# print(html)
title = re.findall(r'</span><strong>(.*?)</strong>', html)[0]
# print(title)

dl = re.findall(r'<DIV class="clearfix dirconone">(.*?)</DIV> ', html, re.S)[0]
chapter_info_list = re.findall(r'<li><a href="(.*?)" title=".*?">(.*?)</a></li>', dl)

fb = open('%s.txt' % title, 'w', encoding='utf-8')  # utf-8 for the Chinese text

for chapter_info in chapter_info_list:
    chapter_title = chapter_info[1]
    chapter_url = chapter_info[0]
    # print(chapter_title, chapter_url)

    chapter_response = requests.get(chapter_url)
    chapter_response.encoding = 'gbk'
    chapter_html = chapter_response.text
    # print(chapter_html)
    # Extract the data
    chapter_content = re.findall(r'style5\(\);</script>(.*?);<script type="text/', chapter_html)
    print(chapter_content)

    # chapter_content = chapter_content.replace(' ', '')
    # chapter_content = chapter_content.replace('<br />', '')

    # print(chapter_content)

    # Step 7: write to the file
    # with open('%s.txt' % title, "w", encoding='utf-8') as fb:
    #     fb.write(chapter_title)
    #     fb.write(chapter_content)
    #     fb.write('\n')

    # Method 2
    # fb.write(chapter_title)
    # fb.write(chapter_content)
    # fb.write('\n')
    print('Downloading %s' % chapter_title)

fb.close()
```
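One possible way to make the extraction step tolerate a page that does not match is sketched below. It is not a confirmed fix for this site. The helper name `extract_chapter_text`, the `JUNK` tuple and the sample strings are all hypothetical, and the `re.S` flag is added on the assumption that chapter text may contain line breaks.

```python
import re

# Same pattern as in the script, compiled with re.S so that "." also
# matches newlines (an assumption about the real pages).
CONTENT_RE = re.compile(r'style5\(\);</script>(.*?);<script type="text/', re.S)

# Strings to strip from the chapter text. This tuple is an assumption;
# adjust it to whatever the real pages contain.
JUNK = ('&nbsp;', '<br />')


def extract_chapter_text(chapter_html):
    """Return the cleaned chapter text, or None if the pattern does not match."""
    match = CONTENT_RE.search(chapter_html)
    if match is None:
        # No match: let the caller decide what to do instead of raising
        # IndexError as [0] on an empty list would.
        return None
    text = match.group(1)  # a str, so replace() can be used directly
    for junk in JUNK:
        text = text.replace(junk, '')
    return text.strip()


# Made-up sample input, for illustration only.
sample = ('style5();</script>&nbsp;&nbsp;First line<br />\n'
          'Second line;<script type="text/javascript">')
print(extract_chapter_text(sample))
print(extract_chapter_text('<html>error page</html>'))
```

In the download loop, a `None` result could be handled by printing `chapter_url` and `chapter_response.status_code` and then using `continue`. That would show whether the failing chapter is a different page layout or an error response from the server.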