Curry_warrior
Acceptance rate: 55.6%
2019-11-18 17:47

Downloading images to disk with a multi-process Python web crawler

50 (bounty)
Accepted

I've been working on a web crawler recently (this is already my third post about it, haha). This time the problem is: when downloading images to disk I want to use multiple workers to speed things up, but every time my pool code runs, the program jumps back to the very beginning, i.e. the prompt asking for the crawl URL and depth. Can anyone explain this?

import time
import re
import os
import requests
from multiprocessing.pool import Pool
from multiprocessing import cpu_count
start_time=time.time()
url_website=input('Please type the URL:')
deep_number=input("Please specify the deep you want to reach: ")
html_name='http://'
link_list=[url_website]
list=[]

def split_website(url_website):
    re_website=re.findall('//.*',url_website)
    string_website="".join(re_website)
    path_website=re.sub('//','',string_website)
    return path_website

host_name=split_website(url_website)
host_name_list=host_name.split('/')
host_name=host_name_list[0]
deep=int(deep_number)

def save_image(iter,list_split):
    iter = "http://" + list_split[0] + iter
    im_string = ''.join(iter)
    im_list = im_string.split('/')
    im_name = im_list[-1]
    print(im_name)
    exc = False
    try:
        imgs = requests.get(iter)
    except:
        exc = True
        pass
    if not exc:
        print('write')
        image_file = open(im_name, 'wb')
        image_file.write(imgs.content)
        image_file.close()

while deep>=0:
    print(deep)
    print(link_list,'before foor loop')
    for element in link_list:
        print(element)
        res=requests.get(element)
        html_process=open('html_test.html','wb')
        html_process.write(res.content)
        html_process.close()
        html_read=open('html_test.html','r',encoding='UTF-8')
        read_content=html_read.read()
        urls=re.findall("<a.*?href=.*?<\/a>",read_content)
        print(urls)
        image = re.findall('img.*?src="(.+?)"',read_content)
        print(image)
        path_website = split_website(element)
        split_list = path_website.split('/')
        os.chdir(os.path.split(os.path.realpath(__file__))[0])
        print(link_list,'before 2 foor loop')
        for i in range(len(split_list)):
            dir_name = split_list[i]
            folder_name = dir_name
            if not os.path.exists(folder_name):
                os.mkdir(folder_name)
            os.chdir(folder_name)
            if i == (len(split_list) - 1):
                # the multiprocessing attempt in question
                for im_iter in image:
                    pool=Pool(5)
                    pool.map(save_image,[im_iter,split_list])
                    pool.close()

        print(link_list,'before 3 for loop')
        for url in urls:
            url_string="".join(url)
            url_href_list=url_string.split("\"")
            url_href_list[1]=html_name+host_name+url_href_list[1]
            nick_name = re.findall('>.*?<', url)
            if (''.join(nick_name))!='>Back<':
                list.append(url_href_list[1])
                print(list,'this is back up list')
        print(link_list,'Before removing')
        print(link_list,'After removing')
        print(list)

    link_list=list
    list=[]
    print(deep)
    deep=deep-1
end_time=time.time()
print('time used: ',end_time-start_time)

The `Pool`/`pool.map` section is my multiprocessing code, but strangely the program keeps jumping back to the URL prompt at the very beginning, and the prompt appears 5 times. How can I avoid this and have the workers do only the image downloading?
[screenshot: at the start of execution]
[screenshot: after running for a while]


2 answers

  • bobhuang · 2 years ago

    For how to use pool.map, see this post: https://blog.csdn.net/weixin_36637463/article/details/86496763

    Applied to your program: if save_image is the first argument to pool.map, then the second argument should be a list of URLs. Correspondingly, save_image must take exactly one parameter: a single URL.

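    A minimal sketch of this suggestion, with hypothetical names: save_image keeps a single URL parameter, and the extra argument (the output directory, standing in for split_list) is bound with functools.partial so pool.map can iterate over a plain list of image URLs:

    ```python
    import os
    from functools import partial
    from multiprocessing import Pool

    import requests


    def save_image(img_url, out_dir="."):
        """Download one image URL; the file name is the last path segment."""
        name = img_url.split("/")[-1]
        try:
            resp = requests.get(img_url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            return None  # skip images that fail to download
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            f.write(resp.content)
        return path


    if __name__ == "__main__":
        # hypothetical list; in the crawler this would be the `image` matches
        # joined with "http://" + host as in the question
        image_urls = []  # e.g. ["http://host/pics/a.png", ...]
        with Pool(5) as pool:
            # map() calls save_image once per URL; out_dir is pre-bound
            saved = pool.map(partial(save_image, out_dir="."), image_urls)
        print(saved)  # list of written paths, or None for failures
    ```

    Note that `pool.map(save_image, [im_iter, split_list])` from the question treats `im_iter` and `split_list` as two separate work items, which is why it misbehaves; the second argument must be the whole iterable of URLs.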
  • caozhy · 2 years ago
    Put all your top-level code inside a function and call it from an `if __name__ == '__main__':` guard.
    
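    Concretely (a sketch of this suggestion): on Windows, multiprocessing starts each worker with the spawn method, which re-imports your script. Any top-level code, including the `input()` prompts, runs again in every worker; that is why the URL prompt reappears 5 times with `Pool(5)`. Guarding the entry point prevents it (`work` here is a hypothetical stand-in for save_image):

    ```python
    from multiprocessing import Pool


    def work(n):
        # stands in for save_image; runs inside a worker process
        return n * n


    def main():
        # everything that was top-level in the question script goes here:
        # the input() prompts, the crawl loop, and the pool creation
        with Pool(5) as pool:
            results = pool.map(work, [1, 2, 3, 4])
        print(results)  # [1, 4, 9, 16]


    if __name__ == "__main__":
        # Workers re-import this file on Windows; the guard keeps the
        # module body from running (and prompting) again in each worker.
        main()
    ```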