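Symptoms and background of the problem
The code below downloads stock data from the web. 天元浪子 pointed out that it runs as multiple processes. In this program, which is more efficient and appropriate: multithreading or multiprocessing? There are about five thousand stocks to fetch in total, so which approach saves the most time? I would also like to display progress (completed count / total count); how should I add that to the code? Many thanks.
Relevant code (please do not paste screenshots)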
import pandas as pd
import os
from multiprocessing.pool import Pool


def updateStock_pool_per(arg_list):
    # Unpack the arguments for one stock: [download_url, stock_path, stock_code]
    download_url, stock_path, stock_code = arg_list
    try:
        df = pd.read_csv(download_url, encoding='gbk')  # read the remote CSV data directly
        path = os.path.join(stock_path, stock_code + '.csv')
        df.to_csv(path, index=False, encoding='gbk')  # save to a local file
    except Exception:
        # Return None on failure so one bad download does not abort pool.map
        return None
    return stock_code


def updateStock(stock_path, stock_code_list):
    arg_list = []
    for stock_code in stock_code_list:
        download_url = 'http://quotes.money.163.com/service/chddata.html?code=' + stock_code + '&start=20220719&end=20220819&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;VOTURNOVER;VATURNOVER;TCAP;MCAP'  # build the URL
        list_ = [download_url, stock_path, stock_code]
        arg_list.append(list_)
    # pool.map passes a single argument per task, so multiple parameters are packed into a list of lists
    with Pool(processes=4) as pool:
        result_list = pool.map(updateStock_pool_per, arg_list)
    # Keep only the stock codes whose download succeeded
    update_success_list = [code for code in result_list if code is not None]
    print('*' * 50)
    update_all_count = len(stock_code_list)  # total number of stocks to update
    update_success_count = len(update_success_list)
    update_fail_list = []
    for m in stock_code_list:
        if m not in update_success_list:
            update_fail_list.append(m)
    update_fail_count = len(update_fail_list)
    print('Prepared to update {} stocks: {} updated, {} failed to download'.format(update_all_count, update_success_count, update_fail_count))
    if update_fail_list:
        print(f'Errors occurred while downloading these stocks:\n{update_fail_list}')


if __name__ == '__main__':
    stock_path = r'C:\Users\Administrator\Desktop\stock'
    stock_code_list = ['1002296', '1002364', '1000826', '0600981', '1000997', '1000901', '1300291', '1300002', '1002051', '1300252']
    updateStock(stock_path, stock_code_list)
Desired result
1. Which is faster, multithreading or multiprocessing? Please provide code. 2. Show the progress.
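
For reference, here is a minimal sketch of one way to get both results, not a definitive answer. Downloading files is mostly waiting on the network, so a thread pool is usually enough for this kind of job and avoids the cost of starting processes, though timing both on your own machine is the only way to be sure. The sketch uses the standard library's `concurrent.futures.ThreadPoolExecutor` and `as_completed` to print the completed count against the total as each task finishes. The names `run_with_progress` and `fake_download` and the default `max_workers=16` are assumptions made up for illustration. `fake_download` only sleeps, so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_with_progress(worker, arg_list, max_workers=16):
    """Call worker(args) for every item of arg_list on a thread pool and
    print 'completed/total' as each task finishes.

    Returns (results, failed): the return values of the calls that
    succeeded, and the argument items whose call raised an exception.
    """
    total = len(arg_list)
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its arguments so failures can be reported.
        future_to_args = {executor.submit(worker, args): args for args in arg_list}
        # as_completed yields futures in the order they finish, not submit order.
        for done, future in enumerate(as_completed(future_to_args), start=1):
            try:
                results.append(future.result())
            except Exception:
                failed.append(future_to_args[future])
            # '\r' returns to the start of the line so the counter updates in place.
            print(f'\rcompleted {done}/{total}', end='', flush=True)
    print()
    return results, failed


def fake_download(args):
    """Hypothetical stand-in for the real download function (no network)."""
    download_url, stock_path, stock_code = args
    time.sleep(0.01)  # simulate waiting on the network
    if stock_code == 'bad':
        raise ValueError(stock_code)  # simulate a failed download
    return stock_code


if __name__ == '__main__':
    demo_args = [['url', 'path', str(i)] for i in range(20)] + [['url', 'path', 'bad']]
    ok, failed = run_with_progress(fake_download, demo_args, max_workers=8)
    print(len(ok), 'succeeded;', len(failed), 'failed')
```

In the code from the question, the `with Pool(...)` block could be replaced by a call such as `run_with_progress(updateStock_pool_per, arg_list)`. The worker count is a guess. A larger value may shorten the run for five thousand stocks, but the server may throttle or reject too many simultaneous requests.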