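After downloading a web page's HAR file, I want to parse it.
The goal is to get every URL that starts with 'https://m.douyin.com/web/api/v2/aweme/post/?reflow_source=reflow_page&sec_uid='.
I'm using re.findall, but why does it return an empty list? What is wrong in my code, and what should I change it to?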
import json
import re

path = 'C:/Users/cuiha/Desktop/小傻子/中国器官移植发展基金会.har'
with open(path, 'r', encoding='UTF-8') as readObj:
    harDirct = json.loads(readObj.read())
# print(harDirct)
str_harDirct = str(harDirct)
pattern = 'https://m.douyin.com/web/api/v2/aweme/post/?reflow_source=reflow_page&sec_uid='
urls = re.findall(pattern, str_harDirct)
print(urls)
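Why the list is empty: re.findall treats its first argument as a regular expression, and the prefix contains regex metacharacters. The `?` makes the preceding `/` optional, so the pattern looks for `post` followed by an optional slash and then `reflow_source`. It therefore never matches the literal text `post/?reflow_source`, because the `?` in the text is left unmatched. The `.` characters match any character, which doesn't break the match but makes it imprecise. Also, the pattern ends at `sec_uid=`, so even a working match would return only the prefix and not the full URL. The sketch below shows two possible fixes: escape the prefix with re.escape and append a character class that consumes the rest of the URL, or skip the str() conversion and read the URLs straight from the HAR JSON structure (log -> entries -> request -> url). The helper names and the sample HAR dict are hypothetical, and the rule that a URL ends at a quote or whitespace is an assumption that fits the str() output of a dict.

```python
import json
import re

# Prefix from the question; it contains the regex metacharacters '?' and '.'.
PREFIX = 'https://m.douyin.com/web/api/v2/aweme/post/?reflow_source=reflow_page&sec_uid='


def find_urls_regex(text, prefix=PREFIX):
    """Hypothetical helper: escape the prefix so '?' and '.' match literally,
    then consume the rest of the URL up to a quote or whitespace (assumed URL end)."""
    pattern = re.escape(prefix) + r'[^\'"\s]*'
    return re.findall(pattern, text)


def find_urls_from_har(har, prefix=PREFIX):
    """Hypothetical helper: walk log -> entries -> request -> url in the parsed
    HAR dict and keep the URLs that start with the prefix (no regex needed)."""
    return [
        entry['request']['url']
        for entry in har.get('log', {}).get('entries', [])
        if entry.get('request', {}).get('url', '').startswith(prefix)
    ]


if __name__ == '__main__':
    # Made-up miniature HAR for illustration; the sec_uid values are fake.
    har = {
        'log': {
            'entries': [
                {'request': {'url': PREFIX + 'ABC123&count=21'}},
                {'request': {'url': 'https://m.douyin.com/other'}},
                {'request': {'url': PREFIX + 'XYZ789'}},
            ]
        }
    }
    # Unescaped prefix used as a regex: '/?' means "optional '/'", so nothing matches.
    print(re.findall(PREFIX, str(har)))  # []
    print(find_urls_regex(str(har)))
    print(find_urls_from_har(har))
```

To apply this to the file in the question, load it with `json.load(readObj)` and pass the resulting dict to `find_urls_from_har`. The structural approach doesn't depend on how str() quotes the values, so it is likely the more robust of the two.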