今天学习写爬虫,利用正则表达式爬取的一段网页源代码,代码前面有json.parse
利用正则表达式把gallery一行爬出来是下面的代码:
{\"count\":8,\"sub_images\":[{\"url\":\"http:\/\/p99.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\",\"width\":1080,\"url_list\":[{\"url\":\"http:\/\/p99.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"},{\"url\":\"http:\/\/pb3.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"},{\"url\":\"http:\/\/pb1.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"}],\"uri\":\"origin\/pgc-image\/154088560091068452d3c58\",\"height\":1918},{\"url\":\"http:\/\/p1.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\",\"width\":690,\"url_list\":[{\"url\":\"http:\/\/p1.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"},{\"url\":\"http:\/\/pb3.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"},{\"url\":\"http:\/\/pb9.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"}],\"uri\":\"origin\/
...........
将其json.loads()之后报错如下:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
网上试了HTMLParse,结果后面循环报错,解决不了了。哈哈哈,
有没有大神知道这种情况,怎么处理么?