weixin_43246525
Anciewal
Acceptance rate: 100%
2018-10-30 10:06

Python 3 crawler: json.loads() fails on a scraped, irregular JSON string that contains escape characters

Accepted

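I'm learning to write a crawler today. I used a regular expression to pull a piece of the page source, and in that source the data is preceded by JSON.parse.
[Image]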

Extracting the gallery line with the regular expression gives the following:
{\"count\":8,\"sub_images\":[{\"url\":\"http:\/\/p99.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\",\"width\":1080,\"url_list\":[{\"url\":\"http:\/\/p99.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"},{\"url\":\"http:\/\/pb3.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"},{\"url\":\"http:\/\/pb1.pstatp.com\/origin\/pgc-image\/154088560091068452d3c58\"}],\"uri\":\"origin\/pgc-image\/154088560091068452d3c58\",\"height\":1918},{\"url\":\"http:\/\/p1.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\",\"width\":690,\"url_list\":[{\"url\":\"http:\/\/p1.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"},{\"url\":\"http:\/\/pb3.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"},{\"url\":\"http:\/\/pb9.pstatp.com\/origin\/pgc-image\/1540885587029ea96e1c851\"}],\"uri\":\"origin\/
...........

Passing it to json.loads() raises the following error:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
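
The error position (char 1) points at the backslash right after the opening brace: the decoder expects a double quote there and finds `\` instead. A minimal sketch that reproduces this, assuming a shortened, hypothetical stand-in for the captured text (the variable `raw` and the helper `try_parse` are illustrative names, not from the original code):

```python
import json

# Hypothetical, shortened stand-in for the regex capture. The raw-string prefix
# keeps every backslash, so the text looks exactly like the scraped line.
raw = r'{\"count\":8,\"uri\":\"origin\/pgc-image\"}'

def try_parse(text):
    """Return the decoder's error message, or None if parsing succeeds."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None

# The character at index 1 is a backslash, not the '"' that must open a key.
error_message = try_parse(raw)
print(raw[1], "|", error_message)
```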

I tried HTMLParse, as suggested online, but then the loop later in my code raised errors and I couldn't get past it. Haha.

Does anyone know how to handle this kind of situation?
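
One possible approach, sketched under the assumption that the captured text is the body of a JavaScript string literal passed to JSON.parse("..."): since `\"` and `\/` are both valid JSON string escapes, the text can be decoded twice, first as a JSON string and then as a JSON object. The sample value of `raw` is a shortened, hypothetical stand-in, and `parse_escaped_json` is an illustrative helper name, not part of any library.

```python
import json

# Hypothetical, shortened stand-in for the regex capture (the real one is longer).
raw = (
    r'{\"count\":8,\"sub_images\":[{\"url\":'
    r'\"http:\/\/p99.pstatp.com\/origin\/pgc-image\/abc\",\"width\":1080}]}'
)

def parse_escaped_json(escaped):
    """Decode text that is still escaped as the inside of a string literal."""
    # Pass 1: wrap the text in quotes so it becomes a JSON string literal;
    # decoding it turns \" into " and \/ into /.
    unescaped = json.loads('"' + escaped + '"')
    # Pass 2: the result is now ordinary JSON and parses into a dict.
    return json.loads(unescaped)

gallery = parse_escaped_json(raw)
print(gallery["count"])
print(gallery["sub_images"][0]["url"])
```

This only works if the regular expression captures the complete string body; a capture that is cut off partway, or one that contains raw line breaks, would still fail in the first pass.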


1 answer