weixin_40148208
2017-09-12 12:26

Python crawler: urgently need expert help, many thanks

  • crawler
  • web crawler
  • python
  • xpath scraping

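I'm using a Python crawler to scrape a web page and extract specific data. Why do most of the fields come out as null? Also, every time I run it, a bit more data shows up. Is the problem in my code or in the page? Any pointers would be much appreciated. My code is below: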
def get_content(self, html):  # extract the data from one page
    div_list = html.xpath("//div[contains(@class,'listtyle')]")
    item_list = []
    for div in div_list:
        for b in range(1, 19):
            food_img = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/img[@class='img']/@src")
            food_img = food_img[0] if len(food_img) > 0 else None
            food_name = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/div[@class='i_w']/div[@class='i']/div[@class='c1']/strong/text()")
            food_name = food_name[0] if len(food_name) > 0 else None
            food_effect = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/strong[@class='gx']/span/text()")
            food_effect = food_effect[0] if len(food_effect) > 0 else None
            food_time = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/div[@class='i_w']/div[@class='i']/div[@class='c2']/ul/li[@class='li1']/text()")
            food_time = food_time[0] if len(food_time) > 0 else None
            food_taste = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/div[@class='i_w']/div[@class='i']/div[@class='c2']/ul/li[@class='li2']/text()")
            food_taste = food_taste[0] if len(food_taste) > 0 else None
            food_commentnum_likenum = div.xpath("./div[@class='listtyle1'][b]/a[@class='big']/div[@class='i_w']/div[@class='i']/div[@class='c1']/span/text()")
            food_commentnum_likenum = food_commentnum_likenum[0] if len(food_commentnum_likenum) > 0 else None

            item = dict(
                food_img1=food_img,
                food_name1=food_name,
                food_effect1=food_effect,
                food_time1=food_time,
                food_taste1=food_taste,
                food_commentnum_likenum1=food_commentnum_likenum,
            )
            item_list.append(item)
    return item_list
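A likely cause of the null values, offered as a hedged reading rather than a confirmed diagnosis: inside the XPath string, `[b]` is not the Python loop variable. XPath reads it as a predicate meaning "has a child element named `<b>`". Unless a card happens to contain a `<b>` tag, every query returns an empty list and the code falls back to `None`. The sketch below demonstrates this on a small sample that reuses the question's class names but whose markup is invented. It uses the standard library's `xml.etree.ElementTree` only so that it runs without lxml (its limited XPath treats `[b]` the same way). On the real page the question's lxml `html.xpath` calls still apply. This does not by itself explain why the results change from run to run.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the recipe list: two cards inside a "listtyle"
# container. Class names come from the question; the markup is invented.
SAMPLE = """
<div class="listtyle">
  <div class="listtyle1"><a class="big"><strong class="gx"><span>Tonic</span></strong></a></div>
  <div class="listtyle1"><a class="big"><strong class="gx"><span>Cooling</span></strong></a></div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Bug: inside the string, [b] is a predicate meaning "has a child element
# named <b>", not the Python loop variable, so nothing matches.
literal = root.findall("./div[@class='listtyle1'][b]")

# Interpolating the loop variable produces real position predicates [1], [2].
by_index = [root.findall(f"./div[@class='listtyle1'][{b}]") for b in range(1, 3)]

# Simpler: iterate over the cards and query each one relative to itself,
# so no hard-coded card count is needed.
effects = [
    card.findtext("./a[@class='big']/strong[@class='gx']/span")
    for card in root.findall("./div[@class='listtyle1']")
]

print(len(literal))                # 0
print([len(m) for m in by_index])  # [1, 1]
print(effects)                     # ['Tonic', 'Cooling']
```

On Python versions before 3.6, `"...[{}]".format(b)` can replace the f-string. With lxml, the card-by-card pattern would be `for card in div.xpath("./div[@class='listtyle1']"):` followed by relative queries such as `card.xpath("./a[@class='big']/img[@class='img']/@src")`. This also removes the fixed `range(1, 19)`, so a page with more or fewer than 18 cards is handled.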
