du8794 2019-05-24 18:24
浏览 66

无法使用goquery收集Google搜索结果的所有节点:缺少某些节点

I am trying to collect results of a google search page in GoLang using the goquery library. In order to achieve this, I am collecting all nodes of a goquery selection with goquery. The problem is that the selection returned by Find("*") does not seem to contain all the nodes of the HTML document. Question: does the method collect ALL nodes with the whole tree structure or not ? If not, is there a method to collect them all ?

I tried using the goquery Find("*") method applied to the whole document selection. So nodes with certain attributes are not returned, although they are in the HTML document. For instance, nodes with are not recognized

alltags := doc.Find("*") //doc is the HTML doc with the Google search

The selection does not contain the div tags with class="srg". The same applies to other class values such as "bkWMgd", "rc" for example.

  • 写回答

1条回答 默认 最新

  • dsfdfdfd6576578 2019-05-24 18:46
    关注

    This has happened to me before. I was trying to web scrape with python beautiful soup package and the same thing was happening.

    Later it turned out that the html markup returned when trying to fetch it was actually the markup the server returned after finding a bot. I solved this by setting the User-Agent to Mozilla/5.0.

    Hope this helps in your quest to solve this. You can start by updating the code for the fetch request you have performed.

    评论

报告相同问题?

悬赏问题

  • ¥15 Oracle中如何从clob类型截取特定字符串后面的字符
  • ¥15 想通过pywinauto自动电机应用程序按钮,但是找不到应用程序按钮信息
  • ¥15 如何在炒股软件中,爬到我想看的日k线
  • ¥15 seatunnel 怎么配置Elasticsearch
  • ¥15 PSCAD安装问题 ERROR: Visual Studio 2013, 2015, 2017 or 2019 is not found in the system.
  • ¥15 (标签-MATLAB|关键词-多址)
  • ¥15 关于#MATLAB#的问题,如何解决?(相关搜索:信噪比,系统容量)
  • ¥500 52810做蓝牙接受端
  • ¥15 基于PLC的三轴机械手程序
  • ¥15 多址通信方式的抗噪声性能和系统容量对比