doulu5109 2011-04-12 10:46

preg_match(_all) fails to collect some data from Google

I'm building a tool that checks where my websites rank in Google for different keywords.

I want to extract this part of the results page's source code:

<a href="http://www.test.com/" class=l onmousedown="return clk(this.href,'','','','1','','0CBoQFjAA')">Linktitle in Google!</a>

The problem is that neither preg_match nor preg_match_all matches "onmousedown", "this.href", or the ,'1' part of the link, and that is exactly the part I need.

Does anyone have an idea why this happens and, more importantly, how to solve it?

The code I use is straightforward. I even tried the patterns "/onmousedown/" and "/\'1\'/", but that didn't help.

Thank you very much!


4 answers

  • douchen4547 2011-04-12 10:49

    Apart from the ethical and possibly legal implications of scraping Google, you should not use regular expressions to extract portions of HTML. Regular expressions were not designed to parse HTML and cannot handle its nested grammar.

    Try using an HTML parser, such as DOMDocument. It was designed to parse HTML/XML.
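
    As a rough sketch of the DOMDocument approach, the snippet below parses the anchor from the question and reads its attributes. The helper name extractResultLinks, the XPath query on class "l", and the guess that the fifth argument of clk(...) is the result position are assumptions made for illustration; none of them comes from Google's documentation. The sample string stands in for the page you actually fetch.

```php
<?php
// Sample markup copied from the question; in practice $html would be the fetched page.
$html = '<html><body>'
      . '<a href="http://www.test.com/" class=l '
      . 'onmousedown="return clk(this.href,\'\',\'\',\'\',\'1\',\'\',\'0CBoQFjAA\')">'
      . 'Linktitle in Google!</a>'
      . '</body></html>';

// Hypothetical helper: returns href, link text, and onmousedown for each <a class="l">.
function extractResultLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@class="l"]') as $a) {
        $onmousedown = $a->getAttribute('onmousedown');

        // Assumption: the fifth argument of clk(...) is the result position.
        // A regex is acceptable here because it runs on one short attribute
        // value, not on the HTML document itself.
        $position = null;
        if (preg_match("/clk\\(this\\.href,'[^']*','[^']*','[^']*','(\\d+)'/", $onmousedown, $m)) {
            $position = (int) $m[1];
        }

        $links[] = [
            'href'        => $a->getAttribute('href'),
            'title'       => $a->textContent,
            'onmousedown' => $onmousedown,
            'position'    => $position,
        ];
    }
    return $links;
}

$links = extractResultLinks($html);
foreach ($links as $link) {
    echo $link['position'], ' ', $link['href'], ' ', $link['title'], PHP_EOL;
}
```

    If the onmousedown value comes back empty for the real page, the markup you fetched probably lacks that attribute, so it is worth dumping the raw response and checking before adjusting the query.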
