ds34222 2018-05-21 19:54

Regular expression to select a specific HTML element [Curl / PHP]

I am trying to scrape some specific data and output it on my site.

What I want to extract:

I'm using cURL in PHP, and this is the regular expression I'm trying to use, but it gives me the error "Fatal error: Allowed memory size of ram bytes exhausted", which I assume means it is processing too much data.

Code:

preg_match_all('!<th scope="(\b[a-zA-Z]+\b)">(\b[a-zA-Z]+\b)<\/th><td><a href="\/wiki\/(\b[a-zA-Z]+\b)" title="(\b[a-zA-Z]+\b)">(\b[a-zA-Z]+\b)<\/a>!',$result,$cap_matches);
$cap_name = array_values(array_unique($cap_matches[0]));
echo $cap_name[0];

I've tried reducing the regular expression to match only the "a ..." tag, but then I get a lot of results back. I just want to grab the capital.

1 answer

  • 红酒泡绿茶 2018-05-21 22:16

    Do not parse HTML with regex. Use a proper HTML parser instead, such as DOMDocument.

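    // loadHTML() is an instance method; calling it statically fails on PHP 8.
    $domd = new DOMDocument();
    @$domd->loadHTML($result); // @ hides warnings about malformed real-world HTML
    unset($result);
    $xp = new DOMXPath($domd);
    // Select the first link in the cell that follows the "Capital" header cell.
    $capital = $xp->query('//th[text()="Capital"]/following-sibling::td/a')->item(0)->getAttribute("title");
    unset($domd, $xp);
    var_dump($capital);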
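To make the XPath approach above easier to try in isolation, here is a minimal sketch that runs against a small inline table instead of the fetched page. The sample markup, the `$html` variable, and the null check are assumptions added for illustration; the real page may be structured differently, in which case the XPath expression would need adjusting.

```php
<?php
// Hypothetical sample markup standing in for the fetched page ($result in the question).
$html = '<table class="infobox">'
    . '<tr><th scope="row">Capital</th><td><a href="/wiki/Paris" title="Paris">Paris</a></td></tr>'
    . '<tr><th scope="row">Currency</th><td><a href="/wiki/Euro" title="Euro">Euro</a></td></tr>'
    . '</table>';

// Collect libxml warnings about malformed HTML instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//th[text()="Capital"]/following-sibling::td/a');

// item(0) returns null when nothing matched, so check before reading the attribute.
$link = $nodes->item(0);
$capital = ($link instanceof DOMElement) ? $link->getAttribute('title') : null;

var_dump($capital);
```

With this sample input the script should print `string(5) "Paris"`. The explicit check on `$link` avoids a fatal error when the page has no "Capital" row.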
    

    As for avoiding OOM errors, try wrapping your most memory-hungry operations in smaller functions, so that their local variables can be freed when the function returns, or unset() your big variables as soon as they are no longer needed. (I wouldn't normally use unset() in the code above, but since you were specifically complaining about OOM errors, I did.) Another obvious solution is to increase the memory limit, e.g.:

    if (false === ini_set("memory_limit", "1G")) {
        throw new \RuntimeException('error, unable to change memory limit!');
    }
    

    This should set the memory limit to 1 gigabyte, up from the default of 128 megabytes.
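The advice about smaller functions can be sketched roughly as follows. The helper name `extractCapital`, the padded sample input, and the memory printout are all assumptions made for illustration; the reported numbers will vary by PHP version and platform.

```php
<?php
// Hypothetical helper: parses the page and returns only the small string we need.
function extractCapital(string $html): ?string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//th[text()="Capital"]/following-sibling::td/a');
    $link = $nodes->item(0);

    // $dom, $xpath and $nodes go out of scope on return, so PHP can release them.
    return ($link instanceof DOMElement) ? $link->getAttribute('title') : null;
}

// Stand-in for the cURL response, padded with filler rows to make it large.
$result = '<table><tr><th>Capital</th><td><a href="/wiki/Paris" title="Paris">Paris</a></td></tr>'
    . str_repeat('<tr><th>Filler</th><td>x</td></tr>', 20000)
    . '</table>';

$capital = extractCapital($result);
unset($result); // the raw HTML is no longer needed

printf(
    "capital=%s, memory in use: %d bytes, peak: %d bytes\n",
    $capital,
    memory_get_usage(),
    memory_get_peak_usage()
);
```

Only the short `$capital` string survives the call; the parsed document lives and dies inside the function, which keeps large structures from lingering for the rest of the script.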

    This answer was accepted by the asker as the best answer.