dsz90288 2016-04-24 13:11

How to extract an HTML element from a source file

I need to use PHP to replace an HTML section, identified by its tag id, in source code that mixes HTML and PHP. If the source were pure HTML, a DOM parser could be used; if there were no DIV nested inside a DIV, I can imagine how to use preg_match. This is what I am trying to do. I have code (loaded into a string) like:

<div>
  <img >
</div>

<? include(); ?>

<div id="mydiv">
   <div>
      <div>
        <img >
      </div>
   </div>
</div>

and my task is to replace the content of the "mydiv" DIV with new content, e.g.

<div id="newdiv">
  some text
</div>

so the string will look like this after the change:

<div>
  <img >
</div>

<? include(); ?>

<div id="mydiv">
  <div id="newdiv">
    some text
  </div>
</div>

I have already tried:

1) parsing the code using DOMDocument's loadHTML, which produces a lot of errors when PHP code is included.

2) playing around with regexes like preg_match_all('/<div id="myid"([^<]*)<\/div>/', $src, $matches), which fails when child divs are included.

The best approach I have found so far is:

1) find the id="mydiv" string

2) search for '<' and '>' chars and count them, like '<' = +1 and '>' = -1 (not exactly, but it gives the idea)

3) once the sum == 0, I should be at the position of the closing tag, so I know which portion of the string to replace

This is quite a "heavy" solution, and it can stop working when the code differs (e.g. when on-page PHP code contains those chars as well, instead of just a simple "include"). So I am looking for a better solution.
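
The counting idea in steps 1) to 3) can be sketched in PHP by counting whole <div ...> and </div> tags rather than single characters, and by skipping <? ... ?> blocks so that characters inside PHP code are not counted. This is only a sketch under assumptions: the function name replaceDivContent is hypothetical, the id attribute is assumed to be double-quoted, the opening tag is assumed to contain no '>' inside attribute values, and every PHP block is assumed to be closed with ?>.

```php
<?php
/**
 * Replace the inner content of the <div> that has the given id.
 * Hypothetical helper: returns $src unchanged if the div or its
 * matching closing tag cannot be found.
 */
function replaceDivContent(string $src, string $id, string $newInner): string
{
    // Step 1: locate the opening tag, e.g. <div id="mydiv">.
    $openPattern = '/<div\b[^>]*\sid="' . preg_quote($id, '/') . '"[^>]*>/i';
    if (!preg_match($openPattern, $src, $m, PREG_OFFSET_CAPTURE)) {
        return $src;
    }
    $innerStart = $m[0][1] + strlen($m[0][0]);

    // Step 2: collect the tokens that follow the opening tag:
    // PHP blocks, opening div tags and closing div tags.
    $tokenPattern = '/<\?.*?\?>|<div\b[^>]*>|<\/div\s*>/is';
    preg_match_all($tokenPattern, $src, $tokens, PREG_OFFSET_CAPTURE, $innerStart);

    // Step 3: depth starts at 1 (we are inside the target div) and
    // reaches 0 at the closing tag that belongs to it.
    $depth = 1;
    foreach ($tokens[0] as [$text, $offset]) {
        if (strncmp($text, '<?', 2) === 0) {
            continue; // PHP block: its content is not counted
        }
        $depth += (stripos($text, '</div') === 0) ? -1 : 1;
        if ($depth === 0) {
            // Keep everything up to the opening tag and from the closing tag on.
            return substr($src, 0, $innerStart) . $newInner . substr($src, $offset);
        }
    }
    return $src; // unbalanced markup: leave the source untouched
}
```

Calling replaceDivContent($src, 'mydiv', $newHtml) on the string from the example would keep the first div and the include line as they are and swap only what sits between <div id="mydiv"> and its own closing tag.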

2 answers

  • doujiang1913 2016-04-24 13:21

    You could try something like this:

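    // Read the mixed HTML/PHP source as plain text.
    $file = 'filename.php';
    $content = file_get_contents($file);
    // Split at the opening tag; element 1 starts right after it.
    $array_one = explode( '<div id="mydiv">' , $content );
    // Keep everything up to the first closing </div>.
    $my_div_content = explode("</div>" , $array_one[1] )[0];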
    

    Or use preg_match like you said:

    preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches);
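
    With nested divs, the lazy (.*?) stops at the first </div> it meets. One possible variant, shown here only as a sketch, uses PCRE recursion so that nested <div>...</div> pairs are consumed as balanced units. The sample string and the variable names $lazy, $balanced and $result are made up for illustration; the pattern does not skip PHP blocks, and deep nesting or very large files may run into PCRE limits.

    ```php
    <?php
    // Hypothetical sample input with nested divs inside the target div.
    $content = '<div id="mydiv"><div><div><img ></div></div></div><p>after</p>';

    // Lazy pattern: the match ends at the first </div>.
    $lazy = [];
    preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $lazy);

    // Group 1 is the opening tag. Group 2 is the inner content: either a
    // character that does not start a <div> or </div> tag, or a complete
    // nested <div>...</div> whose content is matched by recursing into
    // group 2 itself via (?2).
    $pattern = '~(<div\s+id="mydiv"[^>]*>)'
             . '((?:(?!</?div\b).|<div\b[^>]*>(?2)</div>)*)'
             . '</div>~is';

    $balanced = [];
    preg_match($pattern, $content, $balanced);

    // Replace the inner content while keeping the original opening tag.
    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return $m[1] . '<div id="newdiv">some text</div>' . '</div>';
        },
        $content
    );
    ```

    Here $lazy[0] stops inside the nested structure, while $balanced[2] holds everything between <div id="mydiv"> and its own closing tag.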
    
