douge3113 2014-05-19 08:09

Regular expression to trim whitespace from a string contained in HTML tags

I have this HTML string (validated):

<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>

I need to extract only the title next to the <img> tag, trimming all spaces before and after it, and then wrap it in an <h1> tag. The expected result should be:

<div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>

I've written a regular expression that works, but it also includes the spaces in the final result:

/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+)\s*<br\s*\/>/

The image can be identified by the characteristic values of its alt, width and height attributes. Thanks.

3 answers

  • doz97171 2014-05-19 08:17

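    Making your match non-greedy should do the trick: <img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/> (notice the extra ? next to [^<]+). With the greedy [^<]+, the capture group swallows the trailing spaces itself, so the \s* that follows it matches nothing. The lazy [^<]+? captures as little as possible, which leaves the trailing whitespace to \s*. More information is available here.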
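
To see the effect of the lazy quantifier, here is a minimal sketch in PHP. It assumes the pattern is used with preg_replace and that the variable names ($html, $pattern, $result) are free to choose; neither is stated in the question.

```php
<?php
// Input string copied from the question.
$html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';

// Pattern from the answer: the capture group uses the lazy quantifier +?,
// so the trailing \s* is the part that consumes the spaces before <br />.
$pattern = '/<img\s*src="[^"]+"\s*alt="AAA"\s*width="24"\s*height="24"\s*\/>\s*([^<]+?)\s*<br\s*\/>/';

// Replace the whole "<img ... /> title <br />" run with the title in an <h1>.
// (Assumption: preg_replace is how the pattern is applied.)
$result = preg_replace($pattern, '<h1>$1</h1>', $html);

echo $result, PHP_EOL;
// prints: <div><h1>THE PRODUCTION OF: PLASTIC BOTTLES</h1></div>
```

The pattern still depends on the attributes appearing in exactly this order (src, alt, width, height), so it only suits markup you control.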

    That being said, you should really use something like the PHP DOM Parser to process HTML rather than a regular expression.
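
As a rough illustration of the parser-based route, the sketch below assumes that "PHP DOM Parser" refers to PHP's built-in DOM extension (DOMDocument and DOMXPath); the answer does not say which library it has in mind. It also assumes the fragment has a single root element, as in the question.

```php
<?php
// Input string copied from the question.
$html = '<div><img src="images/stories/2014/AAA.gif" alt="AAA" width="24" height="24" /> THE PRODUCTION OF: PLASTIC BOTTLES   <br /></div>';

$doc = new DOMDocument();
// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Select the image by the attribute values named in the question.
$xpath  = new DOMXPath($doc);
$images = $xpath->query('//img[@alt="AAA" and @width="24" and @height="24"]');

foreach (iterator_to_array($images) as $img) {
    $text = $img->nextSibling;
    if (!($text instanceof DOMText)) {
        continue; // no title text directly after the image
    }

    // Build <h1> with the trimmed title.
    $h1 = $doc->createElement('h1');
    $h1->appendChild($doc->createTextNode(trim($text->nodeValue)));

    $parent = $img->parentNode;
    $br     = $text->nextSibling;

    // Swap the image for the heading, then drop the old text and the <br />.
    $parent->replaceChild($h1, $img);
    $parent->removeChild($text);
    if ($br instanceof DOMElement && strtolower($br->nodeName) === 'br') {
        $parent->removeChild($br);
    }
}

$result = $doc->saveHTML($doc->documentElement);
echo $result, PHP_EOL;
```

Unlike the regular expression, this does not care about attribute order or about how much whitespace surrounds the title.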

    This answer was accepted by the asker as the best answer.