PHP regex to retrieve the text between HTML tags, but not the tags themselves

Similar questions have probably been asked many times, but mine is a bit more complex.
I know that when we want to extract only the text inside the <title> tag in this scenario,

<title>My work</title>
<p>This is my work.</p> <p>Learning regex.</p>

we can form a Regex like this:

>([^<]*)<

Source

But that works only because the <title> tag comes first. If the tag were the second one, it wouldn't work.
My scenario is this:

<td class="td1" headers="searchth1">JAVA1</td>
<td class="td2" headers="searchth2">JAVA2</td>
<td class="td3" headers="searchth3">JAVA3</td>

<td class="td1" headers="searchth1">PHP1</td>
<td class="td2" headers="searchth2">PHP2</td>
<td class="td3" headers="searchth3">PHP3</td>

There are many similar tags in the file, and I want to retrieve only the text between the <td class="td1" headers="searchth1"> and </td> tags.
I've used '#<td class="td1" headers="searchth1">(.*)</td>#', which works, but the output also includes all the other <td> tags, which I don't want.
I want only the texts JAVA1 and PHP1, and I guess that if I could retrieve the text between the tags while excluding the tags themselves, I might achieve that.
Am I right or wrong? Either way, how can I achieve what I want?
Thanks in advance!

douh9817 2014-12-09 23:07
I think your regex approach, while technically possible, is going to cause more trouble down the line. For example, if the source HTML changed so that the headers attribute appeared before the class attribute, the regex would fail. Also, your code will become unreadable very quickly if you're using regex to search through HTML source code.
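To make the "technically possible" part concrete, here is a minimal sketch of the regex route. It assumes the markup from the question sits on a single line and that the target cells contain plain text with no nested tags; the variable names $html and $matches are only illustrative. The greedy (.*) in the question runs on to the last </td> it can find, whereas [^<]* stops at the first "<".

```php
<?php
// Sample markup from the question, joined onto one line (an assumption),
// which is the situation where a greedy (.*) swallows the following cells.
$html = '<td class="td1" headers="searchth1">JAVA1</td> '
      . '<td class="td2" headers="searchth2">JAVA2</td> '
      . '<td class="td3" headers="searchth3">JAVA3</td> '
      . '<td class="td1" headers="searchth1">PHP1</td> '
      . '<td class="td2" headers="searchth2">PHP2</td> '
      . '<td class="td3" headers="searchth3">PHP3</td>';

// [^<]* cannot cross a "<", so every match ends at its own closing </td>.
preg_match_all(
    '#<td class="td1" headers="searchth1">([^<]*)</td>#',
    $html,
    $matches
);

// $matches[1] holds only the captured cell texts: JAVA1 and PHP1.
print_r($matches[1]);
```

This still depends on the exact attribute order, quoting, and spacing inside the opening tag, which is the fragility described above.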

To parse HTML you should use PHP's DOMDocument functions, which are more robust in the face of changing HTML code and are far more readable to whoever may be maintaining your code (including you). This method will also support looking at other element attributes more easily. The sample code below should work for your use case:

$doc = '<td class="td1" headers="searchth1">JAVA1</td>
<td class="td2" headers="searchth2">JAVA2</td>
<td class="td3" headers="searchth3">JAVA3</td>
<td class="td1" headers="searchth1">PHP1</td>
<td class="td2" headers="searchth2">PHP2</td>
<td class="td3" headers="searchth3">PHP3</td>';

$dom = new DOMDocument();
$dom->loadHTML($doc);

$xpath = new DOMXPath($dom);
$tds = $xpath->query("//td[@class='td1']");
// the query could also be "//td[@headers='searchth1']" or even
// "//td[@headers='searchth1'][@class='td1']" depending on what you want to target

foreach ($tds as $td) {
    var_dump($td->nodeValue);
}

If you want to learn more about building and using xpath queries, I suggest the article PHP DOM: Using XPath over at SitePoint.com.
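As a rough sketch of how the combined query mentioned in the code comment might be wrapped for reuse: the helper name cellTexts and its parameters are hypothetical, the sample cells are wrapped in a <table><tr> purely for illustration, and the attribute values are assumed to contain no single quotes, since they are interpolated directly into the XPath string.

```php
<?php
// Hypothetical helper: returns the text of every <td> whose class and
// headers attributes both match the given values.
function cellTexts(string $html, string $class, string $headers): array
{
    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $class and $headers contain no single quotes.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//td[@class='$class'][@headers='$headers']");

    $texts = [];
    foreach ($nodes as $td) {
        $texts[] = trim($td->textContent);
    }
    return $texts;
}

// Cells from the question, wrapped in a table row for this example.
$html = '<table><tr>'
      . '<td class="td1" headers="searchth1">JAVA1</td>'
      . '<td class="td2" headers="searchth2">JAVA2</td>'
      . '<td class="td3" headers="searchth3">JAVA3</td>'
      . '<td class="td1" headers="searchth1">PHP1</td>'
      . '<td class="td2" headers="searchth2">PHP2</td>'
      . '<td class="td3" headers="searchth3">PHP3</td>'
      . '</tr></table>';

print_r(cellTexts($html, 'td1', 'searchth1'));
```

Because the predicates test attribute values rather than the raw tag text, this keeps working if the class and headers attributes swap places in the source HTML.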
(This answer was selected as the best answer by the asker.)
