too long

I'm working on a way to parse an HTML document into a database. I'm not allowed to change any formatting in the HTML document. In the following example I need to find which tags have the class "Category", and then grab the data within that tag, in this example "Example Text".

How do I get the regex to match more than just the tags that are closed immediately after they open?

$tags = "<p class=Category style='margin-left:0in;text-indent:0in'><a name='_
Toc390163149'></a><a name='_Ref388370252'></a><a
name='_Toc122858606'><span lang=EN-GB>3.<span style='font:7.0pt 'Times New 
Roman''>&nbsp;</span></span><span lang=EN-GB>Example Text</span></a></p>";

preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $tags, $matches, PREG_SET_ORDER);
foreach ($matches as $val) {
    echo "matched: " . htmlspecialchars($val[0]) . "<br>";
    echo "part 1: " . htmlspecialchars($val[1]) . "<br>";
    echo "part 2: " . htmlspecialchars($val[2]) . "<br>";
    echo "part 3: " . htmlspecialchars($val[3]) . "<br>";
    echo "part 4: " . htmlspecialchars($val[4]) . "<br><br>";
}

Outputs:

matched: <a name="_Toc390163149"></a>
part 1: <a name="_Toc390163149">
part 2: a
part 3:
part 4: </a>

matched: <a name="_Ref388370252"></a>
part 1: <a name="_Ref388370252">
part 2: a
part 3:
part 4: </a>

matched: <span lang=EN-GB>When not to follow Rules</span>
part 1: <span lang=EN-GB>
part 2: span
part 3: When not to follow Rules
part 4: </span>

Any ideas?

dongwenhui8900 2014-07-22 09:34
Short answer: you can't reliably parse a complicated format such as HTML with regex, or at least you shouldn't.

Long answer: PHP provides a number of libraries for parsing HTML that are both far less effort and far less error-prone than a regex solution. The two of interest are SimpleXML (if you're parsing XHTML) and DOMDocument (if you're parsing markup that may or may not be well-formed XML). I'd be inclined to use the latter for HTML.

Once you've loaded the markup into a DOMDocument, you can use an XPath query to locate all the p elements with the class Category and iterate over them to get their child nodes and content.
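As a rough illustration of that approach, here is a minimal sketch. The sample markup is a simplified, hypothetical version of the snippet in the question (the nested quotes in the style attribute are replaced with double quotes), and the function name extractCategoryText is made up for this example. It assumes the class to match is exactly "Category" and that the text of the whole paragraph is what should be stored.

```php
<?php
// Hypothetical sample input, simplified from the markup in the question.
$html = <<<'HTML'
<p class=Category style='margin-left:0in;text-indent:0in'><a name='_Toc390163149'></a><a name='_Ref388370252'></a><a name='_Toc122858606'><span lang=EN-GB>3.<span style='font:7.0pt "Times New Roman"'>&nbsp;</span></span><span lang=EN-GB>Example Text</span></a></p>
<p class=Other>Not a category</p>
HTML;

// Returns the text content of every <p> whose class list contains "Category".
// The function name is an assumption made for this sketch.
function extractCategoryText(string $html): array
{
    $doc = new DOMDocument();

    // Exported HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the whole class token, so "Category" does not also match
    // something like "SubCategory". XPath comparisons are case-sensitive.
    $query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' Category ')]";

    $results = [];
    foreach ($xpath->query($query) as $p) {
        // textContent joins the text of all descendants, however deeply nested.
        $results[] = trim($p->textContent);
    }
    return $results;
}

foreach (extractCategoryText($html) as $text) {
    echo $text, PHP_EOL;
}
```

Because textContent includes every descendant, the result also carries the "3." numbering and the non-breaking space that precede "Example Text". If only the final span is wanted, a narrower XPath query relative to each paragraph would be one way to get it.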

This answer was selected by the asker as the best answer.
