Extracting a JavaScript object from HTML using regex and PHP

I am trying to extract a specific JavaScript object from a page containing the usual HTML markup.

I have tried to use regex, but I can't get it to match correctly when the HTML contains a line break.

An example can be seen here: https://regex101.com/r/b8zN8u/2

The HTML I am trying to extract from looks like this:

<script>
  DATA.tracking.user = { 
      age: "19", 
      name: "John doe" 
  }
</script>

Using the following regex: DATA.tracking.user=(.*?)}

<?php
$re = '/DATA.tracking.user = (.*?)\}/m';
$str = '<script>
           DATA.tracking.user = { age: "19", name: "John doe" }
        </script>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

If I parse DATA.tracking.user = { age: "19", name: "John doe" } without any line breaks, it works fine, but if I try to parse:

DATA.tracking.user = { 
      age: "19", 
      name: "John doe" 
  }

It does not like dealing with the line breaks.

Any help would be greatly appreciated.

Thanks.

douliang1369 2018-05-08 07:48
The simple solution to your problem is to use the s pattern modifier to command the . (any character) to also match newline characters -- which it does not by default.

And you should:

escape your literal dots.

write the \{ outside of your capture group.

omit the m pattern modifier because you aren't using anchors.
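
The points above could be applied as in the following minimal sketch. The variable names and the trim() call are illustrative assumptions and are not part of the original question.

```php
<?php
// Multi-line input, laid out like the failing case in the question.
$str = '<script>
  DATA.tracking.user = {
      age: "19",
      name: "John doe"
  }
</script>';

// s modifier: "." now also matches newline characters.
// The literal dots are escaped, "\{" sits outside the capture group,
// and the m modifier is dropped because no anchors are used.
$re = '/DATA\.tracking\.user = \{(.*?)\}/s';

$body = null;
if (preg_match($re, $str, $match)) {
    // $match[1] holds everything between the braces, newlines included.
    $body = trim($match[1]);
    echo $body, "\n";
}
```

Because the lazy quantifier stops at the first closing brace, this sketch only suits flat objects like the one in the question; a nested object would be cut short.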

...BUT...

If this was my task and I was going to be processing the data from the extracted string, I would probably start breaking up the components at extraction-time with the power of \G.

Code:

$htmls[] = <<<HTML
DATA.tracking.user = { age: "19", name: "John doe", int: 55 } // This works
HTML;

$htmls[] = <<<HTML
DATA.tracking.user = {
    age: "20",
    name: "Jane Doe",
    int: 49
} // This does not work
HTML;

foreach ($htmls as $html) {
    var_export(preg_match_all('~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"[^"]*")~', $html, $out, PREG_SET_ORDER) ? $out : []);
    echo "\n---\n";
}

Output:

array (
  0 =>
  array (
    0 => 'DATA.tracking.user = { age: "19"',
    1 => 'age',
    2 => '"19"',
  ),
  1 =>
  array (
    0 => ', name: "John doe"',
    1 => 'name',
    2 => '"John doe"',
  ),
  2 =>
  array (
    0 => ', int: 55',
    1 => 'int',
    2 => '55',
  ),
)
---
array (
  0 =>
  array (
    0 => 'DATA.tracking.user = {
    age: "20"',
    1 => 'age',
    2 => '"20"',
  ),
  1 =>
  array (
    0 => ',
    name: "Jane Doe"',
    1 => 'name',
    2 => '"Jane Doe"',
  ),
  2 =>
  array (
    0 => ',
    int: 49',
    1 => 'int',
    2 => '49',
  ),
)
---

Now you can simply iterate the matches and work with [1] (the keys) and [2] (the values). This is a basic solution that can be further tailored to suit your project data. Admittedly, it doesn't account for values that contain an escaped double quote, though adding that would be no trouble. Accounting for more complex value types may be more of a challenge.
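
Iterating the matches might look like the sketch below. It reuses the pattern from the answer; the $user array, the quote stripping, and the integer cast are assumptions made for illustration, not something the answer prescribes.

```php
<?php
$html = 'DATA.tracking.user = { age: "19", name: "John doe", int: 55 }';

// Same pattern as in the answer: \G continues from the previous match,
// so each key/value pair becomes its own match set.
$pattern = '~(?:\G(?!^),|DATA\.tracking\.user = \{)\s+([^:]+): (\d+|"[^"]*")~';

$user = [];
if (preg_match_all($pattern, $html, $out, PREG_SET_ORDER)) {
    foreach ($out as $m) {
        // $m[1] is the key, $m[2] is the raw value.
        [, $key, $value] = $m;
        // Assumption: quoted values lose their surrounding quotes,
        // bare digit runs are cast to integers.
        $user[$key] = $value[0] === '"' ? substr($value, 1, -1) : (int) $value;
    }
}

var_export($user);
```

Keeping the raw value in $m[2] and converting it in one place makes it easier to add further value types later without touching the pattern.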
This answer was accepted by the asker as the best answer.
