Scraping the source code of a <script> tag from a web page

I'm looking for a way to scrape some source code from a web page. The information I need is inside a <script> tag similar to this:

<script>
.......
var playerIdMap = {};
playerIdMap['4'] = '614';
playerIdMap['5'] = '84';
playerIdMap['6'] = '65';
playerIdMap['7'] = '701';
getPlayerIdMap = function() { return playerIdMap; };   // global
}
enclosePlayerMap();
</script>

I am trying to grab the playerIdMap keys and values (e.g. 4 and 614), or the whole line for that matter.

1 answer

doutai1509 2017-09-11 16:25

Edit-2

Complete PHP code, inspired by the code in "How to get data from API - php - curl":

<?php
/**
 * Handles making a cURL request
 *
 * @param string $url         URL to call out to for information.
 * @param bool   $callDetails Optional condition to allow for extended
 *   information return including error and getinfo details.
 *
 * @return array $returnGroup cURL response and optional details.
 */
function makeRequest($url, $callDetails = false)
{
  // Set handle
  $ch = curl_init($url);

  // Set options
  // Note: disabling SSL peer verification is insecure; only do this for quick tests.
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

  // Execute the cURL handle and add the result to the return array.
  $result = curl_exec($ch);
  $returnGroup = ['curlResult' => $result,];

  // If details of the cURL execution were requested, add them to the return group.
  if ($callDetails) {
    $returnGroup['info'] = curl_getinfo($ch);
    $returnGroup['errno'] = curl_errno($ch);
    $returnGroup['error'] = curl_error($ch);
  }

  // Close cURL and return response.
  curl_close($ch);
  return $returnGroup;
}

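$url = "http://www.bullshooterlive.com/my-stats/999/";
$response = makeRequest($url, true);

// curl_exec() returns false on failure, so stop before running the regex.
if ($response['curlResult'] === false) {
  die('cURL request failed: ' . $response['error']);
}

// Capture the key and value of every playerIdMap['<id>'] = '<value>' assignment.
$re = '/playerIdMap\[\'(?P<id>\d+)\']\s+=\s+\'(?P<value>\d+)\'/';

preg_match_all($re, $response['curlResult'], $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

//var_dump($response);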

Edit-1

Sorry, I didn't realize you had asked a PHP question; I'm not sure why I assumed Scrapy here. Anyway, the PHP code below should help:

$re = '/playerIdMap\[\'(?P<id>\d+)\']\s+=\s+\'(?P<value>\d+)\'/';
$str = '<script>
.......
var playerIdMap = {};
playerIdMap[\'4\'] = \'614\';
playerIdMap[\'5\'] = \'84\';
playerIdMap[\'6\'] = \'65\';
playerIdMap[\'7\'] = \'701\';
getPlayerIdMap = function() { return playerIdMap; };   // global
}
enclosePlayerMap();
</script>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

Previous answer

You can use something like the following:

>>> data = """
... <script>
... .......
... var playerIdMap = {};
... playerIdMap['4'] = '614';
... playerIdMap['5'] = '84';
... playerIdMap['6'] = '65';
... playerIdMap['7'] = '701';
... getPlayerIdMap = function() { return playerIdMap; };   // global
... }
... enclosePlayerMap();
... </script>
... """
>>> import re
>>>
>>> regex = r"playerIdMap\['(?P<id>\d+)']\s+=\s+'(?P<value>\d+)'"
>>> re.findall(regex, data)
[('4', '614'), ('5', '84'), ('6', '65'), ('7', '701')]
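
The `findall` call above returns plain tuples, so the `id` and `value` group names go unused. As a minimal sketch (the `parse_player_id_map` helper and the shortened `sample` string are hypothetical, added only for illustration), `re.finditer` exposes the named groups and lets you build a dict directly:

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PLAYER_ID_RE = re.compile(r"playerIdMap\['(?P<id>\d+)']\s+=\s+'(?P<value>\d+)'")


def parse_player_id_map(source):
    """Return a dict mapping each playerIdMap key to its value (both strings).

    Hypothetical helper for illustration; not part of the original answer.
    """
    return {m.group("id"): m.group("value") for m in PLAYER_ID_RE.finditer(source)}


# Shortened stand-in for the script body shown in the question.
sample = """
var playerIdMap = {};
playerIdMap['4'] = '614';
playerIdMap['5'] = '84';
"""
print(parse_player_id_map(sample))  # {'4': '614', '5': '84'}
```

A dict is convenient when you want to look up a single id; keep the list of tuples if you need to preserve duplicate keys.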

With Scrapy, you first need to get to the script tag, using the following:

data = response.xpath("//script[contains(text(),'getPlayerIdMap')]").extract_first() 

import re
regex = r"playerIdMap\['(?P<id>\d+)']\s+=\s+'(?P<value>\d+)'"
print(re.findall(regex, data))
[('4', '614'), ('5', '84'), ('6', '65'), ('7', '701')]
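
The XPath call above needs a Scrapy `response` object. If Scrapy is not available, the same idea (find the script that mentions `getPlayerIdMap`, then run the regex on it) can probably be sketched with only the standard library. The `ScriptCollector` class, the `extract_player_ids` function, and the `page` string below are assumptions made for this example, not part of the original answer:

```python
import re
from html.parser import HTMLParser

PLAYER_ID_RE = re.compile(r"playerIdMap\['(?P<id>\d+)']\s+=\s+'(?P<value>\d+)'")


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_player_ids(html):
    """Return (id, value) tuples from the first script mentioning getPlayerIdMap."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if "getPlayerIdMap" in script:
            return PLAYER_ID_RE.findall(script)
    return []


# Made-up page used only to exercise the sketch.
page = """<html><body>
<script>var unrelated = 1;</script>
<script>
var playerIdMap = {};
playerIdMap['4'] = '614';
playerIdMap['5'] = '84';
getPlayerIdMap = function() { return playerIdMap; };
</script>
</body></html>"""
print(extract_player_ids(page))  # [('4', '614'), ('5', '84')]
```

Filtering on `getPlayerIdMap` first keeps the regex from matching similar-looking text in unrelated scripts on the same page.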

This answer was accepted by the asker as the best answer.
