Web scraping links and dates from an HTML table on the 3GPP website

I'm trying to extract (scrape) the Zip links and their corresponding dates from the Release tab of the link below:

I am able to extract the Zip links using the PHP code below:

preg_match_all('/<ul class=\"rpRootGroup\">(.*?)<\/ul/s',$specpage,$zipul);
$specul = new domDocument;
@$specul->loadHTML($zipul[0][0]);
$specul->preserveWhiteSpace = true;
$xpathspecul = new DOMXPath($specul);
$rowsUL = $xpathspecul->query('//tr');
$resultul = array();
$zipf = array();
$zipuni = array();

foreach ($rowsUL as $rowul) {
    $colsul = $rowul->getElementsByTagName('td');
    foreach ($colsul as $colul) {

        if($xpathspecul->evaluate('count(.//a)', $colul) > 0) { // check if an anchor exists
            $slinkul = $xpathspecul->evaluate('string(.//a/@href)', $colul); // if there is, then echo the href value
        }
        if (isset($slinkul) && $slinkul!=null){
            $resultul[] = $slinkul;
        }
    }
}

foreach ($resultul as $ziplink){
    $chkzip = pathinfo($ziplink, PATHINFO_EXTENSION);
    if ($chkzip == 'zip' && $ziplink!==null){
        $zipf[] = trim($ziplink);
    }
}
$zipuni = array_values (array_unique($zipf));

$specpage contains the page source, loaded using cURL.

[Image: sample of the Zip link and Date columns mentioned above]

However, I am not able to extract the corresponding dates.

Furthermore, I am having a problem with array_unique: the same Zip link can appear with different dates, so deduplicating on the link alone would drop those entries. Without array_unique, however, I get many duplicate links.

Any help is appreciated.


1 answer

dongqindu8110 2017-02-12 17:52

If you're literally just trying to grab the date (YYYY-MM-DD) and the zip URL from the page given, you could use the code below. You could easily put this into one regex, but it's clearer to see how it works using two. Because the regex queries are so specific, I was getting precisely 21 matches per query, so it was just a matter of creating an additional array with keys so that the data can be sorted with ease.

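$url = 'https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=1387';
$data = file_get_contents($url);
if ($data === false) {
    exit('Could not download ' . $url);
}
// Non-greedy match so that two links on one line are not merged into a single match.
preg_match_all('/http:\/\/[^"\'\s<>]+?\.zip/', $data, $links);
// Group 1 captures only the date (YYYY-MM-DD), without the surrounding <td> tags.
preg_match_all('/<\/td><td>\s*(\d{4}-\d{2}-\d{2})\s*<\/td><td>/', $data, $dates);
$newArr = []; // Your new array with URLs and dates, paired by position

foreach ($dates[1] as $k => $v) {
    if (!isset($links[0][$k])) {
        break; // fewer links than dates: stop rather than pair a date with a missing URL
    }
    $newArr[] = ['date' => $v, 'url' => $links[0][$k]];
    // echo is for testing purposes.
    echo 'Date: ' . $newArr[$k]['date'] . '<br>URL: ' . $newArr[$k]['url'] . '<br><br>';
}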

Output:

Date: 2015-12-18
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-d00.zip

Date: 2014-09-26
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-c00.zip

Date: 2012-09-21
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-b00.zip

Date: 2011-04-05
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-a00.zip

Date: 2009-12-18
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-900.zip

Date: 2008-12-18
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-800.zip

Date: 2007-06-21
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-700.zip

Date: 2005-01-06
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-600.zip

Date: 2004-04-01
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-530.zip

Date: 2003-10-02
URL: http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-520.zip

etc....

I've spot-checked the data, and the dates match up perfectly with the links.
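
Pairing two regex result lists by position only works while both lists have the same length and order. A more defensive variant, sketched below, reads the link and the date from the same table row with DOMXPath and removes duplicates on the (link, date) pair, so the same Zip link with a different date is kept. The sample markup in $html and the function name extractZipDatePairs are hypothetical; the real page's table layout is an assumption and may need a different XPath.

```php
<?php
// Hypothetical sample markup; the structure of the real 3GPP page is an assumption.
$html = <<<HTML
<table>
  <tr><td><a href="http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-d00.zip">13.0.0</a></td><td> 2015-12-18 </td></tr>
  <tr><td><a href="http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-d00.zip">13.0.0</a></td><td> 2015-12-18 </td></tr>
  <tr><td><a href="http://www.3gpp.org/ftp/Specs/archive/26_series/26.073/26073-c00.zip">12.0.0</a></td><td> 2014-09-26 </td></tr>
  <tr><td>No link in this row</td><td> 2014-01-01 </td></tr>
</table>
HTML;

// Returns a list of ['date' => ..., 'url' => ...] entries, one per unique (url, date) pair.
function extractZipDatePairs(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $pairs = [];
    foreach ($xpath->query('//tr') as $row) {
        // href of the first link in this row that contains ".zip" (empty string if none).
        $href = trim($xpath->evaluate('string(.//a[contains(@href, ".zip")]/@href)', $row));
        if ($href === '') {
            continue;
        }
        // First YYYY-MM-DD date found in the row's visible text.
        if (!preg_match('/\d{4}-\d{2}-\d{2}/', $row->textContent, $m)) {
            continue;
        }
        // Keying on link AND date drops exact duplicates only.
        $pairs[$href . '|' . $m[0]] = ['date' => $m[0], 'url' => $href];
    }
    return array_values($pairs);
}

$pairs = extractZipDatePairs($html);
foreach ($pairs as $pair) {
    echo 'Date: ' . $pair['date'] . ' URL: ' . $pair['url'] . PHP_EOL;
}
```

Because the array key combines the link and the date, nested tables that make `//tr` visit the same data twice produce no extra entries, which is one likely source of the duplicate links seen without array_unique.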

This answer was accepted by the asker as the best answer.

Tags: html, php. Asked 2017-02-12 16:50.