dtwxt88240 2017-02-23 20:38

PHP regex: matching values inside a block using delimiters

Hi everyone, I have a regex question. I want to parse this log file, and right now I need to get the keys and values inside the SESSION block.

The problem is that the logs don't all look the same. Some of them lack the # characters enclosing 'SESSION'. However, they all contain the word SESSION at the start of the block of variables, and that block is always followed by another block containing either "POST" or "API CURL CALL".

So I most likely have to use quantifiers to ignore everything outside those two markers, while matching every key and value pair (separated by :) between them.

That's a mouthful just talking about it. I'm completely stumped, so I turn to you guys for some guidance and help in this matter. The goal is to parse these shitty logs into something I can actually read quickly and understand.

I'm creating a class in PHP to do that and spit out some nice HTML-formatted logs. This is the log file as it stands:

[05:40:40] ################
[05:40:40] #### SOURCE ####: /zalo/vn/interface.call.php
[05:40:40] #### REQUEST ####: /zalo/vn/interface.call.php
[05:40:40] #### Refer: http://app.com/zalo/vn/?v=1&adsid=d6e5f33e5a94d9fafaf15dc0cf4a1e5&sub_id=170100sf01435487523&sub_id1=232s5
[05:40:40] #### SESSION #####
[05:40:40] v: 1 
[05:40:40] adsid: d6e5f33e5a94d93sfsf5dc0cf4a1e5 
[05:40:40] sub_id: 799e12b08fa1edes1d7bgsg0506a6e9 
[05:40:40] landingpage: http%3A%2F%2Fapp.com%2Fzalo%2Fvn%2Finterface.call.php 
[05:40:40] c_id: da21bae82c02d1e2b8168d57cd3fbab7 
[05:40:40] nId: 3943 
[05:40:40] partner: Marvel
[05:40:40] country_code: 84 
[05:40:40] country: VN 
[05:40:40] url: http://app.com/zalo/vn/ 
[05:40:40] campaign_id: 1066 
[05:40:40] source: web 
[05:40:40] msisdn: 906346534 
[05:40:40] Phone: 906346534 
[05:40:40] #### POST ####
[05:40:40] action: subscribe 
[05:40:40] Phone: 906346534 
[05:40:40] ################
[05:40:40] #### API CURL CALL ####

Ideally, what I'd want to keep is this section:

v: 1 
adsid: d6e5f33e5a94d93sfsf5dc0cf4a1e5 
sub_id: 799e12b08fa1edes1d7bgsg0506a6e9 
landingpage: http%3A%2F%2Fapp.com%2Fzalo%2Fvn%2Finterface.call.php 
c_id: da21bae82c02d1e2b8168d57cd3fbab7 
nId: 3943 
partner: Marvel
country_code: 84 
country: VN 
url: http://app.com/zalo/vn/ 
campaign_id: 1066 
source: web 
msisdn: 906346534 
Phone: 906346534 

I probably need a lookbehind-lookahead combination of some sort.

(?=SESSION).*?(?<=POST)

Something along these lines, but one that also removes the timestamps and the actual SESSION and POST keywords, which I don't need.


1 answer

  • dtr84664 2017-02-23 22:12

    If the file's not too big you could just loop through the whole thing:

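    // file() returns the lines of the log as an array, newlines included.
    $foo = file("test.txt");
    $insession = false;
    foreach ($foo as $line) {
        if (!$insession) {
            // Skip everything up to and including the SESSION header line.
            if (strpos($line, "SESSION") === false) continue;
            $insession = true;
            continue;
        }
        // Stop at the block that follows SESSION (POST or API CURL CALL).
        if (strpos($line, "POST") !== false || strpos($line, "API CURL CALL") !== false) break;
        // Strip the leading [hh:mm:ss] timestamp and print the rest of the line.
        if (preg_match("/^\[[\d:]+?\] (.*)$/", $line, $matches)) {
            echo "$matches[1]\n";
        }
    }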
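
Since the question asks for keys and values rather than raw lines, the same loop could collect them into an array instead of echoing. This is only a sketch: the function name `parseSessionBlock` is made up for illustration, and it assumes that a block header starts with the block name (optionally preceded by # characters) and that keys never contain a colon or whitespace.

```php
<?php
// Hypothetical helper, not part of the original answer: returns the
// SESSION block of a log as an associative array of key => value.
function parseSessionBlock(array $lines): array
{
    $session = [];
    $inSession = false;
    foreach ($lines as $line) {
        // Drop the leading "[hh:mm:ss] " timestamp and surrounding whitespace.
        $text = trim(preg_replace('/^\[[\d:]+\]\s*/', '', $line));
        if (!$inSession) {
            // Assumption: the header is "SESSION", with or without "#" around it.
            if (preg_match('/^#*\s*SESSION\b/', $text)) {
                $inSession = true;
            }
            continue;
        }
        // The next block header (POST or API CURL CALL) ends the SESSION block.
        if (preg_match('/^#*\s*(POST|API CURL CALL)\b/', $text)) {
            break;
        }
        // Split on the first ":" only, so values such as URLs keep their colons.
        if (preg_match('/^([^:\s]+):\s*(.*)$/', $text, $m)) {
            $session[$m[1]] = $m[2];
        }
    }
    return $session;
}
```

With the same test.txt as above, `print_r(parseSessionBlock(file("test.txt")));` would dump the pairs. Because the goal is HTML output, each key and value should go through `htmlspecialchars()` before being printed.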
    