drl37530 2015-05-07 08:58

Parsing all segments of a proxy_fcgi error log entry

I am trying to parse this string:

[Wed May 06 15:09:08.160122 2015] [proxy_fcgi:error] [pid 30987:tid 140285789038336] [client 192.168.56.1:39157] AH01071: Got error 'PHP message: PHP Fatal error:  Undefined class constant 'self::TF_TEASER_LONG' in /var/www/foo/admin/server/php/UploadHandler.php on line 588
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/foo/admin/server/php/index.php:0
PHP message: PHP   2. UploadHandler->__construct() /var/www/foo/admin/server/php/index.php:14
PHP message: PHP   3. UploadHandler->initialize() /var/www/foo/admin/server/php/UploadHandler.php:172
PHP message: PHP   4. UploadHandler->post() /var/www/foo/admin/server/php/UploadHandler.php:187
PHP message: PHP   5. UploadHandler->handle_file_upload() /var/www/foo/admin/server/php/UploadHandler.php:767
', referer: http://foo.com/admin/module.php?id=29

The matches I expect to end up with are:

1  -> Wed
2  -> May
3  -> 06
4  -> 15
5  -> 09
6  -> 08
7  -> 2015
8  -> proxy_fcgi:error
9  -> 192.168.56.1:39157
10 -> PHP Fatal error
11 -> Undefined class constant 'self::TF_TEASER_LONG'
12 -> /var/www/foo/admin/server/php/UploadHandler.php
13 -> 588
14 -> PHP message: PHP   1. {main}() /var/www/foo/admin/server/php/index.php:0
PHP message: PHP   2. UploadHandler->__construct() /var/www/foo/admin/server/php/index.php:14
PHP message: PHP   3. UploadHandler->initialize() /var/www/foo/admin/server/php/UploadHandler.php:172
PHP message: PHP   4. UploadHandler->post() /var/www/foo/admin/server/php/UploadHandler.php:187
PHP message: PHP   5. UploadHandler->handle_file_upload() /var/www/foo/admin/server/php/UploadHandler.php:767

15 -> http://foo.com/admin/module.php?id=29

This is the regex I have so far, and I am already failing to understand some basic principles:

/(\[(.*?)\])?((?<=\')(.*)(?=\'))?(, referer: (.*))*/g
  1. Why do I have to put "?" after the group (\[(.*?)\])?
  2. Why does it only match those 4 bracket groups if I leave the "?" out?
  3. Why can't I put "{4}" after the group from 1. to match it 4 times?

Here is a test case:

https://regex101.com/r/cZ6rE3/1

1 answer

  • duankuiyuant3940 2015-05-07 09:40

    The string you gave is very specific, so my regex may not match every one of its "siblings", but it could look like this:

    /\[(\w+)\s(\w+)\s(\d{2})\s(\d{2}):(\d{2}):(\d{2})\.\d+\s(\d{4})\]\s*\[([^\]]+)\]\s\[[^\]]+\]\s\[client\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+)\][^\']+\'[^:]+:\s([^:]+):\s+(.*?)\sin\s(.*?)\son\sline\s(\d+)\n(.*?)referer:\s(.*)/gs


    https://regex101.com/r/dI9oO5/1
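For readers who want to try the pattern outside regex101, here is a minimal sketch in Python. It assumes Python's `re` module instead of the PCRE flavour used on regex101, and it uses `re.DOTALL` so that `.` can cross the line breaks inside the stack trace. The names `LOG_ENTRY` and `PATTERN` are made up for this example; the log text is the sample from the question.

```python
import re

# Sample entry copied from the question.
LOG_ENTRY = """[Wed May 06 15:09:08.160122 2015] [proxy_fcgi:error] [pid 30987:tid 140285789038336] [client 192.168.56.1:39157] AH01071: Got error 'PHP message: PHP Fatal error:  Undefined class constant 'self::TF_TEASER_LONG' in /var/www/foo/admin/server/php/UploadHandler.php on line 588
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/foo/admin/server/php/index.php:0
PHP message: PHP   2. UploadHandler->__construct() /var/www/foo/admin/server/php/index.php:14
PHP message: PHP   3. UploadHandler->initialize() /var/www/foo/admin/server/php/UploadHandler.php:172
PHP message: PHP   4. UploadHandler->post() /var/www/foo/admin/server/php/UploadHandler.php:187
PHP message: PHP   5. UploadHandler->handle_file_upload() /var/www/foo/admin/server/php/UploadHandler.php:767
', referer: http://foo.com/admin/module.php?id=29"""

# The answer's pattern, split over several lines with re.VERBOSE
# (whitespace and comments inside the pattern are ignored).
PATTERN = re.compile(
    r"""
    \[(\w+)\s(\w+)\s(\d{2})\s              # 1-3: weekday, month, day
    (\d{2}):(\d{2}):(\d{2})\.\d+\s         # 4-6: hour, minute, second
    (\d{4})\]\s*                           # 7: year
    \[([^\]]+)\]\s                         # 8: module and log level
    \[[^\]]+\]\s                           # pid/tid block, not captured
    \[client\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d+)\]  # 9: client address
    [^']+'[^:]+:\s                         # skip up to the first PHP message prefix
    ([^:]+):\s+                            # 10: error type
    (.*?)\sin\s(.*?)\son\sline\s(\d+)\n    # 11-13: message, file, line number
    (.*?)referer:\s(.*)                    # 14: remaining lines, 15: referer
    """,
    re.VERBOSE | re.DOTALL,
)

match = PATTERN.search(LOG_ENTRY)
if match is None:
    print("no match")
else:
    # Print every capture group with its number.
    for number, value in enumerate(match.groups(), start=1):
        print(number, "->", repr(value))
```

In this sketch group 14 still contains the "PHP message: PHP Stack trace:" line and the trailing quote and comma, so it is slightly wider than the group 14 listed in the question. With `re.DOTALL` the last group runs to the end of the input, which is fine for a single entry but would need tightening for a whole log file.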

    To answer Your questions:

    1. The * and + quantifiers are "greedy": by default they match as many characters as they can. To change this behavior, you can add ?. So .*? means: match any character, but stop as soon as the rest of the pattern can match (don't be greedy).

    2. The greediness of * (without ?) makes it consume more characters than you want, so not much is left for the rest of the pattern.

    3. In your expected result you want every part of the date in a separate variable, and a repeated capturing group only keeps the text of its last repetition, so this is at least one reason why (\[(.*?)\]){4} won't work.
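The three points above can be checked with a small experiment. This is only a sketch, assuming Python's `re` module (the behaviour shown is the same in PCRE as far as I know); the variable names are invented for the example and `line` is the bracketed prefix of the sample log entry.

```python
import re

line = ("[Wed May 06 15:09:08.160122 2015] [proxy_fcgi:error] "
        "[pid 30987:tid 140285789038336] [client 192.168.56.1:39157] AH01071")

# Greedy: .* runs on to the LAST "]" in the line.
greedy = re.search(r"\[(.*)\]", line).group(1)

# Lazy: .*? stops at the FIRST "]" it can.
lazy = re.search(r"\[(.*?)\]", line).group(1)

# One lazy group applied repeatedly returns each bracket block separately.
all_blocks = re.findall(r"\[(.*?)\]", line)

# A quantified capturing group remembers only its last repetition.
# \s* is added here because the blocks are separated by spaces.
repeated = re.search(r"(?:\[(.*?)\]\s*){4}", line)

# Without room for the spaces, four back-to-back blocks are never found.
adjacent = re.search(r"(\[(.*?)\]){4}", line)

print("greedy  :", greedy)
print("lazy    :", lazy)
print("findall :", all_blocks)
print("repeated:", repeated.group(1))
print("adjacent:", adjacent)
```

So even when `{4}` does match, only the fourth block (the client address) survives in the capture group, which is why the longer pattern spells out each bracket block on its own.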

    This answer was accepted by the asker as the best answer.
