duanpasi6287 2018-07-29 17:23

Regex group selection

I'm stuck on parsing a log file. It contains rows like the ones below, and every row ends with a line break.

[2018.07.10 00:30:03:125] VersionInfo\886
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingTime\16
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingData\397
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->ThreadID\8
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->RequestExecuteStart\16
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->RequestInfo\25
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->CheckUserInfo\139
[2018.07.10 00:30:03:218]->Start RTS
[2018.07.10 00:30:03:640][TraceID: 8HRWSI105YVO91]->StartExecuteTask\35
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->EndExecuteTask\36
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->RequestExecuteEnd\16
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->OutgoingData\26651

I want to parse each row into groups: time, trace ID (if present), and block name. To select the datetime (which is always there) I use \[(.*?)\]; that is the first group. Next comes the trace ID, if it exists. The separator is (?:\[|->| ), that is, [ or -> or a space, and the group itself is selected the same way as the first one, \[(.*?)\]. The third group is the block name, ([a-zA-Z ]+): any text at the end, without the numbers.

I'm completely confused about how to connect it all. What I want to get is:

  • group 1 - datetime
  • group 2 - trace ID, or empty if there is none
  • group 3 - block name

1 answer

  • doujianwan7570 2018-07-29 17:38

    This should do the trick: ^\[(.*?)\](?:\[(.*?)\])?->([a-zA-Z ]+). Make sure you're using the multi-line flag, so that ^ matches at the start of every line rather than only at the start of the text. Here's a Python demo:

    >>> import re
    >>> # `file` holds the whole log as a single string
    >>> for x in re.finditer(r'^\[(.*?)\](?:\[(.*?)\])?->([a-zA-Z ]+)', file, re.M):
    ...     print(x.group(1), x.group(2), x.group(3))
    ...
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 IncomingTime
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 IncomingData
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 ThreadID
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 RequestExecuteStart
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 RequestInfo
    2018.07.10 00:30:03:109 TraceID: 8HRWSI105YVO91 CheckUserInfo
    2018.07.10 00:30:03:218 None Start RTS
    2018.07.10 00:30:03:640 TraceID: 8HRWSI105YVO91 StartExecuteTask
    2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 EndExecuteTask
    2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 RequestExecuteEnd
    2018.07.10 00:30:03:749 TraceID: 8HRWSI105YVO91 OutgoingData
    

    To capture only the trace ID itself, without the "TraceID: " prefix, you could use ^\[(.*?)\](?:\[TraceID: (.*?)\])?->([a-zA-Z ]+).
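
    As a rough, self-contained sketch of that second pattern: the sample text below is a shortened copy of the log from the question, and the names LOG, PATTERN and parse are only illustrative. Note that the VersionInfo row has no ->, so the pattern skips it, just as in the demo output above.

```python
import re

# Shortened sample of the log from the question (raw string keeps the backslashes).
LOG = r"""[2018.07.10 00:30:03:125] VersionInfo\886
[2018.07.10 00:30:03:109][TraceID: 8HRWSI105YVO91]->IncomingTime\16
[2018.07.10 00:30:03:218]->Start RTS
[2018.07.10 00:30:03:749][TraceID: 8HRWSI105YVO91]->OutgoingData\26651
"""

# group 1: datetime, group 2: trace ID without the "TraceID: " prefix
# (None when the row has no trace ID), group 3: block name.
PATTERN = re.compile(
    r'^\[(.*?)\](?:\[TraceID: (.*?)\])?->([a-zA-Z ]+)',
    re.M,  # multi-line flag: ^ matches at the start of each row
)


def parse(text):
    """Return one (datetime, trace_id, block_name) tuple per matching row."""
    return [m.groups() for m in PATTERN.finditer(text)]


rows = parse(LOG)
for row in rows:
    print(row)
```

    Rows without a trace ID come back with None in the second position, which you can map to whatever placeholder you prefer.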

    The asker selected this answer as the best answer.
