dsai1991 2013-08-08 16:43
Views: 70
Accepted

preg_match_all with git log --pretty=raw --all

I'm trying to use preg_match_all to process the log from git log --pretty=raw --all.

The sample data I get looks something like this:

commit 5650c7841f72c4c65689b0d4bc83ccd70e5b2362 (HEAD, origin/master, origin/HEAD, master)
tree 69c6036c64c805e9c335b2eadd87b43af90ee1ad
parent a912fdd530efe69dae4b0f417c8a8631d68f469c
parent 113e128efe54511f2b0bdd589301ffe039fc185e
author Author Name 3 <author.name.3@gmail.com> 1371835063 -0700
committer Committer Name 3 <committer.name.3@gmail.com> 1371835063 -0700

    Merge pull request #60 from sample/master

    Line 2 message

commit 94e99889226671dc479be770968df2692e09db11 (origin/fixit)
tree f900c172fa633b3769b982614ce639e3ee6f3b62
parent dc56687f1597b317064b0d899c2450fb6805791e
author Author Name 2 <author.name.2@something.com.tld> 1370944188 +0300
committer Committer Name 2 <committer.name.2@something.com.tld> 1370944188 +0300

    1 line message

commit dc56687f1597b317064b0d899c2450fb6805791e
tree cb1573ccde7ddcb2e54b9b9a777e11a435d532ac
parent a912fdd530efe69dae4b0f417c8a8631d68f469c
author Author Name 1 <author.name.1@gmail.com> 1370640640 +0300
committer Committer Name 1 <committer.name.1@gmail.com> 1370943413 +0300

    Message contain words like commit tree parent author committer to screw your regex.
    Also contain other symbols like @ # ! % ( ) = - |

    Can you handle 2nd paragraph?

    3rd paragraph?


I would like to extract the

  • commit hash
  • tree hash
  • parent hashes
  • author name
  • committer, and
  • commit message
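In the raw format each author and committer header follows the layout `name <email> unix-timestamp timezone`, so the name can be pulled out with an anchored regex on that one line instead of a pattern spanning the whole log. A minimal sketch, assuming that layout as seen in the sample above; `parseIdentityLine()` is a hypothetical helper name:

```php
<?php
// Sketch: split one raw "author"/"committer" header line into its parts.
// Assumes the layout "<role> <name> <<email>> <unix-time> <tz>" shown in the
// sample log. parseIdentityLine() is a hypothetical helper, not a PHP builtin.
function parseIdentityLine(string $line): ?array
{
    // (.*) is greedy, so it backtracks to the last " <" before the email.
    $re = '/^(author|committer) (.*) <([^>]*)> (\d+) ([+-]\d{4})$/';
    if (!preg_match($re, $line, $m)) {
        return null; // not an author/committer header
    }
    return [
        'role'      => $m[1],
        'name'      => $m[2],
        'email'     => $m[3],
        'timestamp' => (int) $m[4],
        'tz'        => $m[5],
    ];
}

$id = parseIdentityLine(
    'author Author Name 3 <author.name.3@gmail.com> 1371835063 -0700'
);
echo $id['name'], "\n"; // Author Name 3
```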

The closest I can get is with:

/^commit (.{40})(.*)\s^tree (.{40})\s^parent (.{40})\s(^parent (.{40})\s)?^author (.+)\s^committer (.+)\s+(.+)\s+/m

The result looks like this: http://regex101.com/r/cY4qV4

Is there a more accurate regex for the data above, one that doesn't break as easily as mine?

Some things to note:

  1. Tags and branches appear in parentheses after the commit hash. (Additional question: is it possible to also split the tags/branches on the commas within the same regex?)
  2. Some commits have 2 parents.
  3. The commit message may contain multiple paragraphs, unusual symbols, or words that the regex itself matches on.
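Applying small regexes line by line, rather than one pattern over the whole log, copes with all three points. In raw output every header starts at column 0 while every message line is indented by four spaces, so splitting before each line that begins with `commit` plus 40 hex characters cannot be fooled by the word "commit" inside a message. A minimal sketch under those assumptions; `parseRawLog()` is a hypothetical name, and other headers such as `gpgsig` are simply skipped:

```php
<?php
// Sketch: parse `git log --pretty=raw --all` output without one giant regex.
// Assumptions: headers start at column 0, message lines are indented by four
// spaces, and ref names (if present) follow the hash in parentheses.
// parseRawLog() is a hypothetical helper, not part of PHP or git.
function parseRawLog(string $raw): array
{
    $commits = [];
    // Split right before every line that starts a new commit record.
    $records = preg_split('/^(?=commit [0-9a-f]{40})/m', $raw, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($records as $record) {
        // Headers end at the first empty line; everything after is the message.
        $parts = preg_split('/\R\R/', $record, 2);
        $header = $parts[0];
        $body = $parts[1] ?? '';
        $c = ['hash' => '', 'refs' => [], 'tree' => '', 'parents' => [],
              'author' => '', 'committer' => '', 'message' => ''];
        foreach (preg_split('/\R/', $header) as $line) {
            if (preg_match('/^commit ([0-9a-f]{40})(?: \((.*)\))?$/', $line, $m)) {
                $c['hash'] = $m[1];
                // Split "HEAD, origin/master, master" into separate ref names.
                if (isset($m[2]) && $m[2] !== '') {
                    $c['refs'] = preg_split('/,\s*/', $m[2]);
                }
            } elseif (preg_match('/^(tree|parent|author|committer) (.*)$/', $line, $m)) {
                if ($m[1] === 'parent') {
                    $c['parents'][] = $m[2]; // merge commits have several parents
                } else {
                    $c[$m[1]] = $m[2];
                }
            }
            // Any other header line (and continuation lines) is ignored.
        }
        // Remove the four-space indent and surrounding blank lines.
        $c['message'] = trim(preg_replace('/^ {4}/m', '', $body));
        $commits[] = $c;
    }
    return $commits;
}

// Usage inside a git repository:
// $commits = parseRawLog((string) shell_exec('git log --pretty=raw --all --decorate'));
```

Git may leave out the ref names when its output is not a terminal (as under `shell_exec`), which is why the usage line passes `--decorate`; the parser also works when no parentheses are present.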

1 answer

  • dongyan7950 2013-08-08 17:05

    Instead of running preg_match_all on the raw output, I would suggest using --pretty=format: to produce parseable XML:

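    // Emit one <commit> element per commit; the message goes in CDATA so
    // characters such as < or & in it do not break the XML.
    $out = shell_exec(
        'git log --max-count=10 --pretty=format:'.
        '"<commit><hash>%H</hash><date>%ad</date><msg><![CDATA[%s]]></msg></commit>"'
    );
    // Wrap the fragments in a single root element so they form one document.
    $commits = simplexml_load_string(
        '<?xml version="1.0" encoding="utf-8"?><commits>'.$out.'</commits>'
    );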
    

    All available placeholders are listed at http://git-scm.com/book/en/Git-Basics-Viewing-the-Commit-History (the table in the middle of the page). Remember to wrap messages, names and emails in <![CDATA[ ]]>, since they may contain characters that would break the XML.
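    The format string above captures only the hash, date and subject. A sketch extending it to the fields the question asks for, assuming git's %T (tree hash), %P (space-separated parent hashes), %D (ref names without the surrounding parentheses), %an / %cn (author and committer names) and %B (raw message, subject plus body) placeholders; gitLogXmlCommand() and parseXmlLog() are hypothetical names. One caveat CDATA cannot fix: a message that itself contains "]]>" still produces invalid XML, in which case this sketch returns an empty array.

```php
<?php
// Sketch: build the git command and parse its XML-ish output into arrays.
// %T tree, %P parents (space separated), %D ref names, %an author name,
// %cn committer name, %B raw message. Function names are hypothetical.
function gitLogXmlCommand(int $maxCount): string
{
    $format = '<commit><hash>%H</hash><tree>%T</tree><parents>%P</parents>'
        . '<refs><![CDATA[%D]]></refs>'
        . '<author><![CDATA[%an]]></author><committer><![CDATA[%cn]]></committer>'
        . '<msg><![CDATA[%B]]></msg></commit>';
    // escapeshellarg() stops the shell from interpreting the format string.
    return 'git log --all --max-count=' . $maxCount
        . ' --pretty=format:' . escapeshellarg($format);
}

function parseXmlLog(string $out): array
{
    // LIBXML_NOCDATA turns the CDATA sections into ordinary text nodes.
    $xml = simplexml_load_string(
        '<?xml version="1.0" encoding="utf-8"?><commits>' . $out . '</commits>',
        'SimpleXMLElement',
        LIBXML_NOCDATA
    );
    if ($xml === false) {
        return []; // e.g. a message containing "]]>" ends its CDATA early
    }
    $commits = [];
    foreach ($xml->commit as $c) {
        $refs = trim((string) $c->refs);
        $commits[] = [
            'hash'      => (string) $c->hash,
            'tree'      => (string) $c->tree,
            // Merge commits list more than one parent; a root commit lists none.
            'parents'   => preg_split('/\s+/', trim((string) $c->parents), -1, PREG_SPLIT_NO_EMPTY),
            // Split "HEAD -> master, origin/master" on the commas.
            'refs'      => $refs === '' ? [] : preg_split('/,\s*/', $refs),
            'author'    => (string) $c->author,
            'committer' => (string) $c->committer,
            'message'   => trim((string) $c->msg),
        ];
    }
    return $commits;
}

// Usage inside a git repository:
// $commits = parseXmlLog((string) shell_exec(gitLogXmlCommand(10)));
```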

    This answer was accepted by the asker as the best answer.
