duanduji2986 2015-07-14 10:27

PHP: problem parsing .ini files with line breaks in values / is a regex needed?

I am having trouble parsing .ini files whose values are not enclosed in quotes and contain newlines. Here is an example:

[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Screenshot=url-goes-here.png
Categories=some,categories

Vendor=abc

[Section2]
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,

 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Somekey=somevalue

When I try to parse this string with parse_ini_string($file_content, true, INI_SCANNER_RAW);, it either returns false or returns only the first line of Description, e.g.:

["Description"]=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod" // next lines are missing

I have already tried removing the newlines and enclosing the values in quotes, but I can't find a regex that works. I need a pattern that matches each key/value pair up to the next key/value pair or until a comment begins.

Unfortunately, a key sometimes begins after a blank line and sometimes does not, and values can contain blank lines (see Description in Section2).

So the question is, how do I modify/cleanup this string to be readable with parse_ini_string?

1 answer

  • doujiang1913 2015-07-14 10:57

    You can describe a multiline key/value with this pattern:

    /^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m
    

    The default scanner mode, INI_SCANNER_NORMAL, allows multiline values enclosed in double quotes, so all you need to do is add the quotes:

    $content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);
    

    pattern details:

    ~                  # pattern delimiter
    ^                  # start of the line
    \w+                # key name
    =
    \K                 # discards characters on the left from the match result
    \N*                # zero or more characters except newlines
    (?:                # non-capturing group: optional empty lines followed by a non-empty line
        \R++           # one or more newlines
        (?!\w+=|[[#;]) # not followed by another key/value, a section or a comment
        \N+            # one or more characters except newlines
    )+                 # at least one occurrence
    ~m                 # multiline mode: ^ matches at the start of each line
    

    This pattern targets only multiline values; all other values stay unquoted.
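As a rough illustration of the quoting approach, the sketch below runs the pattern on a small sample and then parses the result with the default scanner mode. The sample input is hypothetical (shortened from the layout in the question, with a `;` comment), and the variable names `$quoted` and `$ini` are only illustrative.

```php
<?php
// Hypothetical sample, shortened from the layout shown in the question.
$content = implode("\n", [
    '[Section1]',
    'ID=xyz',
    '; A comment',
    'Description=first line',
    ' second line',
    '',
    ' third line',
    'Vendor=abc',
]);

// Wrap only the values that continue onto following lines in double quotes.
$quoted = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);

// INI_SCANNER_NORMAL is the default scanner mode, so no third argument is needed.
$ini = parse_ini_string($quoted, true);

echo $quoted, "\n";
var_dump($ini['Section1']['Description']);
```

The newlines stay inside the quoted value, so Description comes back as one multiline string. A value that itself contains a double quote would presumably need escaping before being wrapped this way.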

    Notes: I assumed that every key, comment and section header starts at the beginning of a line. If that is not the case (for example, if lines have leading spaces), you can easily adapt the pattern by adding \h*+ after each newline.
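One possible reading of that adaptation is sketched below: `\h*+` is placed after `^` and at the start of the lookahead, so that indented keys, section headers and comments are still recognised. The exact placement is an assumption of this sketch, and the indented sample input is hypothetical.

```php
<?php
// Hypothetical sample in which every line carries leading spaces.
$content = implode("\n", [
    '  [Section1]',
    '  ID=xyz',
    '  Description=first line',
    '    second line',
    '  Vendor=abc',
]);

// Same idea as the quoting pattern, with \h*+ skipping horizontal whitespace
// before a key, and before a key, section header or comment in the lookahead.
$pattern = '~^\h*+\w+=\K\N*(?:\R++(?!\h*+(?:\w+=|[[#;]))\N+)+~m';

$quoted = preg_replace($pattern, '"$0"', $content);

echo $quoted, "\n";
```

With this variant a continuation line is only told apart from a key by the absence of `word=` at its start, so a continuation line that happens to begin with something like `a=b` would be treated as a new key.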

    If comments are allowed anywhere in a line, change \N to [^#\n]


    If you want to use the INI_SCANNER_RAW option, you must remove newlines in values:

    $pattern = '~(?:\G(?!\A)|^\w+=[^#\n]*)\K\R++(?!\w+=|[[#])([^#\n]+)~m';
    $content = preg_replace($pattern, ' $1', $content);
    

    The pattern matches, one at a time, each run of consecutive newline characters that is followed by a non-empty line, and replaces that run with a single space.
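A minimal sketch of this newline-removal approach follows. It uses the `m` modifier so that `^` can match at the start of every line, and it includes `;` in the lookahead class as in the first pattern. The sample input is hypothetical and the variable names are only illustrative.

```php
<?php
// Hypothetical sample, shortened from the layout shown in the question.
$content = implode("\n", [
    '[Section1]',
    'ID=xyz',
    '; A comment',
    'Description=first line',
    ' second line',
    '',
    ' third line',
    'Vendor=abc',
]);

// Each run of newlines inside a value is replaced by one space; \G lets the
// next match continue exactly where the previous one ended.
$pattern = '~(?:\G(?!\A)|^\w+=[^#\n]*)\K\R++(?!\w+=|[[#;])([^#\n]+)~m';
$joined  = preg_replace($pattern, ' $1', $content);

$ini = parse_ini_string($joined, true, INI_SCANNER_RAW);

echo $joined, "\n";
var_dump($ini['Section1']['Description']);
```

Because each continuation line in the sample already starts with a space, the joined value contains two spaces at every former line break. Trimming the captured group before re-inserting it would be one way to avoid that.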

    Another way to do it is to use the first pattern, but this time with preg_replace_callback, and perform a simple character translation in the callback function. This approach can be useful if you also want to escape special or problematic characters.
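The callback variant might look roughly like the sketch below. It reuses the first pattern and translates every carriage return or line feed in the matched value to a space with strtr. The sample input and the name `$flattened` are hypothetical, and the callback is where any extra escaping could be added.

```php
<?php
// Hypothetical sample, shortened from the layout shown in the question.
$content = implode("\n", [
    '[Section1]',
    'ID=xyz',
    '; A comment',
    'Description=first line',
    ' second line',
    '',
    ' third line',
    'Vendor=abc',
]);

$pattern = '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m';

$flattened = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Character translation: each CR or LF becomes one space.
        return strtr($m[0], "\r\n", '  ');
    },
    $content
);

$ini = parse_ini_string($flattened, true, INI_SCANNER_RAW);

echo $flattened, "\n";
```

Unlike the single-space replacement, this translates newlines one by one, so a blank line inside a value leaves two spaces rather than one.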
