duanduji2986 2015-07-14 10:27

PHP .ini parsing problem with newlines: is a regular expression needed?

I'm having trouble parsing .ini files whose values are not enclosed in quotes and contain newlines. Here is an example:

[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Screenshot=url-goes-here.png
Categories=some,categories

Vendor=abc

[Section2]
Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,

 quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Somekey=somevalue

When I try to parse this string with parse_ini_string($file_content, true, INI_SCANNER_RAW), it either returns false or returns only the first line of Description, e.g.:

["Description"]=> "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod" // next lines are missing

I already tried to remove the newlines and to enclose the values in quotes, but I can't find a regex that works. I need a pattern that matches each key/value pair up to the next key/value pair or until a comment begins.

Unfortunately, a key sometimes follows a blank line and sometimes doesn't, and values can themselves contain blank lines (see Description in Section2).

So the question is: how do I modify/clean up this string so that parse_ini_string can read it?

  • doujiang1913 2015-07-14 10:57

    You can describe a multiline key/value with this pattern:

    /^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m
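A quick way to see which entries this pattern treats as multiline is to run it through preg_match_all(). This is a minimal sketch; the sample is a shortened, hypothetical version of the question's input.

```php
<?php
// Shortened, hypothetical sample modelled on the question's input.
$content = <<<'INI'
[Section1]
ID=xyz

# A comment
Description=Lorem ipsum dolor sit amet,
 tempor incididunt ut labore

 quis nostrud exercitation
Screenshot=url-goes-here.png
INI;

// The answer's pattern: a key line followed by at least one continuation
// line that is not another key, a section header or a comment.
$pattern = '/^\w+=\N*(?:\R++(?!\w+=|[[#;])\N+)+/m';

preg_match_all($pattern, $content, $matches);

// Print the key name of every multiline entry that was found.
foreach ($matches[0] as $entry) {
    echo strstr($entry, '=', true), PHP_EOL;
}
```

With this sample only Description is printed: ID is followed by a blank line and then a comment, and Screenshot has no continuation line, so neither qualifies.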
    

    In the default INI_SCANNER_NORMAL mode, values enclosed in double quotes may span several lines, so all you need to do is add the quotes:

    $content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);
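Wrapped in a small helper, the quoting step and the parse call might look like the sketch below. It rests on a few assumptions: the function name parseMultilineIni is hypothetical, values are assumed never to contain double quotes, and because PHP 7.0 removed support for # comments in INI files, lines starting with # are rewritten as ; comments before parsing.

```php
<?php
/**
 * Hypothetical helper: quote multiline values, then parse in the default
 * INI_SCANNER_NORMAL mode. Returns the parsed array, or false on failure.
 * Assumes values never contain double quotes.
 */
function parseMultilineIni($content)
{
    // PHP 7.0+ no longer accepts '#' comments in INI data, so turn lines
    // starting with '#' into ';' comments (assumption: '#' only starts comments).
    $content = preg_replace('~^#~m', ';', $content);

    // The answer's replacement: wrap each multiline value in double quotes.
    // \K keeps the "key=" part out of $0, so only the value gets quoted.
    $content = preg_replace('~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m', '"$0"', $content);

    return parse_ini_string($content, true, INI_SCANNER_NORMAL);
}

// Hypothetical sample in the shape of the question's file.
$sample = <<<'INI'
[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit,
 tempor incididunt ut labore et dolore magna aliqua.
Vendor=abc

[Section2]
Description=First paragraph,
 still the first paragraph.

 Second paragraph.
Somekey=somevalue
INI;

$ini = parseMultilineIni($sample);
print_r($ini);
```

The multiline values keep their line breaks and leading spaces; only the surrounding quotes are added.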
    

    pattern details:

    ~                  # pattern delimiter
    ^                  # start of the line
    \w+                # key name
    =
    \K                 # discards characters on the left from the match result
    \N*                # zero or more characters except newlines
    (?:                # non-capturing group: optional empty lines, then a non-empty line
        \R++           # one or more newlines
        (?!\w+=|[[#;]) # not followed by another key/value, a section or a comment
        \N+            # one or more characters except newlines
    )+                 # at least one occurrence
    ~m                 # switch on the multiline mode, ^ means "start of the line"
    

    This pattern targets only multiline values; other values stay unquoted.

    Note: I assumed that every key, comment and section starts at the beginning of a line. If that isn't the case (for example, with leading spaces), you can easily adapt the pattern by adding \h*+ after each newline.
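One possible adaptation is sketched below. It assumes that the keys themselves may be indented too, so \h*+ is also added right after ^; the sample input is hypothetical.

```php
<?php
// Hypothetical input where keys are indented by two spaces.
$content = <<<'INI'
[Section1]
  ID=xyz
  Description=first line
   second line
  Vendor=abc
INI;

// First pattern with \h*+ (possessive horizontal whitespace) after ^ and after
// each run of newlines, so indentation does not hide the next key.
$pattern = '~^\h*+\w+=\K\N*(?:\R++\h*+(?!\w+=|[[#;])\N+)+~m';

// Quote the multiline values exactly as in the INI_SCANNER_NORMAL approach.
$fixed = preg_replace($pattern, '"$0"', $content);
echo $fixed, PHP_EOL;
```

Note that a continuation line that itself looks like key=value cannot be told apart from a real key by this pattern.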

    If comments are allowed anywhere in a line, change \N to [^#\n].


    If you want to use the INI_SCANNER_RAW option, you must remove newlines in values:

    $pattern = '~(?:\G(?!\A)|^\w+=[^#\n]*)\K\R++(?!\w+=|[[#;])([^#\n]+)~m';
    $content = preg_replace($pattern, ' $1', $content);
    

    The pattern matches, one at a time, each run of consecutive newline characters that is followed by a non-empty line, and replaces the newlines with a single space.
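A sketch of the INI_SCANNER_RAW route, under assumptions similar to the quoting helper: the function name parseMultilineIniRaw is hypothetical, and # comment lines are rewritten as ; comments for PHP 7.0+. The pattern is the answer's RAW-mode pattern written with the m modifier (so ^ matches at the start of every line) and with ; in the lookahead, so that ; comment lines are not glued onto a value.

```php
<?php
/**
 * Hypothetical helper: join continuation lines with spaces, then parse in
 * INI_SCANNER_RAW mode. Returns the parsed array, or false on failure.
 */
function parseMultilineIniRaw($content)
{
    // PHP 7.0+ no longer accepts '#' comments in INI data (assumption: '#'
    // only ever starts a comment line), so rewrite them as ';' comments.
    $content = preg_replace('~^#~m', ';', $content);

    // The first match starts at "key=..."; \G(?!\A) then chains each following
    // match to the end of the previous one. Every run of newlines followed by a
    // continuation line is replaced by a space plus that line.
    $pattern = '~(?:\G(?!\A)|^\w+=[^#\n]*)\K\R++(?!\w+=|[[#;])([^#\n]+)~m';
    $content = preg_replace($pattern, ' $1', $content);

    return parse_ini_string($content, true, INI_SCANNER_RAW);
}

// Hypothetical sample in the shape of the question's file.
$sample = <<<'INI'
[Section1]
ID=xyz

# A comment
Foo=BAR

Description=Lorem ipsum dolor sit amet, consectetur adipisicing elit,
 tempor incididunt ut labore et dolore magna aliqua.
Vendor=abc

[Section2]
Description=First paragraph,
 still the first paragraph.

 Second paragraph.
Somekey=somevalue
INI;

$ini = parseMultilineIniRaw($sample);
print_r($ini);
```

Because the leading space of each continuation line is kept, the joined values contain double spaces; a trim or a second preg_replace could collapse them if that matters.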

    Another way is to reuse the first pattern with preg_replace_callback and perform a simple character translation in the callback function. This approach can be useful if you also want to escape special or problematic characters.
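The preg_replace_callback variant could look like the sketch below. The function name flattenMultilineValues is hypothetical, the sample uses ; comments only, and strtr() with three arguments does the character translation: every \r or \n inside a multiline value becomes a space.

```php
<?php
/**
 * Hypothetical helper: flatten multiline values with preg_replace_callback,
 * so the result can be parsed with INI_SCANNER_RAW.
 */
function flattenMultilineValues($content)
{
    return preg_replace_callback(
        // The answer's first pattern; \K leaves only the value in $m[0].
        '~^\w+=\K\N*(?:\R++(?!\w+=|[[#;])\N+)+~m',
        function (array $m) {
            // Translate each "\r" and "\n" character into a space.
            // Escaping of special characters could also happen here.
            return strtr($m[0], "\r\n", '  ');
        },
        $content
    );
}

// Hypothetical sample with a blank line inside the value.
$sample = <<<'INI'
[Section2]
; A comment
Description=First paragraph,
 still the first paragraph.

 Second paragraph.
Somekey=somevalue
INI;

$flat = flattenMultilineValues($sample);
$ini = parse_ini_string($flat, true, INI_SCANNER_RAW);
print_r($ini);
```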

    This answer was selected by the asker as the best answer.