doufuxi7093 2015-08-25 16:46
Views: 44
Accepted

Transforming a string in PHP without using regular expressions

An example probably explains it best.

  • a|b|c needs to become array('a', 'b', 'c')
  • a|\||\} needs to become array('a', '\|', '\}')
  • ab\}aaa|ae\|aa needs to become array('ab\}aaa', 'ae\|aa')

The string to be transformed can contain any characters, but three of them are "special": |, { and }. Each is treated as an ordinary character only when it is escaped with \. An unescaped | separates options; an escaped | must be treated as an option or as part of one, like any other character. { and } are always going to be escaped at this point.

The catch is that I need to do this without using regular expressions.

I have been struggling with this one for 10 hours, and I sure hope someone has a simple answer to this.

***Edit

My plan was to search for a | and, if found, check whether it is escaped. If it is, continue searching for the next one. When I find an unescaped |, I would cut the first option off the string and continue the same way until there were no | left.

while ($positionFound != 1) {
    $intPrevPosition = $intPosition;
    $intPosition = strpos($strTemp, '|', $intPosition);
    if ($intPosition === false || (substr_count($strTemp, '|') == 1 && $strTemp{$intPosition + $intPrevPosition - 1} == '\\')) {
        $arrOptions[] = $strTemp;
        $positionFound = 1;
    }
    elseif ($strTemp{$intPosition + $intPrevPosition - 1} != '\\') {
        $intPosition = $intPrevPosition + $intPosition;
        $arrOptions[] = substr(substr($strTemp, 0, $intPosition + 1), 0, -1);
        $strTemp = substr($strTemp, $intPosition + 1);
        $intPosition = 0;
    }
}

1 answer

  • douhu2898 2015-08-25 16:53

    Write a simple parser:

    $input = "ab\\}aaa|ae\\|aa"; // ab\}aaa|ae\|aa
    
    $token = "";
    $last_char = "";
    $len = strlen($input);
    $tokens = array();
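    for ($i = 0; $i < $len; $i += 1) {
        $char = $input[$i];
        if ($char === "|" && $last_char !== "\\") {
            // unescaped separator: close the current token and skip the "|" itself
            $tokens[] = $token;
            $token = "";
            $last_char = $char;
            continue;
        }
        $token .= $char;
        $last_char = $char;
    }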
    $tokens[] = $token; // capture last token
    var_dump($tokens);
    // array('ab\}aaa', 'ae\|aa')
    

    Note that this implementation only looks at the previous character, so the escape also triggers when a | follows an escaped backslash. For ab\\|cd the output is array("ab\\|cd") and not array("ab\\", "cd").
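
    If an escaped backslash in front of a separator should end the token instead, one option is to track whether the current character is escaped rather than comparing against the previous character. The following is only a sketch: split_options is a hypothetical helper name, and the rule that \\ stands for a literal backslash is an assumption that the question does not state.

    ```php
    <?php
    // Hypothetical helper: split $input on unescaped "|" without regular expressions.
    // Assumption: a backslash escapes the next character, including another backslash.
    function split_options($input) {
        $tokens = array();
        $token = "";
        $escaped = false; // true when the previous character was an unconsumed backslash
        $len = strlen($input);
        for ($i = 0; $i < $len; $i += 1) {
            $char = $input[$i];
            if ($escaped) {
                // escaped character: keep it verbatim and clear the flag
                $token .= $char;
                $escaped = false;
            } elseif ($char === "\\") {
                // start of an escape sequence: keep the backslash in the output
                $token .= $char;
                $escaped = true;
            } elseif ($char === "|") {
                // unescaped separator: close the current token
                $tokens[] = $token;
                $token = "";
            } else {
                $token .= $char;
            }
        }
        $tokens[] = $token; // capture last token
        return $tokens;
    }

    var_dump(split_options("ab\\\\|cd")); // input is ab\\|cd
    ```

    With this variant ab\\|cd splits into ab\\ and cd, while the three examples from the question still produce the same arrays as before.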


    Nested parser

    For ease of understanding, I'm going to ignore the \ rules for now.

    Assume you have: a{b|c}|{d|e} and the expected output is: abd, abe, acd, ace

    First, you need to translate a{b|c}|{d|e} into:

    array(
        "a",
        array("b", "c"),
        array("d", "e")
    )
    

    If the input is ab{cd|ef}|{gh|ij} we want:

    array(
        "ab",
        array("cd", "ef"),
        array("gh", "ij")
    )
    

    And of course multiple levels of nesting should also work: a{b|{c|d}}|e

    array(
        "a",
        array("b", array("c", "d")),
        "e"
    )
    

    Here is the parse function. I haven't quite figured out how to combine the pieces back together yet.

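    // Note: assumes that every "{" has a matching "}".
    function parse($string, $i = 0) {
        $token = "";
        $tokens = array();
        $len = strlen($string);
        for (; $i < $len; $i += 1) {
            $char = $string[$i];
            if ($char === "{") {
                // start of a nested group: flush the pending token and recurse
                if ($token !== "") {
                    $tokens[] = $token;
                }
                $token = "";
                $parse = parse($string, $i + 1);
                $tokens[] = $parse["token"];
                $i = $parse["index"]; // index of the matching "}"; the loop increment moves past it
                continue;
            }
            if ($char === "}") {
                // end of this part
                if ($token !== "") {
                    $tokens[] = $token;
                }
                return array(
                    "token" => $tokens,
                    "index" => $i
                );
            }
            if ($char === "|") {
                if ($token !== "") {
                    $tokens[] = $token;
                }
                $token = "";
                continue;
            }
            $token .= $char;
        }
        // end of input: flush the trailing token (e.g. the "e" in a{b|{c|d}}|e)
        if ($token !== "") {
            $tokens[] = $token;
        }
        return $tokens;
    }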
    var_dump(parse("ab{cd|ef}|{gh|ij}"));
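
    One possible way to combine the parsed structure back into the expected strings is sketched below. It rests on an assumption about what the structure means, which the examples above suggest but do not spell out: the top-level array is a sequence of parts to concatenate, and each nested array is a set of alternatives, with deeper nesting flattened into that set. The names expand and flatten_alternatives are hypothetical helpers, not part of the parser above.

    ```php
    <?php
    // Hypothetical helper: collect every string inside a (possibly nested) group.
    // Assumption: deeper arrays are simply more alternatives of the same group.
    function flatten_alternatives(array $alternatives) {
        $flat = array();
        foreach ($alternatives as $alt) {
            if (is_array($alt)) {
                $flat = array_merge($flat, flatten_alternatives($alt));
            } else {
                $flat[] = $alt;
            }
        }
        return $flat;
    }

    // Hypothetical helper: build every combination by picking one choice per part.
    // Assumption: the top-level array is a sequence whose parts are concatenated.
    function expand(array $sequence) {
        $results = array("");
        foreach ($sequence as $part) {
            $choices = is_array($part) ? flatten_alternatives($part) : array($part);
            $next = array();
            foreach ($results as $prefix) {
                foreach ($choices as $choice) {
                    $next[] = $prefix . $choice;
                }
            }
            $results = $next;
        }
        return $results;
    }

    // structure for a{b|c}|{d|e}, as given above
    $parsed = array("a", array("b", "c"), array("d", "e"));
    var_dump(expand($parsed)); // abd, abe, acd, ace
    ```

    Under the same assumption, the structure for a{b|{c|d}}|e expands to abe, ace and ade. If nested groups are meant to alternate between alternatives and sequences instead, flatten_alternatives would need to change accordingly.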
    
    This answer was selected by the asker as the best answer.
