duanmao1975
2012-12-10 15:30

RegEx syntax to explode recipe list elements

Accepted

I am processing a list of recipe ingredients, an example of which looks like this:

Peanuts, Wheat Starch, Vegetable Oil, Modified Starch, Sugar, Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)], Rice Flour, Salt, Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract).

I wish to explode each ingredient into an array (using PHP), separated by commas. The problem I have is that some ingredients are sub-divided. In this example, the components of 'Mumbai Spice Flavour' are delimited by square brackets, and some of those components have sub-ingredients of their own, which are in turn delimited by regular brackets.

A standard:

explode(",", $recipeStr) 

will give me a very messy result, so I'm looking for a Regular Expression statement that will explode each distinct element into an array, to take account of the optional square brackets, and optional sub-brackets. It also needs to be able to handle brackets that are not nested within square brackets.
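To illustrate the problem, here is a minimal sketch of what the naive split does to a bracketed group. The shortened sample string is an assumption made for brevity, not the full recipe.

```php
<?php
// Shortened sample string (assumed for illustration).
$recipeStr = 'Sugar, Colours (Beetroot Juice, Curcumin), Salt.';

// explode() splits on every comma, including those inside brackets.
$parts = array_map('trim', explode(',', $recipeStr));

print_r($parts);
// The "Colours (...)" ingredient is broken into two pieces:
// 'Colours (Beetroot Juice' and 'Curcumin)'.
```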

The desired result would be an array list that looks like:

-Peanuts
-Wheat Starch
-Vegetable Oil
-Modified Starch
-Sugar
-Mumbai Spice Flavour [Onion Powder, Herbs and Spices (Cumin, Curry Powder, Chilli Powder, Coriander), Garlic Powder, Potassium Chloride, Yeast Extract, Yeast Powder (contains Gluten and Barley), Citric Acid, Flavouring (contains Barley, Soya, Wheat, Celery)]
-Rice Flour
-Salt
-Colours (Concentrated Beetroot Juice, Curcumin, Paprika Extract)

I am not very good at RegEx syntax, and so if any answer could also explain the syntax logic that would be greatly appreciated.


3 answers

  • dongxuan2015 9 years ago

    This seems to work (but maybe it's not the best solution) :)

    preg_match_all('/\w[\w\s-]*(?:\[.*?\]|\(.*?\))?/', $string, $matches);
    

    It matches a word character followed by zero or more word characters, spaces, or dashes (add anything else you want to capture to this character class), optionally followed by either [...] or (...). Note that brackets of the same type cannot be nested.

    So you can have:

    - something
    - anything [...]
    - something different (...)
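A minimal usage sketch for the pattern above, assuming a shortened sample string in place of the full recipe. `$matches[0]` is where `preg_match_all()` puts the full matches when no flags are passed.

```php
<?php
// Shortened sample (an assumption for illustration, not the full recipe).
$string = 'Peanuts, Mumbai Spice Flavour [Onion Powder, Herbs (Cumin, Coriander), Garlic Powder], '
        . 'Salt, Colours (Beetroot Juice, Curcumin).';

// A name, optionally followed by one [...] group or one (...) group.
preg_match_all('/\w[\w\s-]*(?:\[.*?\]|\(.*?\))?/', $string, $matches);

// trim() is a precaution against trailing whitespace captured by \s.
$items = array_map('trim', $matches[0]);

print_r($items);
```

Because `.*?` is lazy, each bracket group ends at the first closing bracket of its type, so an input such as `A [b [c], d]` would be cut short. That matches the caveat about same-type nesting.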
    
  • dongqiang5932 9 years ago

    This regex seems to work on your example. You won't be able to use explode(), but it does capture each item/group, which you can then loop through.

    ([\w+ ]+\[[^\]]+\]|[\w+ ]+\([^\)]+\)|[\w+ ]+)
    

    See demo here

    To break it down:

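    (                      start capture group
    [\w+ ]+\[[^\]]+\]      one or more word characters, '+' signs or spaces, followed by [...]
    |                      or
    [\w+ ]+\([^\)]+\)      one or more word characters, '+' signs or spaces, followed by (...)
    |                      or
    [\w+ ]+                any other run of word characters, '+' signs or spaces
    )                      end capture group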
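As a rough sketch of how this pattern could be applied in PHP: the `/` delimiters and the shortened sample string are assumptions added for illustration. Since the character class includes a space, matches after the first one start with the space that follows the comma, so the results are trimmed.

```php
<?php
// Shortened sample (an assumption for illustration, not the full recipe).
$string = 'Peanuts, Mumbai Spice Flavour [Onion Powder, Herbs (Cumin, Coriander), Garlic Powder], '
        . 'Salt, Colours (Beetroot Juice, Curcumin).';

$pattern = '/([\w+ ]+\[[^\]]+\]|[\w+ ]+\([^\)]+\)|[\w+ ]+)/';
preg_match_all($pattern, $string, $matches);

// $matches[1] holds the capture group; trim the leading spaces.
$items = array_map('trim', $matches[1]);

foreach ($items as $item) {
    echo '-' . $item . PHP_EOL;
}
```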
    
  • duandao3265 9 years ago

    Ah, parenthesis matching is not something a regular expression can easily do.

    Maybe you should simply go through the string character by character:

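    // $input holds the ingredient string.
    $array = array();
    $temp = "";
    $parenthesis = 0; // number of currently open ( )
    $bracket = 0;     // number of currently open [ ]
    
    for($i = 0; $i < strlen($input); $i++)
    {
        $c = $input[$i];
        if($c == '(')
            $parenthesis++;
        if($c == '[')
            $bracket++;
    
        if($c == ')')
            $parenthesis--;
        if($c == ']')
            $bracket--;
        // Split only on commas that sit outside every bracket.
        if($c == ',' && $parenthesis + $bracket == 0)
        {
            $array[] = trim($temp);
            $temp = "";
        }
        else
            $temp .= $c;
    }
    $array[] = trim($temp);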
    

    I didn't test the code, but I hope it's clear what it is supposed to do.
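The depth-counting idea can also be sketched as a reusable function. The name `splitTopLevel` and the sample string are hypothetical, and this version uses a single depth counter for both bracket types, which is a simplification rather than the approach as originally posted.

```php
<?php
// Hypothetical helper: split on commas that sit outside any brackets.
function splitTopLevel(string $input): array
{
    $items = [];
    $current = '';
    $depth = 0; // number of currently open ( or [ brackets

    foreach (str_split($input) as $c) {
        if ($c === '(' || $c === '[') {
            $depth++;
        } elseif ($c === ')' || $c === ']') {
            $depth = max(0, $depth - 1); // tolerate a stray closing bracket
        }

        if ($c === ',' && $depth === 0) {
            $items[] = trim($current);
            $current = '';
        } else {
            $current .= $c;
        }
    }

    // Keep the final item unless it is empty.
    $last = trim($current);
    if ($last !== '') {
        $items[] = $last;
    }
    return $items;
}

// Sample with same-type nesting (assumed for illustration).
$result = splitTopLevel('Salt, Colours (Beetroot (Juice, Pulp), Curcumin), Rice Flour');
print_r($result);
```

Unlike the regex approaches, counting depth copes with brackets of the same type nested inside each other, at the cost of not checking that the bracket types actually pair up.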
