2014-03-23 08:45

How to parse a string - detailed explanation and syntax information


I would like to parse a string of data in a shell script with a simple one-line expression, but I do not know how it is done or where to find any information describing it. All the examples I can find just look like illegal math equations, and I cannot find any documentation describing how they work.

First, what exactly is this form of parsing called, so I know what I am talking about and what to search for? Secondly, where can I find what it all means, so I can learn how to use it correctly and not just copy someone else's work with little understanding of how it works?

/\.(\w+)/*.[0-9]/'s/" /"

I recall learning about this in Perl a couple of decades ago, but have long since forgotten what it all means. I have spent days searching for an explanation. All I can find are specific examples with no explanation of what the technique is called or how it works!

I want to separate each field, then extract the key name and numerical data in a shell script. I realize some forms of parsing are done differently in shell scripts than in PHP or Perl scripts, but I need to learn the parsing syntax used to filter out specific data sets so that I can use it in both shell and PHP.

Currently I need to parse a single line of data from a file in a shell script for a set of conditionals required by other support scripts.

Line=`cat ./dump.txt`
#Line = "V:12.46 A:3.427 AV:6.08 D:57.32 S:LOAD CT:45.00 P:42.71 AH:2016.80"

# for each field parse data  ("/[A-Z]:[0-9]/}" < $Line)
# $val[$1] = $2

# $val["V"] = "12.46"
# $val["AV"] = "6.08"

if $val["V"] < 11.4

if $val["AV"] > 10.7
echo $val["AV"] > ./source.txt
echo "DOWN" > ./source.txt

I need to identify and separate the difference between "V:" and "AV:".

In PHP I can use foreach and explode to build an array. But I am tired of writing half a page of code for something that can be done in a single line. I need to learn a simpler and more efficient way to parse data from a string and extract it into a usable variable.

$Line = file_get_contents("./dump.txt");
$field = explode(' ', trim($Line));   // split the line into "KEY:value" fields
foreach ($field as $arg) {
    $val = explode(':', $arg);        // split each field into key and value
    $data[$val[0]] = $val[1];
}
# $data["V"] = "12.46"
# $data["AV"] = "6.08"

A quick shell example is much appreciated, but I really need to know how to do this myself. Please give me some links or search terms for finding the definitions and syntax of these parsing expressions.

Thank you in advance for your help.


  • doumicheng6732, 7 years ago

    The parsing patterns you're talking about are commonly referred to as regular expressions or regex.

    For php you can find a lot of helpful information from

    Regex is quite hard, especially for complex expressions, so I usually search for an online regex tester, preferably one which highlights what is being matched. JavaScript ones are especially good, as the results are instant and the regex syntax is largely the same as PHP's.
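As a rough illustration of how a regular expression could be applied to the line from the question, here is a minimal Bash sketch. It assumes Bash 4 or later (for associative arrays), and the names `val`, `fields` and `pattern` are made up for this example. The pattern is anchored with `^` so that "V:" and "AV:" end up under separate keys.

```shell
#!/usr/bin/env bash
# Sample line from the question; in practice it could be read with: Line=$(<./dump.txt)
Line="V:12.46 A:3.427 AV:6.08 D:57.32 S:LOAD CT:45.00 P:42.71 AH:2016.80"

declare -A val                   # associative array (assumes Bash 4 or later)
pattern='^([A-Z]+):(.+)$'        # group 1 = key name, group 2 = text after the colon

read -ra fields <<< "$Line"      # split the line on whitespace into an array
for field in "${fields[@]}"; do
    if [[ $field =~ $pattern ]]; then
        # BASH_REMATCH[1] and [2] hold the two capture groups of the last match
        val[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    fi
done

echo "V=${val[V]} AV=${val[AV]} S=${val[S]}"
```

With the sample line this prints `V=12.46 AV=6.08 S=LOAD`. The values are still strings at this point, so any numeric comparison needs a further step.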

  • dongmao9217, 7 years ago

    Special thanks to James T for leading me in the right direction.

    After reading up on regular expressions I have figured out the search pattern I need. Also included is a brief script to test the output. Because Bash arithmetic cannot handle decimal numbers, each value has to be converted to a whole number. The number of decimal places is always fixed at 2 or 3, so conversion is easy: just drop the decimal point. The order in which the fields are recorded also remains constant, so the order in which they are read will remain the same.
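The "drop the decimal" conversion can be sketched on its own like this. The sample value and the threshold are hypothetical, and the `10#` prefix is an extra precaution so that a value with a leading zero is not read as octal.

```shell
#!/usr/bin/env bash
reading="12.46"              # hypothetical sample value with two decimal places
scaled=${reading/./}         # remove the decimal point: "12.46" -> "1246"
scaled=$((10#$scaled))       # force base 10 so a leading zero is not read as octal

threshold=1140               # 11.40 expressed in the same hundredths scale
if (( scaled > threshold )); then
    status="above"
else
    status="below"
fi
echo "$reading -> $scaled ($status $threshold)"
```

This prints `12.46 -> 1246 (above 1140)`. A value with three decimal places, such as `7.990`, ends up scaled by 1000 instead of 100, so its threshold has to use that same scale.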

    The regular expression that fits the search for each of the first 4 fields is:

    ( ) = a capture group, marking the part to extract; 2 groups are used for each data set such as "V:12.46"
    \w+ = the key name; \w matches a word character (letter, digit or underscore) and the " + " means 1 or more of them
    : = the delimiter between key and value
    (  -capture group 1:
      [0-9]+ = any digit, and the " + " means 1 or more digits (the part before the decimal point)
    ) -end capture group 1
    \. = the literal decimal point in the data
    (  -capture group 2:
      [0-9]+ = any digit, and the " + " means 1 or more digits (the second set, after the decimal point)
    ) -end capture group 2
    \s = white space (blank space)
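To see the pieces above working on a single field, here is a small sketch. It is written with POSIX bracket classes in place of `\w` and `\s`, on the assumption that those shorthands may not be available in every regex library Bash is built against. The names `sample`, `field_re`, `whole` and `frac` are made up for the example.

```shell
#!/usr/bin/env bash
sample="V:13.53 "     # one field plus the blank that separates it from the next
# [[:alnum:]_]+ stands in for \w+, and [[:space:]] stands in for \s
field_re='[[:alnum:]_]+:([0-9]+)\.([0-9]+)[[:space:]]'

if [[ $sample =~ $field_re ]]; then
    whole=${BASH_REMATCH[1]}      # capture group 1: digits before the decimal point
    frac=${BASH_REMATCH[2]}       # capture group 2: digits after the decimal point
    echo "whole=$whole frac=$frac"
fi
```

This prints `whole=13 frac=53`. Keeping the pattern in a variable and leaving it unquoted on the right-hand side of `=~` makes Bash treat it as a regular expression instead of a literal string.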

    Now repeat the pattern 3 times, once for each of the first 3 fields, which gives 6 captured values.


    And here is a simple script to test the output:

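    Line="V:13.53 A:7.990 AV:13.65 D:100.00 S:BulkCharge CT:35.00 P:108.11 AH:2116.20"
    field='\w+:([0-9]+)\.([0-9]+)\s'    # one field, built from the pieces listed above
    regex="$field$field$field"          # repeated for the first 3 fields
    if [[ $Line =~ $regex ]]; then
        echo "match found in $Line"
        i=1; n=${#BASH_REMATCH[@]}
        while [[ $i -lt $n ]]; do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
        Volt="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"    # 13 and 53 -> 1353
    else
        echo "$Line does not match"
    fi
    if [ "${Volt:-0}" -gt 1200 ]; then
        echo "Voltage is $Volt"
    fi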

    which results in this output:

    match found in V:13.53 A:7.990 AV:13.65 D:100.00 S:BulkCharge CT:35.00 P:108.11 AH:2116.20
      capture[1]: 13
      capture[2]: 53
      capture[3]: 7
      capture[4]: 990
      capture[5]: 13
      capture[6]: 65
    Voltage is 1353