dongtanghuan1885 2015-03-24 13:58

PHP: getting data from a string

I have asked several questions about this and tried many different things, but I am not completely happy with the results. I have a lot of data in the following format:

3*O#AA6160 F7 A7 P7 J7 R7 D7 I7 Y7 LHRMIA 1040 1455   *  744 0E
        B7 H0 W0 K0 M0 L0 V0 G0 S0 Q0 N0 O0 

The leading spaces on the second row are always present in the data. Essentially, from that string I am trying to get the following:

$flightNumber = "AA6160";
$from = "LHR";
$to = "MIA";
$other = "1040 1455   *  744 0E";
$seats = array(
    "F" => 7,
    "A" => 7,
    "P" => 7,
    "J" => 7,
    "R" => 7,
    "D" => 7,
    "I" => 7,
    "Y" => 7,
    "B" => 7,
    "H" => 0,
    "W" => 0,
    "K" => 0,
    "M" => 0,
    "L" => 0,
    "V" => 0,
    "G" => 0,
    "S" => 0,
    "Q" => 0,
    "N" => 0,
    "O" => 0,
);

The rules are as follows.
Each row starts with a digit (3 in the example above). The second row is a continuation of the seats from the first row. If I posted the full data I have, the third row would start with 4, which means it is not related to the two rows above.
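A minimal sketch of the row rule on its own, assuming the raw lines are already available as an array of strings and that continuation rows never start with a digit (they start with spaces in all the samples). `groupRows` is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper: groups raw lines into records. A line whose first
// character is a digit opens a new record; any other non-blank line (the
// indented seat continuation rows) is appended to the most recent record.
function groupRows(array $lines): array
{
    $records = [];
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue; // ignore blank lines
        }
        if (preg_match('/^\d/', $line) === 1) {
            $records[] = [$line];                    // start of a new flight
        } elseif ($records !== []) {
            $records[count($records) - 1][] = $line; // continuation row
        }
        // A non-numbered line that appears before any numbered row is dropped.
    }
    return $records;
}
```

Each record then holds the first row plus its continuation rows, so the seat parsing only ever has to look inside one record.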

A flight number always starts with a # and is followed by TWO letters and 1-4 digits. Sometimes there are spaces between the letters and the digits. These are all the flight number formats I have found:

#AA6160
#AA  57
#BA 207
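A minimal sketch of that flight-number rule as a standalone regex, assuming the two letters are always uppercase ASCII. `extractFlightNumber` is a hypothetical helper; it returns the code with the spaces removed, or null when no flight number is found.

```php
<?php
// Hypothetical helper: finds "#" + two uppercase letters + 1-4 digits,
// allowing optional spaces between the letters and the digits.
// The \b rejects codes with more than four digits.
function extractFlightNumber(string $line): ?string
{
    if (preg_match('/#([A-Z]{2})\s*(\d{1,4})\b/', $line, $m) === 1) {
        return $m[1] . $m[2]; // normalized code, e.g. "AA57"
    }
    return null;
}

foreach (['#AA6160', '#AA  57', '#BA 207'] as $sample) {
    echo extractFlightNumber($sample), PHP_EOL; // prints AA6160, AA57, BA207
}
```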

The second row only ever contains a continuation of seats, nothing else. This is what I have come up with so far:

while ( $elNum < $elements->length ) {

    $flightInfo = $elements->item($elNum)->nodeValue;

    if (preg_match('/^\\d/', $flightInfo) === 1) {
        // (\d+) captures the whole row number; (\d)+ would keep only its last digit
        if (preg_match('/(\d+)[^#]*?\#(\p{Lu}{2})\s*(\d{1,4})\b\s*([\w. ]+?)(?=\s+\p{Lu}{6})\s([A-Z]{3})([A-Z]{3})(.+)/', $flightInfo, $matches) === 1) {
            $row = $matches[1];
            $fltcode = $matches[2].$matches[3];
            $ffrom = $matches[5];
            $fto = $matches[6];
            $other = $matches[7];

            $this->flights[$fltcode] = array(
                "command" => $terminal_command,
                "row" => $row,
                "flightNumber" => $fltcode,
                "from" => $ffrom,
                "to" => $fto,
                "other" => $other
            );
        }
    }
    ++$elNum;
}

The main thing I am struggling with is the seats. I am not sure how to take the ones I need from the first row and combine them with the ones from the second row into the output format shown above.

I am not even sure whether regex is the best option here, or whether I should explode everything on spaces and sort the pieces that way.
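A minimal sketch of the "explode on spaces" approach for the seats, assuming each argument holds only seat tokens: the first-row seats (the regex in the question already captures them in `$matches[4]`, e.g. `F7 A7 P7 J7 R7 D7 I7 Y7`) plus each continuation row. `parseSeats` is a hypothetical helper, and storing a `.` availability (as in `I.`) as null is an assumption about how those seats should be represented.

```php
<?php
// Hypothetical helper: each argument is a string of seat tokens such as
// "F7", "H0" or "I.". The first character is the class, the rest is the
// availability. Rows are merged into one associative array.
function parseSeats(string ...$rows): array
{
    $seats = [];
    foreach ($rows as $row) {
        // preg_split on \s+ copes with the leading indentation and double spaces
        $tokens = preg_split('/\s+/', trim($row), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            $value = substr($token, 1);
            // Assumption: "." is stored as null rather than as a number
            $seats[$token[0]] = ($value === '.') ? null : (int) $value;
        }
    }
    return $seats;
}

$seats = parseSeats('F7 A7 P7 J7 R7 D7 I7 Y7', '        B7 H0 W0 K0 M0 L0 V0 G0 S0 Q0 N0 O0 ');
// $seats['F'] is 7, $seats['O'] is 0 and count($seats) is 20
```

Splitting on whitespace is enough here because every seat token is a single letter followed by one character, so no regex is needed once the seat part of each row has been isolated.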

Any advice on the situation is appreciated. Here is some additional data:

5*S#DL4386 J9 C9 D9 I9 Z9 W9 Y9 B9 LHRMIA 1235 1705   *  744 0E
        M9 S9 H9 Q9 K9 L9 U9 T9 X9 V9 
6  #VS   5 J9 C9 D9 I9 Z9 W9 S9 H9 LHRMIA 1235 1705      744 0E
        K9 Y9 B9 R9 L9 U9 M9 E9 Q9 X9 N9 O9 
7  #IB4637 F9 A9 J9 C9 D9 R9 I. W9 LHRMIA 1415 1825   *  744 0E
        Z. Y9 B9 H9 K. M. L. V. S. N. Q. O.

Thanks



  • douju5933 2015-03-24 16:48

    Here is an example that uses XMLReader instead of DOMDocument to parse the XML, because it is faster and uses less memory. The patterns are written to be more readable (free-spacing mode and named captures) and more efficient (anchored, and without unnecessary Unicode character classes such as \p{Lu}, lookaheads, or unused capturing groups).

    $xml = <<<EOD
    <?xml version="1.0" encoding="utf-8" ?>
    <root xmlns:terminal="http://test.com/terminal">
        <terminal:Text>1  #AY5767 F9 A9 P. J9 C9 D9 I9 Y9 LHRMIA 0945 1410   *  777 0E</terminal:Text>
        <terminal:Text>        B9 H9 K9 M9 L9 V9 S9 N9 Q9 O9 G9 </terminal:Text>
    
        <otherthings>blah blah blah</otherthings>
    
        <terminal:Text>2  #AY5768 F9 A9 P. J9 C9 D9 I9 Y9 ROMMIL 0945 1410   *  777 0E</terminal:Text>
        <terminal:Text>        B9 H9 K9 M9 L9 V9 S9 N9 Q9 O9 G9 </terminal:Text>
        <terminal:Text>        E8 G8 R8 S8 T4 U2 </terminal:Text>
    </root>
    EOD;
    
    $patternFirstLine = <<<EOD
    ~
    \A
        [0-9]+ [^#]*        # row number plus optional flags such as "*O"
        \# (?<code1> [A-Z]{2} ) \s* (?<code2> [0-9]{1,4} ) \s+ 
        (?<seat1> [A-Z][0-9.] (?: \s+ [A-Z][0-9.] )*+ ) \s+
        (?<from> [A-Z]{3} ) (?<to> [A-Z]{3} ) \s+
        (?<other> .*\S ) \s*
    \z
    ~x
    EOD;
    
    $patternNextLines = <<<EOD
    ~
    \A \s*
        (?<seatN> [A-Z][0-9.] (?: \s+ [A-Z][0-9.] )*+ )
    \s* \z
    ~x
    EOD;
    
    $parser = new XMLReader();
    
    $parser->xml($xml);
    
    $temp = false;
    $results = [];
    
    while($parser->read()) {
        while ($parser->name === 'terminal:Text') {
            if (preg_match($patternFirstLine, $parser->readInnerXML(), $m)) {
                if ($temp) $results[] = $temp;
    
                $temp = [
                    "flightNumber" => $m['code1'] . $m['code2'],
                    "from"         => $m['from'],
                    "to"           => $m['to'],
                    "seats"        => $m['seat1'],
                    "other"        => $m['other']
                ];
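            } elseif ($temp && preg_match($patternNextLines, $parser->readInnerXML(), $m)) {
                // continuation line: append its seats to the current flight
                $temp['seats'] .= ' ' . $m['seatN'];
            } else {
                // unrelated line: keep the current flight before resetting
                if ($temp) $results[] = $temp;
                $temp = false;
            }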
    
            $parser->next('Text');
    
        }
    }
    
    if ($temp) $results[] = $temp;
    
    $results = array_map(function ($i) {
        // split on any whitespace run so extra spaces never yield empty tokens
        $seats = preg_split('/\s+/', trim($i['seats']));
        $i['seats'] = [];
        foreach ($seats as $seat)
            $i['seats'][$seat[0]] = $seat[1];
    
        return $i;
    }, $results);
    
    print_r($results);
    

    Note: in this example, I use XMLReader::xml() to load the XML from a string, but one of the main advantages of XMLReader is that XMLReader::open() can stream the XML directly from a file or URI.
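A minimal sketch of the XMLReader::open() variant, assuming the data lives in a file; here a temporary file is created only so the sketch runs on its own, and `$lines` simply collects the text of every `terminal:Text` element, which could then be fed through the same patterns as above.

```php
<?php
// Write a small sample document to a temporary file (demo only).
$path = tempnam(sys_get_temp_dir(), 'flights');
$xml = '<root xmlns:terminal="http://test.com/terminal">'
     . '<terminal:Text>1  #AY5767 F9 A9 P. J9 LHRMIA 0945 1410   *  777 0E</terminal:Text>'
     . '<otherthings>blah blah blah</otherthings>'
     . '<terminal:Text>        B9 H9 K9 </terminal:Text>'
     . '</root>';
file_put_contents($path, $xml);

// Stream the file instead of loading the whole XML string into memory.
$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Cannot open $path");
}

$lines = [];
while ($reader->read()) {
    // localName ignores the "terminal:" prefix; name would include it
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'Text') {
        $lines[] = $reader->readString(); // text content of the element
    }
}
$reader->close();
unlink($path);
```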

    This answer was accepted by the asker as the best answer.
