dtvjl64442 2011-06-25 15:34
Accepted

How do I parse a space-delimited string in PHP?

Part of the PHP application I'm building parses an RSS feed of upcoming jobs and internships. The <description> for each feed entry is a series of tags or labels containing four standard pieces of information:

  1. Internship or job
  2. Full or part time
  3. Type (one of 4 types: Local Gov, HR, Non-profit, Other)
  4. Name of organization

However, everything is space-delimited, turning each entry into a mess like this:

  • Internship Full time Local Gov NASA
  • Job Part time HR Deloitte
  • Job Full time Non-profit United Way

I'm trying to parse each line and use the pieces of the string as variables. If this list were delimited in any standard way, I could easily use something like list($job, $time, $type, $name) = explode(",", $description) to parse the string and use the pieces individually.

I can't do that with this data, though. If I use explode(" ", $description) I'll get lots of useless fragments ("Full", "time", "Local", "Gov", for example) instead of the four fields.
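
To make the problem concrete, here is a minimal sketch of what a plain space split returns for the first sample entry. The variable names are only illustrative.

```php
<?php
// Splitting on every space breaks the multi-word labels apart.
$description = 'Internship Full time Local Gov NASA';
$pieces = explode(' ', $description);

// Six pieces come back although the entry holds only four fields:
// Internship, Full, time, Local, Gov, NASA
print_r($pieces);
```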

Though the list isn't delimited, the first three pieces of information are standard and can only be one of 2–4 different options, essentially creating a dictionary of allowable terms (except the last one—the name of the organization—which is variable). Because of this it seems like I should be able to parse these strings, but I can't think of the best/cleanest/fastest way to do it.

preg_replace seems like it would require lots of messy regexes; a series of if/then statements (if the string contains "Local Gov" set $type to "Local Gov") seems tedious and would only capture the first three variables.

So, what's the most efficient way to parse a non-delimited string against a partial dictionary of allowed strings?

Update: I have no control over the structure of the incoming feed data. If I could I'd totally delimit this, but it's sadly not possible…

Update 2: To clarify, the first three options can only be the following:

  1. Internship | Job
  2. Full time | Part time
  3. Local Gov | HR | Non-profit | Other

That's the pseudo dictionary I'm talking about. I need to somehow strip those strings out of the main string and use what's left over as the organization name.
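
For comparison, the dictionary above is small enough that it might be written as one anchored pattern rather than many separate regexes. The following is only a sketch under that assumption: the function name parse_description_regex and the array keys are hypothetical, and it assumes every entry lists the three fields in the order shown, separated by whitespace.

```php
<?php
// Hypothetical helper: the function name and array keys are illustrative.
function parse_description_regex(string $description): ?array
{
    // One alternation per dictionary field, anchored at the start;
    // everything after the third field is taken as the organization name.
    $pattern = '/^(?<job>Internship|Job)\s+'
             . '(?<time>Full time|Part time)\s+'
             . '(?<type>Local Gov|HR|Non-profit|Other)\s+'
             . '(?<name>.+)$/';

    if (!preg_match($pattern, trim($description), $m)) {
        return null; // entry does not follow the expected layout
    }

    return array(
        'job'  => $m['job'],
        'time' => $m['time'],
        'type' => $m['type'],
        'name' => $m['name'],
    );
}

$parsed = parse_description_regex('Job Full time Non-profit United Way');
// $parsed['type'] is 'Non-profit', $parsed['name'] is 'United Way'
```

An entry that does not match the layout yields null, so the caller can decide whether to skip it or log it.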

6 answers

  • doonbfez815298 2011-06-25 15:45

    It's just a matter of getting your hands dirty it seems:

    $input = 'Internship Full time Local Gov NASA';
    
    // Preconfigure known data here; these will end up
    // in the output array with the keys defined here
    $known_data = array(
        'job'  => array('Internship', 'Job'),
        'time' => array('Full time', 'Part time'),
        // add more known strings here
    );
    
    $parsed = array();
    foreach($known_data as $key => $options) {
        foreach($options as $option) {
            if(substr($input, 0, strlen($option)) == $option) {
                // Skip recognized token and next space
                $input = substr($input, strlen($option) + 1);
                $parsed[$key] = $option;
                break;
            }
        }
    }
    
    // Drop all remaining tokens into $parsed with numeric
    // keys; you could do something else with them if desired
    $parsed += explode(' ', $input);
    

    See it in action.
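
    If the leftover text should become a single organization name rather than separate tokens, the same loop could be wrapped in a function along these lines. This is a sketch, not part of the original answer: the function name parse_description and the 'type' and 'name' keys are assumptions, and the 'type' options are copied from the question's list.

```php
<?php
// Hypothetical wrapper around the loop above; the function name and the
// 'type' and 'name' keys are illustrative, not from the original answer.
function parse_description(string $input): array
{
    $known_data = array(
        'job'  => array('Internship', 'Job'),
        'time' => array('Full time', 'Part time'),
        'type' => array('Local Gov', 'HR', 'Non-profit', 'Other'),
    );

    $input  = trim($input);
    $parsed = array();
    foreach ($known_data as $key => $options) {
        foreach ($options as $option) {
            // Match the option only when it is followed by a space or the
            // end of the string, so "Job" does not match "Jobs".
            if (strncmp($input . ' ', $option . ' ', strlen($option) + 1) === 0) {
                // Skip the recognized token and any following spaces
                $input = ltrim((string) substr($input, strlen($option)));
                $parsed[$key] = $option;
                break;
            }
        }
    }

    // Whatever is left over is treated as the organization name.
    $parsed['name'] = $input;
    return $parsed;
}

print_r(parse_description('Job Full time Non-profit United Way'));
```

    A field that is missing from an entry simply leaves its key unset, so callers may want to check with isset() before using it.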

    The asker selected this as the best answer.