dtvjl64442 2011-06-25 23:34

How do I parse a space-delimited string in PHP?

Part of the PHP application I'm building parses an RSS feed of upcoming jobs and internships. The <description> for each feed entry is a series of tags or labels containing four standard pieces of information:

  1. Internship or job
  2. Full or part time
  3. Type (one of 4 types: Local Gov, HR, Non-profit, Other)
  4. Name of organization

However, everything is space-delimited, turning each entry into a mess like this:

  • Internship Full time Local Gov NASA
  • Job Part time HR Deloitte
  • Job Full time Non-profit United Way

I'm trying to parse each line and use the pieces of the string as variables. If this list were delimited in any standard way, I could easily use something like list($job, $time, $type, $name) = explode(",", $description) to parse the string and use the pieces individually.

I can't do that with this data, though. If I use explode(" ", $description) I'll get lots of useless fragments ("Full", "time", "Local", "Gov", for example).

Though the list isn't delimited, the first three pieces of information are standard and can only be one of 2–4 different options, essentially creating a dictionary of allowable terms (except the last one—the name of the organization—which is variable). Because of this it seems like I should be able to parse these strings, but I can't think of the best/cleanest/fastest way to do it.

preg_replace seems like it would require lots of messy regexes; a series of if/then statements (if the string contains "Local Gov" set $type to "Local Gov") seems tedious and would only capture the first three variables.

So, what's the most efficient way to parse a non-delimited string against a partial dictionary of allowed strings?

Update: I have no control over the structure of the incoming feed data. If I could I'd totally delimit this, but it's sadly not possible…

Update 2: To clarify, the first three options can only be the following:

  1. Internship | Job
  2. Full time | Part time
  3. Local Gov | HR | Non-profit | Other

That's the pseudo dictionary I'm talking about. I need to somehow strip those strings out of the main string and use what's left over as the organization name.
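One way to picture that stripping step is a single anchored pattern built from the dictionary, with whatever follows the third field captured as the organization name. This is only a sketch: the `$dictionary` array and the group names `job`, `time`, `type` and `name` are illustrative choices, and it assumes every entry carries all three labels in the order listed.

```php
<?php
// Assumed dictionary, copied from the three option lists above.
$dictionary = array(
    'job'  => array('Internship', 'Job'),
    'time' => array('Full time', 'Part time'),
    'type' => array('Local Gov', 'HR', 'Non-profit', 'Other'),
);

// Build one named group per field, e.g. (?P<job>Internship|Job)
$groups = array();
foreach ($dictionary as $key => $options) {
    $quoted = array();
    foreach ($options as $option) {
        $quoted[] = preg_quote($option, '/'); // escape regex metacharacters
    }
    $groups[] = '(?P<' . $key . '>' . implode('|', $quoted) . ')';
}

// Fields are separated by whitespace; the remainder is the organization name
$pattern = '/^' . implode('\s+', $groups) . '\s+(?P<name>.+)$/';

$description = 'Internship Full time Local Gov NASA';
if (preg_match($pattern, $description, $m)) {
    echo $m['job'], ' | ', $m['time'], ' | ', $m['type'], ' | ', $m['name'], "\n";
}
```

Because the pattern is generated from the array, adding a new allowed label means editing the dictionary rather than the regex. An entry that omits a label or reorders them simply fails to match.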


6 answers

  • doonbfez815298 2011-06-25 23:45

    It's just a matter of getting your hands dirty it seems:

    $input = 'Internship Full time Local Gov NASA';
    
    // Preconfigure known data here; these will end up
    // in the output array with the keys defined here
    $known_data = array(
        'job'  => array('Internship', 'Job'),
        'time' => array('Full time', 'Part time'),
        // the four types listed in the question
        'type' => array('Local Gov', 'HR', 'Non-profit', 'Other'),
    );
    
    $parsed = array();
    foreach($known_data as $key => $options) {
        foreach($options as $option) {
            if(substr($input, 0, strlen($option)) == $option) {
                // Skip recognized token and next space
                $input = substr($input, strlen($option) + 1);
                $parsed[$key] = $option;
                break;
            }
        }
    }
    
    // Drop all remaining tokens into $parsed with numeric
    // keys; you could do something else with them if desired
    $parsed += explode(' ', $input);
    

    See it in action.
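
If the organization name should stay in one piece (for example "United Way"), the same prefix-matching idea can be wrapped in a function that keeps the remainder as a single string. The sketch below is not part of the original answer: the function name `parse_description` and the `name` key are hypothetical, the `type` options are taken from the question, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: peel known labels off the front of the string and
// keep whatever remains as the organization name.
function parse_description(string $input, array $known_data): array
{
    $parsed = array();
    $rest = trim($input);
    foreach ($known_data as $key => $options) {
        $parsed[$key] = null; // stays null if no option matches
        foreach ($options as $option) {
            // Accept the option only as the whole remainder or as a
            // prefix followed by a space, so "Job" cannot match "Jobs"
            $is_prefix = strncmp($rest, $option . ' ', strlen($option) + 1) === 0;
            if ($rest === $option || $is_prefix) {
                $rest = ltrim((string) substr($rest, strlen($option)));
                $parsed[$key] = $option;
                break;
            }
        }
    }
    $parsed['name'] = $rest; // everything left over, spaces included
    return $parsed;
}

$known_data = array(
    'job'  => array('Internship', 'Job'),
    'time' => array('Full time', 'Part time'),
    'type' => array('Local Gov', 'HR', 'Non-profit', 'Other'),
);

$result = parse_description('Job Full time Non-profit United Way', $known_data);
print_r($result);
```

With this shape, list('job' => $job, 'name' => $name) = $result (PHP 7.1+) or plain array access gives the individual pieces, and a label that is missing from an entry shows up as null instead of shifting the other fields.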

    This answer was selected by the asker as the best answer.