Part of the PHP application I'm building parses an RSS feed of upcoming jobs and internships. The <description>
for each feed entry is a series of tags or labels containing four standard pieces of information:
- Internship or job
- Full or part time
- Type (one of 4 types: Local Gov, HR, Non-profit, Other)
- Name of organization
However, everything is space-delimited, turning each entry into a mess like this:
- Internship Full time Local Gov NASA
- Job Part time HR Deloitte
- Job Full time Non-profit United Way
I'm trying to parse each line and use the pieces of the string as variables. If this list were delimited in any standard way, I could easily use something like list($job, $time, $type, $name) = explode(",", $description)
to parse the string and use the pieces individually.
I can't do that with this data, though. If I split on spaces with explode(" ", $description),
I'll get lots of useless fragments ("Full", "time", "Local", "Gov", for example) instead of the four fields I want.
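To make the problem concrete, here is a small sketch of what splitting on spaces does to one of the sample entries above. The variable names are only illustrative.

```php
<?php
// Sample entry copied from the examples above.
$description = "Internship Full time Local Gov NASA";

// Splitting on every space breaks the multi-word labels apart.
$pieces = explode(" ", $description);

print_r($pieces);
// $pieces now holds 6 fragments (Internship, Full, time, Local, Gov, NASA)
// rather than the 4 fields that are actually wanted.
```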
Though the list isn't delimited, the first three pieces of information are standard and can only be one of 2–4 different options, essentially creating a dictionary of allowable terms (except the last one—the name of the organization—which is variable). Because of this it seems like I should be able to parse these strings, but I can't think of the best/cleanest/fastest way to do it.
preg_replace
seems like it would require lots of messy regexes, and a series of if/then statements (if the string contains "Local Gov", set $type
to "Local Gov") seems tedious and would only capture the first three variables.
So, what's the most efficient way to parse a non-delimited string against a partial dictionary of allowed strings?
Update: I have no control over the structure of the incoming feed data. If I could I'd totally delimit this, but it's sadly not possible…
Update 2: To clarify, the first three options can only be the following:
- Internship | Job
- Full time | Part time
- Local Gov | HR | Non-profit | Other
That's the pseudo dictionary I'm talking about. I need to somehow strip those strings out of the main string and use what's left over as the organization name.
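One way this could look, as a rough sketch rather than a definitive answer: a single anchored regular expression whose first three groups are alternations over the allowed terms, with whatever remains captured as the organization name. It assumes the fields always appear in the order shown above and are separated by whitespace. The function name parseDescription and the array keys are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: splits a description into its four fields using the
// fixed vocabulary for the first three, and treats the remainder as the name.
function parseDescription(string $description): ?array
{
    // Each group lists the allowed terms; (.+) captures the leftover text.
    $pattern = '/^(Internship|Job)\s+(Full time|Part time)\s+'
             . '(Local Gov|HR|Non-profit|Other)\s+(.+)$/';

    if (preg_match($pattern, trim($description), $m) !== 1) {
        return null; // entry does not follow the assumed layout
    }

    return [
        'job'  => $m[1],
        'time' => $m[2],
        'type' => $m[3],
        'name' => $m[4],
    ];
}

$parsed = parseDescription("Job Full time Non-profit United Way");
// $parsed['type'] is "Non-profit" and $parsed['name'] is "United Way"
```

Returning null for entries that do not match keeps malformed feed items from silently producing wrong values; if the feed's term list grows, only the alternations in the pattern need to change.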