I'm working on an app that scrapes local websites to build a database of upcoming events, and I'm trying to use regular expressions to catch as many date formats as possible.
Consider the following sentence fragments:
- "The focus of the seminar, on Saturday 2nd February 2013 will be [...]"
- "Valentines Special @ The Radisson, Feb 14th"
- "On Friday the 15th of February, a special Hollywood themed [...]"
- "Symposium on Childhood Play on Friday, February 8th"
- "Hosting a craft workshop March 9th - 11th in the old [...]"
I want to be able to scan these and catch as many dates as possible. At the moment I'm doing this in what is probably a flawed way (I'm not great at regex): I run several regex statements one after the other, like this:
/([0-9]+?)(st|nd|rd|th) (of )?(Jan|Feb|Mar|etc)/i
/([0-9]+?)(st|nd|rd|th) (of )?(January|February|March|Etcetera)/i
/(Jan|Feb|Mar|etc) ([0-9]+?)(st|nd|rd|th)/i
/(January|February|March|Etcetera) ([0-9]+?)(st|nd|rd|th)/i
I could merge these all into one giant regex, but it seems like there must be a cleaner way of doing this in PHP, maybe with a third-party library?
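For reference, the merged version I have in mind would look roughly like the sketch below. It is only an illustration: `findDates` is a made-up helper name, the month list is cut down to three months just like the patterns above, and it ignores the year and the end of a range such as "9th - 11th".

```php
<?php
// Hypothetical helper (not from any library): finds "day month" and
// "month day" dates in a fragment using a single combined pattern.
function findDates(string $text): array
{
    // Assumption: only three months listed, short or long form.
    $month = '(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?)';
    // One or two digits with an optional ordinal suffix.
    $day = '[0-9]{1,2}(?:st|nd|rd|th)?';

    $pattern = '/\b(?:'
        // "2nd February", "15th of February"
        . '(?<day1>' . $day . ')\s+(?:of\s+)?(?<month1>' . $month . ')'
        . '|'
        // "Feb 14th", "March 9th"
        . '(?<month2>' . $month . ')\s+(?<day2>' . $day . ')'
        . ')\b/i';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $dates = [];
    foreach ($matches as $m) {
        // Groups from the branch that did not match are empty or missing.
        $useFirst  = ($m['day1'] ?? '') !== '';
        $dayText   = $useFirst ? $m['day1'] : $m['day2'];
        $monthText = $useFirst ? $m['month1'] : $m['month2'];

        $dates[] = [
            // The (int) cast keeps the leading digits and drops the suffix.
            'day'   => (int) $dayText,
            // Normalise to a three-letter month, e.g. "Feb".
            'month' => ucfirst(strtolower(substr($monthText, 0, 3))),
        ];
    }

    return $dates;
}

print_r(findDates('On Friday the 15th of February, a special Hollywood themed'));
```

This handles the five fragments above in one pass, but it is already harder to read than the separate patterns, which is why I'm hoping something cleaner exists.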
EDIT: The regex above may have errors - it's only meant as an example.