API
I would first check whether IMDb has an API available. If it does, this will likely be as simple as querying a URL and parsing the returned data with json_decode.
...
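If an API does exist, the parsing step could look something like the sketch below. The JSON layout (a "films" list with "rank" and "title" keys) and the parseFilmList name are made up for illustration; a real API would define its own format. In practice $json would come from file_get_contents on the API's URL.

```php
<?php
// Hypothetical response body; a real API would define its own format.
$json = '{"films":[{"rank":1,"title":"The Shawshank Redemption"},'
      . '{"rank":2,"title":"The Godfather"}]}';

// Hypothetical helper: turn the JSON text into an array of RANK=>TITLE.
function parseFilmList(string $json): array
{
    $data = json_decode($json, true); // true => decode objects as associative arrays
    if (!is_array($data) || !isset($data['films']) || !is_array($data['films'])) {
        throw new RuntimeException('Unexpected response: ' . json_last_error_msg());
    }
    // Use each entry's "rank" as the key and its "title" as the value.
    return array_column($data['films'], 'title', 'rank');
}

$top = parseFilmList($json);
echo $top[1], PHP_EOL;
```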
No API available?
Get the webpage
No need to use cURL; a simple file_get_contents will do the trick...
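A slightly more defensive version of that download might look like this. It is only a sketch: the fetchPage name, the User-Agent string and the 10 second timeout are my own assumptions, not something the site requires as far as I know.

```php
<?php
// Hypothetical helper: download a page and fail loudly if that does not work.
function fetchPage(string $url): string
{
    // Assumption: some sites reject requests that send no User-Agent header.
    $context = stream_context_create([
        'http' => ['user_agent' => 'Mozilla/5.0 (personal script)', 'timeout' => 10],
    ]);

    // file_get_contents returns false on failure; @ hides the PHP warning
    // so that the failure can be reported through the exception instead.
    $page = @file_get_contents($url, false, $context);
    if ($page === false) {
        throw new RuntimeException("Could not download $url");
    }
    return $page;
}

// Usage (makes a real network request, so it is left commented out):
// $page = fetchPage("http://www.imdb.com/chart/top");
```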
Extract the list
Now that you have the web page, you have two options:
- Parse the web page with a DOM parser (long-winded, not necessary)
- Regex to extract the info you're after (simple, short)
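For comparison, the DOM parser option could be sketched roughly as follows, using PHP's DOMDocument and DOMXPath. The extractWithDom name is hypothetical, and the sample markup simply assumes the list cells carry the titleColumn class; the real page may differ.

```php
<?php
// Sample markup for illustration only; the real page may be structured differently.
$html = '<table>'
      . '<tr><td class="titleColumn">1. <a href="/link/to/film" title="Director/Leads" >The Shawshank Redemption</a></td></tr>'
      . '<tr><td class="titleColumn">2. <a href="/link/to/film" title="Director/Leads" >The Godfather</a></td></tr>'
      . '</table>';

// Hypothetical helper: build a RANK=>TITLE array by walking the DOM.
function extractWithDom(string $html): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect errors quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $films = [];
    foreach ($xpath->query('//td[@class="titleColumn"]') as $cell) {
        $link = $xpath->query('.//a', $cell)->item(0);
        if ($link === null) {
            continue; // cell without a link, skip it
        }
        // The rank is the leading number in the cell text, e.g. "1. "
        if (preg_match('/^\s*(\d+)\./', $cell->textContent, $m)) {
            $films[(int) $m[1]] = trim($link->textContent);
        }
    }
    return $films;
}

print_r(extractWithDom($html));
```

It works, but it is noticeably more code than the regex route for the same result.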
Regex
A quick look at the source code of the list shows the list is in the format:
<td class="titleColumn">RANK. <a href="/link/to/film" title="Director/Leads" >FILM TITLE</a>
The parts in CAPS are the information we need.
Converting this into a regex is simple: replace the noise with (non-greedy) wildcards...
<td class="titleColumn">RANK. <a.*?>FILM TITLE</a>
Add your capture groups:
<td class="titleColumn">(RANK). <a.*?>(FILM TITLE)</a>
and that's it...
#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#
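To see what the two capture groups pick up, here is the pattern applied to a single sample line. The sample line is just the format shown above with a title filled in, and the matchRow name is hypothetical.

```php
<?php
// One row in the format described above (attribute values are placeholders).
$line = '<td class="titleColumn">1. <a href="/link/to/film" title="Director/Leads" >The Shawshank Redemption</a>';

// Hypothetical helper: return [rank, title] for one row, or null if it does not match.
function matchRow(string $line): ?array
{
    $pattern = '#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // $m[0] is the whole match, $m[1] the rank, $m[2] the title.
    return [$m[1], $m[2]];
}

var_dump(matchRow($line));
```

Note that the rank comes back as a string, since preg_match always captures text.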
Example
Using this in practice:
$page = file_get_contents("http://www.imdb.com/chart/top"); //Download the page
preg_match_all('#<td class="titleColumn">(\d+)\. <a.*?>(.*?)</a>#', $page, $matches); //Match ranks and titles
$top250 = array_combine($matches[1], $matches[2]); //Final array in format RANK=>TITLE
Then you can do something like:
echo $top250[1];
/**
Output:
The Shawshank Redemption
*/
echo array_search("The Godfather", $top250);
/**
Output:
2
*/
You can then use standard PHP array functions to do things like search for films.
http://php.net/file_get_contents
http://php.net/preg_match_all
http://php.net/array_combine
http://php.net/array_search
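As one more example of the array functions, array_search only finds exact matches, so a partial, case-insensitive search might be sketched like this. The findFilms name is hypothetical and the array is a shortened stand-in for $top250.

```php
<?php
// Shortened stand-in for the RANK=>TITLE array built earlier.
$top250 = [1 => 'The Shawshank Redemption', 2 => 'The Godfather'];

// Hypothetical helper: keep every film whose title contains $needle (case-insensitive).
function findFilms(array $films, string $needle): array
{
    // array_filter preserves keys, so the ranks survive the filtering.
    return array_filter($films, function ($title) use ($needle) {
        return stripos($title, $needle) !== false;
    });
}

print_r(findFilms($top250, 'godfather'));
```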
Side note
Especially if you use the no-API method above, you might like to think about storing the results locally and only updating every X hours/days/weeks to save load times etc. I assume you are already planning on doing this (as you said you wanted a personal movie database), but I thought I'd mention it anyway!
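A very simple file-based cache along those lines could be sketched like this. The cachedFetch name, the cache file path and the one-day lifetime are all assumptions for illustration; a database would work just as well.

```php
<?php
// Hypothetical helper: return the cached copy if it is fresh enough,
// otherwise call $fetch, store the result and return it.
function cachedFetch(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    clearstatcache(true, $cacheFile); // make sure filemtime is not served from PHP's stat cache

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    $data = $fetch();                     // e.g. download the page
    file_put_contents($cacheFile, $data); // refresh the local copy
    return $data;
}

// Usage (makes a real network request, so it is left commented out):
// $page = cachedFetch('/tmp/top250.html', 24 * 60 * 60, function () {
//     return file_get_contents("http://www.imdb.com/chart/top");
// });
```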