I have quite a long script which involves chopping lots of large text files into individual words and processing them.
I lowercase everything, then remove all characters except letters and whitespace with:
$content=preg_replace('/[^a-z\s]/', '', $content); // Remove non-letters
This is then exploded and each word goes into an associative array as the key, with the number of occurrences as the value:
$words=array_count_values($content);
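For context, the current English-only pipeline looks roughly like the sketch below. The sample string is made up for illustration (the real script reads large text files), and the split uses preg_split instead of a plain explode so that repeated spaces left behind by the replacement do not get counted as empty words; treat that as an assumption rather than exactly what the script does.

```php
<?php
// Hypothetical sample input; the real script reads large text files.
$content = "The cat sat on the mat. The CAT, aged 3, sat!";

$content = strtolower($content);                      // lowercase everything
$content = preg_replace('/[^a-z\s]/', '', $content);  // remove non-letters

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
$tokens = preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY);

// Word => number of occurrences.
$words = array_count_values($tokens);

print_r($words);
```

With this input, "the" ends up with a count of 3 and the digit and punctuation are gone, which is the behaviour I need to keep for other languages.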
I want to convert the script to be able to work with languages other than English. Is PHP going to be OK with this? Can I use UTF-8 characters as array keys? And how would I preg_replace to remove everything except letters from any language? (All numbers, punctuation and random characters still need to be removed.)