I'm using Joomla 3.4 and I have installed the sh404 component for SEF URLs. My database uses the utf8_general_ci collation, and many of my URLs are URL-encoded Greek, which results in some really long URLs.
Today I noticed that some URLs were broken. After investigating, I realised that sh404 stores the URLs in VARCHAR(255) columns. Because of that, some URLs (not many, but enough for me to notice out of 200k URLs) are longer than the column allows and are stored truncated. Apart from the obvious problem of the broken URLs and the resulting 500 server errors, this causes the same broken URL to be stored repeatedly: sh404 checks whether a URL is already stored before storing it, and it never finds a match because the stored copy is truncated. In one case I had the same URL stored over 5000 times.
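To show how quickly the limit is reached, here is a minimal sketch of the length arithmetic. It assumes the URLs are stored percent-encoded and uses a made-up sample slug ($slug is hypothetical, not taken from my data). Each modern Greek letter takes 2 bytes in UTF-8 and each byte becomes a 3-character %XX sequence, so one letter costs 6 characters.

```php
<?php
// Hypothetical sample slug; the real URLs are built from product fields.
$slug = 'ΚΑΦΕΣ';

// Percent-encode the slug the way it would appear in a URL.
$encoded = rawurlencode($slug);

// Number of Greek letters versus number of stored characters.
$letters = mb_strlen($slug, 'UTF-8');
$encodedLength = strlen($encoded);

// How many encoded Greek letters fit in a VARCHAR(255) column,
// ignoring slashes, hyphens and any ASCII parts of the URL.
$maxLetters = (int) floor(255 / 6);

echo $letters . " letters -> " . $encodedLength . " characters\n";
echo "At most " . $maxLetters . " Greek letters fit in 255 characters\n";
```

So a URL with only a few dozen Greek letters can already overflow the column, before counting any separators or ASCII segments.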
Now my question is this: should I just increase the VARCHAR length to 500 (which should be enough to hold any URL, even my most extreme cases, with 40 to 50 characters to spare), or should I change the logic of the component that generates these URLs so that it transliterates them, cutting the length of ALL of my URLs by more than half? The downside of the second option is that I will still have to output all of the necessary info in Greek, and to do so I will have to execute a lot of str_replace calls in PHP.
My URLs are being generated from a combination of DB fields that contain product info in Greek. sh404 picks up the Greek URLs and transliterates them to English, but the original URL gets too long. If I change all of the fields to contain the same info already transliterated, then sh404 will store URLs that are already in English, but the DB fields will also be in English, hence the str_replace calls. The good thing is that these values are known and fixed, so I won't need any preg_replace calls.
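For reference, this is roughly what the conversion back to Greek could look like. It is only a sketch under my own assumptions: the map entries and the toGreek() helper are hypothetical, and it uses strtr() with an array rather than chained str_replace calls, because strtr() makes a single pass, tries the longest key first, and never re-scans text it has already replaced.

```php
<?php
// Hypothetical fixed vocabulary: transliterated value => Greek display text.
$greekMap = array(
    'kafes'         => 'καφές',
    'kafes-filtrou' => 'καφές φίλτρου',
);

// Hypothetical helper: replaces every known transliterated value in $text
// with its Greek form and leaves everything else untouched.
function toGreek($text, array $map)
{
    return strtr($text, $map);
}

// The longer key 'kafes-filtrou' wins over its prefix 'kafes'.
echo toGreek('kafes-filtrou 250g', $greekMap) . "\n";
```

With chained str_replace calls the order of the replacements would matter here, since 'kafes' is a prefix of 'kafes-filtrou'.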
Which do you think is better performance-wise?
Of course, changing the VARCHAR to 500 is a lot simpler, but I don't mind taking the long road if it is going to benefit my performance.