Background:
I have a MySQL database with a table whose VARCHAR fields have always been latin1. Through a PHP/web interface I can copy/paste Unicode characters that came from UTF-8, and the added record can be retrieved and looks fine. When I use a C CGI interface to create the record, you see the raw sequence of bytes instead. Looking at the difference, I see that the PHP-based entry stored something like &#33459; (which displays as 芳) for the characters (a sequence of them, one for each character), whereas the C version just output the sequence as bytes/chars.
Question:
I'd like to keep the database in latin1 so existing data is fine, but when things come in as UTF-8, if no latin1 translation exists for a character, use UTF-8. I can detect that in C, but what byte order does the &# format use, and where do I find information about it? My plan would be to detect UTF-8 characters and create the &# format to populate the fields. Are there any downsides to doing this?
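To make the plan concrete, here is a minimal sketch in C of what I have in mind. It is not tested against the real CGI code. The function names utf8_decode and utf8_to_latin1_ncr are hypothetical, and it assumes the input is a NUL-terminated UTF-8 string. It writes the decimal form of the reference, which is just the Unicode code point number written as text, so byte order should not come into it. Code points U+0080 to U+009F are also written as references, on the assumption that MySQL's latin1 treats those bytes Windows-1252 style rather than as ISO 8859-1 control characters.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s. Stores the code point in *cp and
 * returns the number of bytes consumed, or 0 if the sequence is malformed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    size_t len, i;
    unsigned long min;

    if (s[0] < 0x80)                { *cp = s[0];        return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; min = 0x10000; }
    else return 0;

    for (i = 1; i < len; i++) {
        /* A non-continuation byte (including the NUL terminator) ends it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF)) return 0;
    return len;
}

/* Convert UTF-8 text to latin1 bytes, writing "&#NNNN;" for anything that has
 * no safe single-byte form. Returns the number of bytes written (excluding
 * the terminating NUL), or (size_t)-1 if the output buffer is too small. */
size_t utf8_to_latin1_ncr(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    while (*p) {
        unsigned long cp;
        char piece[16];              /* "&#1114111;" plus NUL fits easily */
        int n;
        size_t len = utf8_decode(p, &cp);

        if (len == 0) { cp = '?'; len = 1; }  /* malformed byte: substitute '?' */

        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            piece[0] = (char)cp;     /* same value in ASCII / latin1 */
            n = 1;
        } else {
            n = sprintf(piece, "&#%lu;", cp);  /* decimal code point */
        }

        if (used + (size_t)n + 1 > outsz) return (size_t)-1;
        memcpy(out + used, piece, (size_t)n);
        used += (size_t)n;
        p += len;
    }

    if (outsz == 0) return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

With this sketch, the three UTF-8 bytes E8 8A B3 would be stored as the eight characters &#33459; while é (C3 A9) would be stored as the single latin1 byte E9. One thing I already notice is that each such character takes up to ten bytes in the VARCHAR, and the stored text only looks right when it is displayed as HTML.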
TIA!!