doujiong2533 2013-06-10 10:50
浏览 64

将巨大的数据库从Latin1转换为UTF8?

We've got a system written in PHP which has grown pretty large over the years. The database is MySQL (InnoDB tables) with currently over 12GB of data, there's hundreds of tables, many of which have over 12 million records!

The problem is, a lot of the tables/columns (but not all) are in latin1, and we're (obviously) getting issues storing foreign characters.

What's the best way for us to convert all the tables/text columns into UTF8, with the shortest downtime possible?

The system is used by hundreds of people 24/7, so lengthy downtime is really not an option.

Is there any way of doing this successfully without a ton of downtime, and are there any obvious things we need to be careful of?

I know we'll need to set the following things to make our application use utf-8:

  • <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  • ini_set('default_charset', 'utf-8');
  • SET NAMES utf8
  • Files encoded to UTF8 in our IDE before uploading the file

I've read various other posts, but everyone seems to be suggesting different methods, some saying it needs a full database dump/restore... which is going to mean hours and hours of downtime.

So what's the best way to do this?

  • 写回答

1条回答 默认 最新

  • doucheng7534 2013-06-10 11:42
    关注

    You're hoping to do something that is possible, but very hard, and risky as well. Give up on clever: there is nothing magical that makes this easy. You're trading off downtime on the one hand against your labor cost and the risk of data loss on the other hand. Your labor cost will probably be ten times higher than it would if you took the 15 hours downtime.

    Is it possible to write a SELECT query for every table that is guaranteed to retrieve every row that has been added or changed since a particular date/time, and do so quickly? If so, write this query for every table and keep it at hand. If not, you can't use this method.

    You can do this table by table.

    The small tables won't take much to do; you probably can do them while your application is live at off-peak hours. Just convert the columns.

    If you have never-updated larger tables, you can create copies of those tables with the desired charset and default collation for the columns in question. You can then copy the data with INSERT ... SELECT. (http://dev.mysql.com/doc/refman/5.1/en/insert-select.html) Finally, with a few moments of downtime you can rename the production table, then give then new table the production table name. You may need to do this in chunks of a few thousand rows to keep InnoDB's transaction integrity system from blowing out your server's RAM.

    Finally, you have to deal with your large and changing tables. Again, copy the tables with INSERT ... SELECT, again in a way where InnoDB transactions won't suspend your application's operation or blow out your RAM. The intention here is to have a snapshot of your table as of a certain date / time.

    Then, shut down production. Use your handy-dandy query to select all inserted and changed rows since your start time, and insert / update them into your target table. Then rename your production table, and give your target table the production table's name, and restart production. You should be able to do this fairly quickly.

    If I were you, I'd make a staging copy of your live production database and rehearse every step of this procedure before doing it live. You're going to encounter trouble with foreign-key constraints in InnoDB for example; you need to be able to work through those carefully.

    Actually, if I were you I wouldn't try to do this. It's like replacing an airplane's engine while the airplane is flying. What could go wrong? :-) Instead I'd incur the scheduled web application downtime to get this conversion done. Even in that case you should try out the whole thing on a staging server before going live.

    评论

报告相同问题?

悬赏问题

  • ¥15 树莓派与pix飞控通信
  • ¥15 自动转发微信群信息到另外一个微信群
  • ¥15 outlook无法配置成功
  • ¥30 这是哪个作者做的宝宝起名网站
  • ¥60 版本过低apk如何修改可以兼容新的安卓系统
  • ¥25 由IPR导致的DRIVER_POWER_STATE_FAILURE蓝屏
  • ¥50 有数据,怎么建立模型求影响全要素生产率的因素
  • ¥50 有数据,怎么用matlab求全要素生产率
  • ¥15 TI的insta-spin例程
  • ¥15 完成下列问题完成下列问题