douyong1974 2017-06-22 16:48

How to bulk-convert text strings to unique IDs using relationships in MySQL

I am working on a movie database, and before I start on the PHP side I want to make sure the database design is solid.

Because I mass-imported data scraped from the web, the genre and actor columns currently hold plain text strings. I want to convert them to unique IDs and create relationship tables.

Essentially how it is now:

Movie Table

Movie ID - Movie name - Genres - Actors
1        - Inception  - Sci Fi - Leonardo Di Caprio, Ellen Page

How I want it:

Movie Table

Movie ID - Movie Name
1        - Inception

Genre Table

Genre ID - Genre Name
1        - Sci Fi

Actor Table

Actor ID - Actor Name
1        - Leonardo Di Caprio
2        - Ellen Page

Genre Relationships Table

Movie ID - Genre ID
1        - 1

Actor Relationships Table

Movie ID - Actor ID
1        - 1
1        - 2
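
The target layout above could be expressed as table definitions roughly like the following. This is a minimal sketch, not a definitive design: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs on its own, and the table and column names (`movie`, `movie_genre`, and so on) are assumptions chosen for illustration. The `UNIQUE` constraint on names also assumes that identical strings always mean the same genre or actor.

```python
import sqlite3

# SQLite stands in for MySQL here; all table and column names are illustrative.
SCHEMA = """
CREATE TABLE movie (
    movie_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE genre (
    genre_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE actor (
    actor_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
-- Relationship tables: the composite primary key stores each pair only once.
CREATE TABLE movie_genre (
    movie_id INTEGER NOT NULL REFERENCES movie(movie_id),
    genre_id INTEGER NOT NULL REFERENCES genre(genre_id),
    PRIMARY KEY (movie_id, genre_id)
);
CREATE TABLE movie_actor (
    movie_id INTEGER NOT NULL REFERENCES movie(movie_id),
    actor_id INTEGER NOT NULL REFERENCES actor(actor_id),
    PRIMARY KEY (movie_id, actor_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
conn.executescript(SCHEMA)

# The example rows from the question.
conn.execute("INSERT INTO movie VALUES (1, 'Inception')")
conn.execute("INSERT INTO genre VALUES (1, 'Sci Fi')")
conn.executemany(
    "INSERT INTO actor VALUES (?, ?)",
    [(1, "Leonardo Di Caprio"), (2, "Ellen Page")],
)
conn.execute("INSERT INTO movie_genre VALUES (1, 1)")
conn.executemany("INSERT INTO movie_actor VALUES (?, ?)", [(1, 1), (1, 2)])

# Reassemble the cast of movie 1 from the relationship table.
cast = conn.execute(
    "SELECT a.name FROM movie_actor ma "
    "JOIN actor a ON a.actor_id = ma.actor_id "
    "WHERE ma.movie_id = 1 ORDER BY a.actor_id"
).fetchall()
```

In MySQL the same shape would normally use `INT AUTO_INCREMENT` keys and `VARCHAR` name columns.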

If it were just the genres I could do this by hand, but with thousands of movies and actors I am struggling to come up with a simple approach to convert all this data.

I have a CSV dump of all the data, so I figure I could import it in this format with a PHP script. Alternatively, I don't know whether it's possible to restructure the data with SQL commands alone (the database has over 200,000 movies).

Any hints or ideas on how to accomplish this would be much appreciated!

1 answer

  • dongsheng66783619 2017-06-22 18:22

    Something like this will sort of work:

    For each record
        Do 
           Select from genre table using genre string to get genre ID
           If select did not return ID, INSERT new genre string to add new genre ID
        While Select did not return ID
        For each actor
           Do 
             Select from actor table using actor string to get actor ID
             If select did not return ID, INSERT new actor string to add new actor ID
           While Select did not return ID
      ...
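
The loop above could be sketched in runnable form as follows. This is only an illustration under stated assumptions: it uses Python with the built-in `sqlite3` module in place of PHP and MySQL, the helper names `get_or_create_id` and `normalize_movie` are hypothetical, and it assumes the genre column holds a single value while actors are separated by commas. Instead of re-running the SELECT after an INSERT, it reads the new ID from the cursor's `lastrowid`.

```python
import sqlite3

def get_or_create_id(conn, table, name):
    """Return the ID stored for `name` in `table`, inserting a row if missing."""
    # The table name is interpolated into the SQL, so allow only known tables.
    if table not in ("genre", "actor"):
        raise ValueError(f"unexpected table: {table}")
    row = conn.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cur.lastrowid  # ID of the row just inserted

def normalize_movie(conn, movie_id, genre_string, actor_string):
    """Link one movie to its genre and to each comma-separated actor."""
    genre_id = get_or_create_id(conn, "genre", genre_string.strip())
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO movie_genre VALUES (?, ?)", (movie_id, genre_id)
    )
    for actor_name in actor_string.split(","):
        actor_name = actor_name.strip()
        if not actor_name:
            continue  # skip empty pieces such as a trailing comma
        actor_id = get_or_create_id(conn, "actor", actor_name)
        conn.execute(
            "INSERT OR IGNORE INTO movie_actor VALUES (?, ?)", (movie_id, actor_id)
        )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER,
                          PRIMARY KEY (movie_id, genre_id));
CREATE TABLE movie_actor (movie_id INTEGER, actor_id INTEGER,
                          PRIMARY KEY (movie_id, actor_id));
""")

# The example row from the question.
normalize_movie(conn, 1, "Sci Fi", "Leonardo Di Caprio, Ellen Page")
conn.commit()
```

Because the relationship tables have composite primary keys and duplicates are ignored, running the conversion twice over the same row should not create duplicate links.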
    

    But there will be problems:

    • Movies with the same name
    • Different spellings of genre names (sf, sci fi, science fiction)
    • Different spellings of actor names. On IMDb an actor might be listed as Mike or Michael, with or without a middle initial, and so on, and women might use their married name in some movies but not others
    • Actors with the same name

    To fix that, you'd need access to an existing database that returns the same ID for any variation of an actor's name or of a genre name, and that can identify a movie when given its actor list.
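
Where no such reference database is available, a hand-maintained alias table can at least absorb the genre spelling variants listed above before IDs are assigned. The sketch below is a partial workaround under stated assumptions: the `GENRE_ALIASES` entries and the `canonical_genre` helper are hypothetical, and the approach cannot tell apart two different actors or movies that share a name.

```python
import re

# Hypothetical alias table: each known spelling variant maps to one canonical name.
GENRE_ALIASES = {
    "sf": "Science Fiction",
    "sci fi": "Science Fiction",
    "science fiction": "Science Fiction",
}

def canonical_genre(raw):
    """Return the canonical name for a raw genre string, if a variant is known."""
    # Lowercase, then collapse hyphens and runs of whitespace into single spaces.
    key = re.sub(r"[\s\-]+", " ", raw.strip().lower())
    # Unknown genres pass through unchanged apart from surrounding whitespace.
    return GENRE_ALIASES.get(key, raw.strip())

cleaned = [canonical_genre(g) for g in ["sf", "Sci Fi", "science  fiction", "Drama"]]
```

Running each scraped string through such a function before looking up or inserting its ID would make all known variants share one genre row.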

    This answer was accepted by the asker as the best answer.
