douyi7283 2012-11-20 18:30
85 views
Accepted

How do I avoid creating duplicate rows?

Nothing I have found so far has worked, because I access the table through a PHP script, which differs from the examples I have seen. I am importing feeds from a website into a MySQL table. The table was created like this:

$query2 = <<<EOQ
CREATE TABLE IF NOT EXISTS `Entries` (
`feed_id` int(11) NOT NULL,
`item_title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`item_link` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`item_date` varchar(40) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
EOQ;
$result = $db_obj->query($query2);

I insert the data like so:

foreach($rss->channel->item as $Item){
$query5 = <<<EOQ
INSERT INTO Entries (feed_id, item_title, item_link, item_date)
VALUES ('$get_id','$Item->title','$Item->link','$Item->pubDate')
EOQ;
$result = $db_obj->query($query5);
}

Now, every time I import new feeds from the site, I want to make sure I delete any duplicates that might already be there. Everything I have tried, especially DISTINCT, has not worked for me. Does anyone know what type of query I could use to create a temp table, copy over the distinct rows (ENTIRE ROWS: if a title is the same but the date is different, I want to keep both), drop the old table, and then rename the temp table to the original name, or something similar?

3 answers

  • dongzhangji4824 2012-11-20 18:34

    Avoid inserting duplicate rows in the first place. Define a unique key over the values that identify a row. When adding new values to your database, use

    REPLACE INTO Entries (feed_id, item_title, item_link, item_date)
    VALUES ('$get_id','$Item->title','$Item->link','$Item->pubDate')
    

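    The duplicates will be automatically overwritten. REPLACE is handy because it works like INSERT when there is no key conflict. When the new row collides with an existing row on a PRIMARY KEY or UNIQUE index, MySQL deletes the old row and inserts the new one, so any auto-incrementing column gets a fresh value. Without such a key, REPLACE behaves exactly like INSERT.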

    EDIT

    I've been mulling this over for a while. Here's what I came up with.

    The problem with a multi-column key on (feed_id, item_title, item_link, item_date) is that it exceeds MyISAM's 1000-byte limit on key length: with utf8 at up to 3 bytes per character, the three varchar columns alone need (200 + 200 + 40) × 3 = 1320 bytes. So instead alter your schema like so:

    CREATE TABLE IF NOT EXISTS `Entries` (
    -- MD5 hex digest of the row's fields; always 32 characters
    `hash` char(32) NOT NULL,
    `feed_id` int(11) NOT NULL,
    `item_title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
    `item_link` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
    `item_date` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
    PRIMARY KEY (`hash`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
    

    Before you store a new row, compute a hash over all of its values together:

    $hash = md5($get_id . $Item->title . $Item->link . $Item->pubDate);
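
    One caveat with plain concatenation: different rows can produce the same input string (for example, title "ab" with link "c" versus title "a" with link "bc"). A minimal sketch of a delimited variant follows; the function name `entry_hash` and the choice of the ASCII unit separator as delimiter are assumptions for illustration, not part of the original code.

    ```php
    <?php
    // Hypothetical helper: builds a dedup key from one feed entry's fields.
    // The "\x1F" (ASCII unit separator) delimiter is an assumption; any byte
    // that cannot occur inside the fields would work.
    function entry_hash(int $feedId, string $title, string $link, string $date): string
    {
        return md5(implode("\x1F", [$feedId, $title, $link, $date]));
    }

    // Plain concatenation cannot tell these two rows apart:
    // both hash the same string "1abc".
    $plainA = md5('1' . 'ab' . 'c' . '');
    $plainB = md5('1' . 'a' . 'bc' . '');

    // The delimited version hashes two different strings.
    $hashA = entry_hash(1, 'ab', 'c', '');
    $hashB = entry_hash(1, 'a', 'bc', '');

    var_dump($plainA === $plainB); // the plain hashes are identical
    var_dump($hashA === $hashB);   // the delimited hashes differ
    ```

    In the import loop this could be called as `entry_hash((int) $get_id, (string) $Item->title, (string) $Item->link, (string) $Item->pubDate)`; the explicit casts assume `$Item` is a SimpleXML element, as the question's loop suggests.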
    

    And for your insert statements use the following:

    REPLACE INTO Entries (hash, feed_id, item_title, item_link, item_date)
    VALUES ('$hash', '$get_id','$Item->title','$Item->link','$Item->pubDate')
    

    The hash represents the record in its entirety (MD5 collisions between different feed entries are very unlikely in practice) and is cheap to compare, which makes it a convenient key for avoiding duplicates. Now when you attempt to add the same record more than once, REPLACE simply overwrites the existing entry and the query does not fail. As an alternative, you could continue to use INSERT; the query will then return a duplicate-key error, which you can handle however you want.
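
    Feed titles often contain apostrophes, which would end the quoted string early when values are interpolated directly into the SQL text. A hedged sketch of keeping the SQL and the values apart for a prepared statement follows; the function name `build_replace` is hypothetical, and the example values are made up.

    ```php
    <?php
    // Hypothetical helper: returns the SQL text and the bound values separately,
    // so feed content never becomes part of the SQL string itself.
    function build_replace(int $feedId, string $title, string $link, string $date): array
    {
        $sql = 'REPLACE INTO Entries (hash, feed_id, item_title, item_link, item_date) '
             . 'VALUES (?, ?, ?, ?, ?)';
        // Same hash as in the answer: MD5 over the concatenated fields.
        $hash = md5($feedId . $title . $link . $date);
        return [$sql, [$hash, $feedId, $title, $link, $date]];
    }

    // The apostrophe in the title is carried as plain data, not as SQL.
    [$sql, $params] = build_replace(7, "Today's news", 'http://example.com/a', 'Tue, 20 Nov 2012');

    echo $sql, "\n";
    echo count($params), " bound values\n";
    ```

    Assuming a PDO connection in a variable such as `$pdo` (the question does not say which class `$db_obj` is), this could be executed with `$pdo->prepare($sql)->execute($params)`.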

    This answer was accepted by the asker as the best answer.