dongsou4301 2013-04-26 22:24

MySQL subquery optimization - NOT IN (subquery)

I'm trying to optimize the following query. I'm thinking an outer join would do the trick, but I can't wrap my mind around how to put it together.

-- ---------------------------------
-- Simplified representation of data
-- (column types are assumed)
-- ---------------------------------
create table views (
   user_id int,
   article_id int
);

create table article_attributes (
   article_id int,
   article_attribute_id int
);

create table articles (
   id int,
   title varchar(255),
   date date
);

The views table has tens of millions of records. The articles table has a couple hundred thousand.

I'm trying to match all articles that have a certain attribute associated with them and that have not been viewed by a given user.

What I have tried, but doesn't scale well:

select a.title, a.sid as article_id, a.total_views as times_read, a.date 
from articles a 
join article_attributes att on att.article_id = a.sid 

where a.sid not in( 
   select v.article_id 
   from views v
   join article_attributes att on att.article_id = v.article_id 
   where user_id = 132385 
   and att.article_attribute_id = 10
   group by v.article_id 
) 
and att.article_attribute_id = 10 
and a.date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 day) 
order by total_views desc 
limit 5

This works fine, but gets significantly slower the more articles the user has viewed. Any ideas or suggestions would be appreciated.


4 answers

  • dongli8979 2013-04-27 15:15
    SELECT a.title, a.sid AS article_id, a.total_views AS times_read, a.date
    FROM articles a
        JOIN article_attributes att
            ON a.sid = att.article_id AND att.article_attribute_id = 10
        LEFT JOIN views v
            ON a.sid = v.article_id AND v.user_id = 132385
    WHERE v.user_id IS NULL
    
    1. The first join keeps only the articles with the given attribute.
    2. The second join takes the first join's result and attaches the matching views row where one exists; articles with no matching row are kept, with NULL in the views columns. (Basically ALL articles with attribute 10, with user_id being either 132385 or NULL.)
    3. The WHERE clause then keeps only the rows where user_id is NULL, i.e. the articles this user has not viewed.
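    To see what step 2 produces before step 3 filters it, the same joins can be run without the final WHERE clause. This is only a sketch: it assumes the article key is `sid`, as in the question's query (the simplified schema calls it `id`), and the LIMIT is an arbitrary cap added for inspection.

    ```sql
    -- Same joins as the answer's query, minus the final WHERE, to inspect step 2.
    SELECT a.sid AS article_id,
           v.user_id   -- 132385 if this user viewed the article, NULL otherwise
    FROM articles a
        JOIN article_attributes att
            ON a.sid = att.article_id AND att.article_attribute_id = 10
        LEFT JOIN views v
            ON a.sid = v.article_id AND v.user_id = 132385
    LIMIT 20;          -- arbitrary cap, for inspection only
    ```

    If views can hold several rows for the same user and article, a viewed article will appear once per view here. Unviewed articles still appear exactly once, so the IS NULL filter in step 3 does not produce duplicates from the views table.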

    Try to avoid nested queries and let the engine do its job. Note that you can tack your other filters (DATE, ORDER BY) onto the end.
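    With the remaining conditions from the question tacked on, the complete query might look like the sketch below. It assumes the `sid` and `total_views` columns from the question's query, which the simplified schema does not list. The index is also an assumption: the thread does not say which indexes exist, and `idx_views_user_article` is a hypothetical name.

    ```sql
    -- Hypothetical index so the LEFT JOIN can probe views by (user_id, article_id);
    -- skip it if an equivalent index already exists.
    CREATE INDEX idx_views_user_article ON views (user_id, article_id);

    SELECT a.title, a.sid AS article_id, a.total_views AS times_read, a.date
    FROM articles a
        JOIN article_attributes att
            ON a.sid = att.article_id AND att.article_attribute_id = 10
        LEFT JOIN views v
            ON a.sid = v.article_id AND v.user_id = 132385
    WHERE v.user_id IS NULL                                   -- not viewed by this user
      AND a.date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)  -- last 7 days
    ORDER BY a.total_views DESC
    LIMIT 5;
    ```

    Running EXPLAIN on the SELECT should show whether the views lookup uses the index rather than scanning the table.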

    This answer was selected by the asker as the best answer.