I'm trying to optimize the following query. I'm thinking an outer join would do the trick, but I can't wrap my mind around how to put it together.
// ---------------------------------
// Simplified representation of data
// ---------------------------------
create table views (
user_id,
article_id
)
create table article_attributes (
article_id,
article_attribute_id
)
create table articles (
id,
title,
date
)
Views table has tens of millions of records. Articles table has a couple hundred thousand.
I'm trying to match all articles with a certain attribute associated with it, and that have not been viewed by a user.
What I have tried, but doesn't scale well:
select a.title, a.sid as article_id, a.total_views as times_read, a.date
from articles a
join article_attributes att on att.article_id = a.sid
where a.sid not in(
select v.article_id
from views v
join article_attributes att on att.article_id = v.article_id
where user_id = 132385
and att.article_attribute_id = 10
group by v.article_id
)
and att.article_attribute_id = 10
and a.date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 day)
order by total_views desc
limit 5
This works fine, but gets significantly slower the more articles the user has viewed. Any ideas or suggestions would be appreciated.