I have a table of the words used in article titles. I want to find which words are used the least across the set of article titles.
Example:
Titles:
"Congressman Joey of Texas does not sign bill C1234."
"The pretty blue bird flies at night in Texas."
"Congressman Bob of Arizona signs bill C1234."
The table would contain the following.
Table WORDS_LIST
---------------------------------------
| INDEX ID | WORD        | ARTICLE ID |
---------------------------------------
| 1        | CONGRESSMAN | 1234       |
| 2        | JOEY        | 1234       |
| 3        | SIGN        | 1234       |
| 4        | BILL        | 1234       |
| 5        | C1234       | 1234       |
| 6        | TEXAS       | 1234       |
| 7        | PRETTY      | 1235       |
| 8        | BLUE        | 1245       |
| 9        | BIRD        | 1245       |
| 10       | FLIES       | 1245       |
| 11       | NIGHT       | 1245       |
| 12       | TEXAS       | 1245       |
| 13       | CONGRESSMAN | 1246       |
| 14       | BOB         | 1246       |
| 15       | ARIZONA     | 1246       |
| 16       | SIGNS       | 1246       |
| 17       | BILL        | 1246       |
| 18       | C1234       | 1246       |
---------------------------------------
In this case, words such as "pretty", "blue", "flies", and "night" would be among those used in the fewest articles.
I would appreciate any ideas on how best to write this query. Below is what I have so far. I could also write something in PHP, but I figured a query would be faster.
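-- Count how many distinct articles each word appears in, rarest first.
-- The self-join on id was redundant (each row only matched itself), so it is dropped.
-- Assumes the columns are named `word`, `article_id` and `publish_date`.
SELECT UPPER(w.`word`) AS word,
       COUNT(DISTINCT w.`article_id`) AS article_count
FROM mmdb.words_list w
WHERE DATE(w.`publish_date`) = '2017-06-09'
GROUP BY UPPER(w.`word`)
ORDER BY article_count ASC, word ASC;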
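To make the goal concrete, here is a minimal sketch that loads the example rows above into an in-memory SQLite database and keeps only the words tied for the lowest article count. It is an illustration under assumptions, not a tested MySQL solution: the column names `id`, `word` and `article_id`, the function name `rarest_words`, and the use of SQLite instead of MySQL are all my own choices, and the `WITH` clause would need MySQL 8.0 or later. On this sample data every word that appears in exactly one article ties for "least used", so the result includes PRETTY, BLUE, FLIES and NIGHT along with the other single-article words.

```python
import sqlite3

# Rows copied from the WORDS_LIST example: (id, word, article_id).
ROWS = [
    (1, "CONGRESSMAN", 1234), (2, "JOEY", 1234), (3, "SIGN", 1234),
    (4, "BILL", 1234), (5, "C1234", 1234), (6, "TEXAS", 1234),
    (7, "PRETTY", 1235), (8, "BLUE", 1245), (9, "BIRD", 1245),
    (10, "FLIES", 1245), (11, "NIGHT", 1245), (12, "TEXAS", 1245),
    (13, "CONGRESSMAN", 1246), (14, "BOB", 1246), (15, "ARIZONA", 1246),
    (16, "SIGNS", 1246), (17, "BILL", 1246), (18, "C1234", 1246),
]

# Count distinct articles per word, then keep only the words whose count
# equals the minimum count (so ties are all returned).
QUERY = """
    WITH counts AS (
        SELECT UPPER(word) AS word,
               COUNT(DISTINCT article_id) AS article_count
        FROM words_list
        GROUP BY UPPER(word)
    )
    SELECT word, article_count
    FROM counts
    WHERE article_count = (SELECT MIN(article_count) FROM counts)
    ORDER BY word
"""


def rarest_words(rows):
    """Return (word, article_count) pairs for the words used in the fewest articles."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema; the real table may use different column names.
        conn.execute(
            "CREATE TABLE words_list ("
            "id INTEGER PRIMARY KEY, word TEXT, article_id INTEGER)"
        )
        conn.executemany("INSERT INTO words_list VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for word, article_count in rarest_words(ROWS):
        print(word, article_count)
```

Filtering on the minimum count, rather than using `LIMIT`, avoids having to guess how many words are tied for last place.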