Hello there, and a warning: wall of text ahead :)
I am about to build a site that scrapes and collects Instagram photos tagged with one of five combinations of two hashtags. The first hashtag is always the same and is the name of the site/campaign; the second hashtag is one of five topics.
The photos also need to be grouped by Instagram username, so that each user can "collect" images for all five topics.
The result then needs to be presented as a "toplist" sorted by "number of images DESC, combined likes DESC". Only one image per topic counts, so each user has at most five images.
It's kind of hard to explain, so I'll try to illustrate it with this example of the toplist I need to build:
TOPLIST:
Rank 1.
USERNAME - score 27 (has collected all 5 topics and has the most combined likes among users with all 5)
(img) #competition #topic-1 5 likes
(img) #competition #topic-2 3 likes
(img) #competition #topic-3 10 likes
(img) #competition #topic-4 5 likes
(img) #competition #topic-5 4 likes
Rank 2.
USERNAME - score 25
(img) #competition #topic-1 5 likes
(img) #competition #topic-2 3 likes
(img) #competition #topic-3 8 likes
(img) #competition #topic-4 5 likes
(img) #competition #topic-5 4 likes
Rank 3.
USERNAME - score 38 (has more likes than the leader but has only 4 topics covered..)
(img) #competition #topic-1 5 likes
(img) #competition #topic-2 3 likes
(img) #competition #topic-3 10 likes
(img) #competition #topic-4 20 likes
Rank 4.
USERNAME - score 17
(img) #competition #topic-1 1 likes
(img) #competition #topic-2 2 likes
(img) #competition #topic-3 3 likes
(img) #competition #topic-4 11 likes
And so on....
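The sorting rule illustrated above could be sketched roughly like this in Python. This is only an illustration: the `(username, topic, likes)` row shape, the function name `build_toplist`, and the sample users are made up, and the rule "if a user has several images for one topic, count the most liked one" is an assumption, since the competition rules above don't say which image should win.

```python
from collections import defaultdict


def build_toplist(images):
    """Rank users by topics covered (DESC), then by combined likes (DESC).

    `images` is an iterable of hypothetical (username, topic, likes) rows.
    """
    best = defaultdict(dict)  # username -> {topic: likes of the image that counts}
    for username, topic, likes in images:
        # Assumption: only the most liked image per user and topic counts.
        if likes > best[username].get(topic, -1):
            best[username][topic] = likes
    rows = [(user, len(topics), sum(topics.values())) for user, topics in best.items()]
    # Negate both keys so a plain ascending sort gives DESC, DESC.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows


images = [
    ("alice", "topic-1", 5), ("alice", "topic-2", 3), ("alice", "topic-3", 10),
    ("alice", "topic-4", 5), ("alice", "topic-5", 4),
    ("bob", "topic-1", 5), ("bob", "topic-2", 3), ("bob", "topic-3", 10),
    ("bob", "topic-4", 20),
]
toplist = build_toplist(images)
# alice (5 topics, 27 likes) ranks above bob (4 topics, 38 likes)
```

If the images end up in an SQL table, the same ordering could probably be done with a GROUP BY on username and an ORDER BY on the two aggregates instead.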
I have been poking around a little with the API, and it seems like "/tags/tag-name/media/recent" would be my best, if not only, "entry point" to this problem?
So what I'm thinking of doing is running a script every 5 minutes or so that goes through the latest images tagged "#competition", checks whether any of the 5 secondary tags are present, and if so saves the image unless it is already in the DB.
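The "filter by tags, save if not already in DB" step could look something like the sketch below. It deliberately leaves out the actual API request: `media_items` is assumed to be a list of already-fetched items flattened into dicts with `id`, `username`, `tags` and `likes` keys, which is a made-up shape and not necessarily what the API returns. The table layout and the rule that an image with several topic tags counts only for the first one are assumptions too.

```python
import sqlite3

CAMPAIGN_TAG = "competition"  # placeholder tag from the example toplist
TOPIC_TAGS = {"topic-1", "topic-2", "topic-3", "topic-4", "topic-5"}


def open_db(path=":memory:"):
    """Open the cache DB and create the (hypothetical) images table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "media_id TEXT PRIMARY KEY, username TEXT, topic TEXT, likes INTEGER)"
    )
    return db


def save_new_media(db, media_items):
    """Store items that carry the campaign tag plus a topic tag.

    Returns the number of rows that were actually new.
    """
    before = db.total_changes
    for item in media_items:
        tags = {tag.lower() for tag in item["tags"]}
        topics = sorted(tags & TOPIC_TAGS)
        if CAMPAIGN_TAG not in tags or not topics:
            continue
        # Assumption: with several topic tags, the image counts for the first one.
        # The PRIMARY KEY plus OR IGNORE makes re-seen images a no-op.
        db.execute(
            "INSERT OR IGNORE INTO images VALUES (?, ?, ?, ?)",
            (item["id"], item["username"], topics[0], item["likes"]),
        )
    db.commit()
    return db.total_changes - before
```

Because duplicates are ignored at the database level, the cron job can safely re-read overlapping pages of recent media on every run.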
I guess I have to cache all images matching these tags over time, rather than fetching them fresh on each request? I have yet to reach Instagram's limit of objects per query, but if nothing else I will hit my own server's timeout if I try to load everything each time.
The big pain in the ass from my point of view is the likes, since these need to be constantly updated from Instagram to keep the scoreboard alive. Just looping through all cached images with cron and doing an API request to update each like count seems a bit heavy, both for my server and for Instagram's API limit.
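One way to soften the load might be to refresh only a fixed number of images per cron run, always picking the ones whose like count is the most stale. A minimal sketch, assuming a hypothetical mapping from media id to the Unix time of its last refresh (the function name and the numbers are made up for illustration):

```python
def pick_refresh_batch(last_refreshed, budget):
    """Return up to `budget` media ids, least recently refreshed first.

    `last_refreshed` maps media id -> Unix timestamp of the last like-count
    update (a hypothetical structure, e.g. loaded from the cache DB).
    """
    oldest_first = sorted(last_refreshed, key=lambda media_id: last_refreshed[media_id])
    return oldest_first[:budget]


# Example: three cached images, room for two requests in this run.
batch = pick_refresh_batch({"a": 100, "b": 50, "c": 75}, budget=2)
# batch is ["b", "c"], the two images that have waited longest
```

With this scheme the cost per run is fixed. For example, with 5000 cached images and a budget of 500 requests every 5 minutes, each like count would be at most about 50 minutes old. Whether that budget fits within the API limit is something to check against the actual limits.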
Maybe I can use logged-in users' sessions/tokens to do this in some smart way?
Or should I convince the rest of the team that this is a bad idea, and that we should build our own "voting" mechanism and keep the competition local, separate from Instagram's like counters?
Please share your ideas on how you'd store and solve this :)