dongmu5920 2013-01-16 09:36

Instagram API, input needed: scraping Instagram images from specific hashtags over time and keeping them in sync, e.g. like counts for locally sorted queries

Hello there and warning for wall of text :)

I am about to build a site that scrapes and collects Instagram photos belonging to five combinations of two hashtags. The first hashtag will always be the same and is the name of the site/campaign; the other hashtag will be one of five topics.

The photos also need to be grouped by Instagram username so that each user can "collect" images of all five topics.

This then needs to be presented as a "toplist" sorted by "number of images DESC, combined likes DESC", where each user can have at most five images, one from each topic.

It's kind of hard to explain, so I'll try to illustrate it with this example of the toplist I need to build:

TOPLIST:

Rank 1.

USERNAME - score 27 (has collected all 5 topics and have most combined likes)

(img) #competition #topic-1 5 likes

(img) #competition #topic-2 3 likes

(img) #competition #topic-3 10 likes

(img) #competition #topic-4 5 likes

(img) #competition #topic-5 4 likes

Rank 2.

USERNAME - score 25

(img) #competition #topic-1 5 likes 

(img) #competition #topic-2 3 likes

(img) #competition #topic-3 8 likes

(img) #competition #topic-4 5 likes

(img) #competition #topic-5 4 likes

Rank 3.

USERNAME - score 38 (has more likes than the leader but has only 4 topics covered..)

(img) #competition #topic-1 5 likes

(img) #competition #topic-2 3 likes

(img) #competition #topic-3 10 likes

(img) #competition #topic-4 20 likes

Rank 4.

USERNAME - score 17

(img) #competition #topic-1 1 likes

(img) #competition #topic-2 2 likes 

(img) #competition #topic-3 3 likes 

(img) #competition #topic-4 11 likes

And so on....
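The ranking rule illustrated above could be sketched roughly like this in Python. This is only a minimal sketch: the `(username, topic, likes)` tuple shape, the `build_toplist` name and the `user_a` to `user_d` usernames are made up for illustration, and keeping the most-liked image when a user has several for the same topic is an assumption, not something decided yet.

```python
from collections import defaultdict


def build_toplist(images):
    """Rank users by topics covered (DESC), then combined likes (DESC).

    `images` is an iterable of (username, topic, likes) tuples
    (a hypothetical shape chosen for this sketch).
    """
    # Keep at most one image per (user, topic).
    # Assumption: when a user has several, the most-liked one counts.
    best = defaultdict(dict)
    for username, topic, likes in images:
        if likes > best[username].get(topic, -1):
            best[username][topic] = likes

    rows = [
        (username, len(topics), sum(topics.values()))
        for username, topics in best.items()
    ]
    # Negate both keys so a plain ascending sort gives DESC, DESC.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows


# The four users from the example toplist (usernames are placeholders).
example = [
    ("user_a", "topic-1", 5), ("user_a", "topic-2", 3), ("user_a", "topic-3", 10),
    ("user_a", "topic-4", 5), ("user_a", "topic-5", 4),
    ("user_b", "topic-1", 5), ("user_b", "topic-2", 3), ("user_b", "topic-3", 8),
    ("user_b", "topic-4", 5), ("user_b", "topic-5", 4),
    ("user_c", "topic-1", 5), ("user_c", "topic-2", 3), ("user_c", "topic-3", 10),
    ("user_c", "topic-4", 20),
    ("user_d", "topic-1", 1), ("user_d", "topic-2", 2), ("user_d", "topic-3", 3),
    ("user_d", "topic-4", 11),
]

for rank, (username, topic_count, total_likes) in enumerate(build_toplist(example), start=1):
    print(rank, username, topic_count, total_likes)
```

With this rule, user_c stays at rank 3 despite having the most likes, because only four topics are covered.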

I have been poking around a little with the API, and it seems like "/tags/tag-name/media/recent" would be my best, if not only, "entry point" to this problem?

So what I'm thinking about doing is running a script every 5 minutes or so that will go through the latest images tagged "#competition", check whether any of the 5 secondary tags are present, and if so, save the image unless it is already in the DB.
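The filter-and-save step of such a script might look something like the sketch below. It leaves out the actual API request entirely: `media_items` is assumed to be a list of already-fetched items flattened into a hypothetical dict shape (`id`, `username`, `tags`, `url`, `likes`), and the table layout, the `open_db` and `store_new_media` names and the example URLs are illustrative only. Deduplication relies on the media id being the primary key together with SQLite's `INSERT OR IGNORE`.

```python
import sqlite3

CAMPAIGN_TAG = "competition"
TOPIC_TAGS = {"topic-1", "topic-2", "topic-3", "topic-4", "topic-5"}


def open_db(path=":memory:"):
    """Open the cache database and create the (hypothetical) images table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               media_id TEXT PRIMARY KEY,
               username TEXT NOT NULL,
               topic    TEXT NOT NULL,
               url      TEXT NOT NULL,
               likes    INTEGER NOT NULL
           )"""
    )
    return conn


def store_new_media(conn, media_items):
    """Save items tagged with the campaign tag plus a topic tag.

    Returns how many rows were actually new.
    """
    inserted = 0
    for item in media_items:
        tags = {tag.lower() for tag in item["tags"]}
        if CAMPAIGN_TAG not in tags:
            continue
        topics = sorted(tags & TOPIC_TAGS)
        if not topics:
            continue
        # Assumption: if several topic tags are present, use the first one.
        cursor = conn.execute(
            "INSERT OR IGNORE INTO images VALUES (?, ?, ?, ?, ?)",
            (item["id"], item["username"], topics[0], item["url"], item["likes"]),
        )
        inserted += cursor.rowcount  # 0 when the media id was already stored
    conn.commit()
    return inserted


conn = open_db()
batch = [
    {"id": "1", "username": "user_a", "tags": ["competition", "topic-1"],
     "url": "http://example.com/1.jpg", "likes": 5},
    {"id": "2", "username": "user_b", "tags": ["competition"],
     "url": "http://example.com/2.jpg", "likes": 2},
]
print(store_new_media(conn, batch))  # first run stores the matching item
print(store_new_media(conn, batch))  # second run finds nothing new
```

Because already-known ids are silently skipped, the same recent-media page can be processed on every run without creating duplicates.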

I guess I have to cache the results in order to collect all images matching these tags over time? I have yet to reach Instagram's limit of objects per query... but if nothing else I will hit my own server's timeout if I try to load everything each time.

The big pain in the ass from my point of view is the likes, since these need to be constantly updated from Instagram to keep the scoreboard alive. Just looping through all cached images with cron and then doing an API request to update each like count seems a bit heavy both for my server and for Instagram's API limit.

Maybe I can utilize logged-in users' sessions/tokens to do this in some smart way?

Or should I convince the rest of the team that this is a bad idea and that we should build our own "voting" mechanism and keep the competition local, separated from Instagram's like counters?

Please share your ideas of how you'd store and solve this :)

1 answer

  • douhuifen9942 2013-04-17 22:39

    I think the tags endpoint, as you suggested, is the way to go: it returns all the data you need, which you can then store in a database. That way you can do all your calculations (aggregation of users, likes, etc.) locally and don't have to worry too much about rate limits, authentication, etc.
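    Once the images are in a database, the aggregation can be a single query. The following is only a sketch against a hypothetical `images` table in SQLite (column names and sample rows are made up), and it assumes that when a user has several images for one topic, the most-liked one is the one that counts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (media_id TEXT PRIMARY KEY, username TEXT, topic TEXT, likes INTEGER)"
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        ("1", "user_a", "topic-1", 5),
        ("2", "user_a", "topic-1", 2),  # second image for the same topic
        ("3", "user_a", "topic-2", 3),
        ("4", "user_b", "topic-1", 20),
        ("5", "user_c", "topic-1", 1),
        ("6", "user_c", "topic-2", 1),
    ],
)

# Inner query: one row per (user, topic), keeping the highest like count.
# Outer query: topics covered and combined likes per user, sorted DESC, DESC.
TOPLIST_SQL = """
SELECT username, COUNT(*) AS topics, SUM(best_likes) AS total_likes
FROM (
    SELECT username, topic, MAX(likes) AS best_likes
    FROM images
    GROUP BY username, topic
)
GROUP BY username
ORDER BY topics DESC, total_likes DESC
"""

toplist = conn.execute(TOPLIST_SQL).fetchall()
for rank, (username, topics, total_likes) in enumerate(toplist, start=1):
    print(rank, username, topics, total_likes)
```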

    Unfortunately, I don't think a real-time update gets triggered on a like (which would be ideal for you), so you're stuck with either going through all the images to get the like counts or creating your own like mechanism, as you mentioned. I'm not sure what your requirements are or how many users/pics you're expecting, but if you spread the work out over time (i.e. fetch x images every x minutes) I don't see any problem server-wise...
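    Spreading the like updates out over time could be done by refreshing only the least recently checked images on each cron run. This is a rough sketch: the `last_checked` column and the `refresh_like_counts` name are hypothetical, and `fetch_likes` is a placeholder for whatever function performs the one-request-per-image API call (a plain dict lookup stands in for it here).

```python
import sqlite3
import time


def refresh_like_counts(conn, fetch_likes, batch_size=50, now=None):
    """Refresh the `batch_size` least recently checked images.

    `fetch_likes(media_id)` is a hypothetical callable returning the
    current like count. Returns the ids that were refreshed.
    """
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT media_id FROM images "
        "ORDER BY last_checked ASC, media_id ASC LIMIT ?",
        (batch_size,),
    ).fetchall()
    for (media_id,) in rows:
        likes = fetch_likes(media_id)
        conn.execute(
            "UPDATE images SET likes = ?, last_checked = ? WHERE media_id = ?",
            (likes, now, media_id),
        )
    conn.commit()
    return [media_id for (media_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (media_id TEXT PRIMARY KEY, likes INTEGER, "
    "last_checked REAL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO images (media_id, likes) VALUES (?, ?)",
    [("a", 1), ("b", 2), ("c", 3)],
)

fake_counts = {"a": 10, "b": 20, "c": 30}  # stand-in for the real API
print(refresh_like_counts(conn, fake_counts.get, batch_size=2, now=100.0))
print(refresh_like_counts(conn, fake_counts.get, batch_size=2, now=200.0))
```

    Each run costs at most `batch_size` requests, so the batch size and cron interval together bound how stale any single like count can get.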

    When you mention caching of images, I'm assuming you mean storing the URL and not the actual binary image data?
