2020-12-08 18:17

API limits

We should implement API limits for unauthenticated requests, client-authenticated requests, and user-authenticated requests.

One way pump.io differs from GNU Social is that pump.io requires OAuth client authentication for every endpoint: you can't even request an outbox without an access token.

I did this largely because identi.ca gets pounded with stupid requests all day long. Searches for 10-year-old blog post URLs every 5 minutes. That kind of thing.

But disallowing unauthenticated requests is a real hassle for development, and even for simply reading things. I think for ActivityPub we want to allow unauthenticated requests.

We could use rate limits to give developers an incentive to authenticate their requests. Basically, if clients do nicer things, they can make more calls.

I suggest something like the following:

  • Unauthenticated requests have some high but not infinite limit per hour. 100K? 1M? Configurable? All unauthenticated requests draw from this pool.
  • Client-authenticated requests have their own limit. So getting a client key and using it gives you faster API performance.
  • User-authenticated requests have another level of limits. Each user gets its own limit.

We can apply limits by just slowing down the response rate, inserting little pauses on the server. That way, the unauthenticated clients wouldn't eat up all the available API calls in the first minute after the hour; we'd only allow one request every 3600/N seconds, where N is the hourly limit. So, if the limit is 100,000/hour, that's one request every 36 ms, which is pretty fast.
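
A minimal sketch of the pacing idea above, assuming a single-threaded async server: each request reserves the next free slot, and slots are spaced 3600/N seconds apart. `Pacer` and `paced` are hypothetical names, not part of pump.io.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Pacer:
    """Spaces requests evenly: at most one every 3600/N seconds (hypothetical)."""
    limit_per_hour: int   # N in the text
    next_slot: float = 0.0  # earliest clock time the next request may proceed

    @property
    def interval(self) -> float:
        # 3600 seconds per hour divided by N allowed calls per hour
        return 3600.0 / self.limit_per_hour

    def reserve(self, now: float) -> float:
        """Claim the next free slot and return how long the caller should pause."""
        start = max(now, self.next_slot)
        self.next_slot = start + self.interval
        return start - now


async def paced(pacer: Pacer) -> None:
    # Pause the request handler until its reserved slot comes up.
    loop = asyncio.get_running_loop()
    await asyncio.sleep(pacer.reserve(loop.time()))
```

With `Pacer(100_000)`, `interval` is 0.036 s, so three requests arriving at the same instant are told to wait 0, 36 and 72 ms respectively. Because `reserve` runs without awaiting, it needs no lock on a single asyncio event loop; a multi-threaded server would have to guard it.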

I think this gives a nice balance for site admins and developers.

  • weixin_39875754 (4 months ago)

    If I am, in theory, authenticated to identi.ca through datamost (see https://github.com/pump-io/pump.io/issues/1565), would the API limit still apply to me? [And when will identi.ca be updated?]

  • weixin_39808726 (4 months ago)

    Yes, I think the API limit would still apply. Here's a rough breakdown:

      • Unauthenticated: a shared pool of N1 API calls
      • Client-authenticated: each client has its own pool of N2 API calls
      • User-authenticated: each local or remote user has a pool of N3 API calls

    N1, N2, and N3 are configuration variables, with defaults that make sense for a personal or family installation (say, a Raspberry Pi-sized server or a small VPS slice).
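
A minimal sketch of how a server might pick which of the three pools a request draws from. `LimitConfig` and `choose_pool` are hypothetical names, and the rule that the most specific credential wins (a user-authenticated request uses that user's pool even though it also carries a client key) is an assumption, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LimitConfig:
    """Hourly call limits N1, N2, N3; actual values are site configuration."""
    n1_unauthenticated: int  # one pool shared by all unauthenticated requests
    n2_per_client: int       # a separate pool for each OAuth client
    n3_per_user: int         # a separate pool for each local or remote user


def choose_pool(cfg: LimitConfig,
                client_id: Optional[str],
                user_id: Optional[str]) -> Tuple[str, int]:
    """Return (pool key, hourly limit) for a request.

    Assumption: the most specific credential present decides the pool.
    """
    if user_id is not None:
        return f"user:{user_id}", cfg.n3_per_user
    if client_id is not None:
        return f"client:{client_id}", cfg.n2_per_client
    return "anonymous", cfg.n1_unauthenticated
```

For example, with arbitrary placeholder limits `LimitConfig(10, 20, 30)`, an anonymous request maps to `("anonymous", 10)` and a request signed with client key `abc` maps to `("client:abc", 20)`. Each pool key would then get its own counter or pacing schedule.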

  • weixin_39836063 (4 months ago)

    So, am I understanding correctly that a single bad actor could potentially drain the shared pool for the hour, leaving everyone else starved? That seems less than ideal. Is there a reason we can't just give each IP address a certain number of requests per hour? That's how GitHub does it, and FWIW they set the limit fairly low.
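
A minimal sketch of the per-IP alternative proposed here, assuming a fixed one-hour window. `PerIpWindow` is a hypothetical name, and the limit is a parameter rather than GitHub's actual value.

```python
from collections import defaultdict


class PerIpWindow:
    """Fixed one-hour window request counter keyed by client IP (sketch)."""

    def __init__(self, limit_per_hour: int) -> None:
        self.limit = limit_per_hour
        # (ip, hour index) -> requests seen from that IP in that hour.
        # Old windows are never pruned in this sketch.
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        """Count the request; return False once this IP has used its limit."""
        key = (ip, int(now // 3600))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

One trade-off of keying on IP address: users behind a shared NAT or proxy all draw from a single allowance.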

  • weixin_39808726 (4 months ago)

    So, first, I'm not sure that's possible. If the server delays each call by (remaining time)/(remaining calls), it would be hard to drain the pool without other clients getting a chance.
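
A minimal sketch of the (remaining time)/(remaining calls) rule described above. `adaptive_delay` is a hypothetical helper, and waiting out the rest of the window once the pool is empty is an assumption about how exhaustion would be handled.

```python
def adaptive_delay(now: float, window_end: float, calls_left: int) -> float:
    """Seconds to pause before serving the next call from a pool.

    Implements delay = (remaining time) / (remaining calls).
    """
    remaining = max(window_end - now, 0.0)
    if calls_left <= 0:
        # Pool exhausted: wait out the rest of the window (assumption).
        return remaining
    return remaining / calls_left
```

At the top of the hour with 100,000 calls left this gives the same 36 ms spacing as a fixed 3600/N schedule, but if calls go unused the delay shrinks: halfway through the hour with 100,000 calls still left it is 18 ms. A single sequential client that waits each time therefore gets evenly spaced slots and cannot pull ahead of everyone else.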

    The only way I can see it working is if you opened enough connections to make all the calls at the top of the hour, but I don't think we could handle hundreds of thousands or millions of concurrent connections anyway.

    And, of course, there's a remedy for anyone who doesn't like it, which is to get an OAuth client key. If we structure the limits correctly, there'd be a strong incentive to get a key.
