Hello there 👋🏻
Maintainer from the Shields.io team here. We provide badges showing data from PyPI Stats, as advertised on your own website.
Our PyPI badges have become quite popular: we typically serve on the order of a million badges on a weekday. Unfortunately, this means we're running up against the PyPI Stats rate limits, which are quite low at 5 requests per second and 30 per minute (source). Users have been hit by these limits, for example in badges/shields#11620. Our 70 daily tests, which validate the PyPI Stats integration end-to-end, also fail on a regular basis.
I've significantly increased caching on our side, but ultimately we can't control user behaviour. If a Shields.io server receives 6 simultaneous requests for 6 different uncached PyPI badges, one of them will fail. Statistically speaking, there will be failures no matter what caching we put in place. The fact that rate limiting is IP-based (source) is also problematic for us. I'd like to scale down our infrastructure to reduce our costs (we're an open-source project not backed by any big entity; as you'd imagine, we're not sitting on a pot of gold), but doing so would spread the requests for uncached badges across fewer IPs, and rejected requests from PyPI Stats would mechanically go up.
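To make the burst problem concrete, here is a minimal sketch (not PyPI Stats' actual implementation; it assumes a simple fixed-window limit of 5 requests per second, as per the documented limits) showing why a burst of 6 uncached badge requests in the same second cannot all succeed:

```python
# Assumed fixed-window limiter: 5 requests per second (per the documented limit).
RATE_LIMIT_PER_SECOND = 5

def handle_burst(requests_in_same_second: int) -> tuple[int, int]:
    """Return (served, rejected) for a burst of requests arriving within one second."""
    served = min(requests_in_same_second, RATE_LIMIT_PER_SECOND)
    return served, requests_in_same_second - served

# A burst of 6 uncached badges: 5 are served, 1 is rejected,
# no matter how the requests are ordered.
print(handle_burst(6))  # (5, 1)
```

Caching reduces how often such bursts reach the upstream API, but it cannot eliminate them: any second in which more than 5 cache misses coincide produces at least one rejection.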
Many upstream services we integrate with allow for much higher API rate limits via an authenticated token. Is this something that could be put in place on your side? Are there other solutions that could be considered?
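For illustration only, this is the shape such an integration usually takes on our side. Everything here is hypothetical: PyPI Stats does not issue tokens today, and the header scheme is an assumption; only the endpoint URL is the real public one.

```python
# Hypothetical sketch: how Shields.io could attach a PyPI Stats-issued
# token to its requests, if such tokens existed. The Bearer scheme is
# an assumption; any header or query-parameter scheme would work for us.
import urllib.request

req = urllib.request.Request(
    "https://pypistats.org/api/packages/requests/recent",  # real public endpoint
    headers={"Authorization": "Bearer <token-issued-to-shields.io>"},  # hypothetical
)
# urllib.request.urlopen(req)  # requests carrying the token would count
#                              # against a higher, per-token quota
print(req.get_header("Authorization"))
```

With per-token limits, the quota would follow our service rather than our IPs, which would also let us scale the infrastructure down without mechanically increasing rejections.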