Cyberlands.io - API Penetration Testing
PREDICTABLE APIs for API-ENABLED BUSINESSES

How-to on API rate limiting policies implementation

Tips for when you design or develop an app, and for when it is already built and you have to live with it
As an API-enabled app maintainer or developer, you have to ensure that your API runs as efficiently and predictably as possible to provide the best performance and security. API rate limiting is one of the easiest ways of controlling traffic to your APIs and avoiding slow performance. It is also a critical part of API management because it reinforces DoS/DDoS security: without limits, a denial-of-service attack can tank a database with a flood of API requests.

Rate limiting prevents API overuse caused by accidental problems in client code that slam the API with requests, and it reduces the risk of customer data leaks through APIs. It also simplifies scaling: as your API's popularity grows, unprecedented traffic spikes can slow down or take down the server. This guide covers everything you need to know about rate limiting and its practical application.


What is API rate limiting?

First and foremost, rate limiting is a safety mechanism that protects both the API and its users. It guards your server against slow performance and DoS attacks, boosts scalability, and enhances the overall user experience. Rate limits are typically expressed in transactions per second or requests per second.

Suppose you want to permit your clients to call the API a specific number of times per minute, for example, 10 times per minute. In this case, you apply the rate limit as 10 requests per 60 seconds, which means the client can only call the API successfully up to 10 times within a minute. Whenever the client exceeds that, they get an error stating that they have reached their rate limit and cannot make an eleventh call within that time interval. The server can also send the allowed request rate in the response, along with a header showing how long the client should wait before requesting again. Although this is optional, it is an excellent idea to let users know the limits they are working against.
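As an illustration, the 10-requests-per-60-seconds policy above can be sketched as a small in-memory limiter (the class and method names here are hypothetical, not from any particular framework):

```python
import time

class RateLimiter:
    """Allow at most `limit` calls per `window` seconds for each client key."""

    def __init__(self, limit=10, window=60):
        self.limit = limit
        self.window = window
        self.counters = {}  # key -> (window_start, count)

    def check(self, key, now=None):
        """Return (allowed, retry_after_seconds) for one incoming request."""
        now = time.time() if now is None else now
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:      # window expired: start a new one
            start, count = now, 0
        if count >= self.limit:             # over the limit: reject
            return False, int(start + self.window - now) + 1
        self.counters[key] = (start, count + 1)
        return True, 0
```

In a real deployment the `retry_after` value would be sent back in a `Retry-After` HTTP header alongside a 429 status, and the counters would live in a shared store such as Redis rather than in process memory.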

On the other hand, if you provide unlimited access to your API, any user can access and use it as much as they want at any given time. The downside of open access is that it diminishes the value of your API and limits business success: it makes the API harder to scale and exposes it to attacks and lagging issues. To prevent this, you can place a limit on the quantity of data a user can consume at a given time, known as application rate limiting.

If you do not want the API to send back an error when a client reaches the maximum request limit, you can enforce throttling and queue the request instead. The client's request is then executed as soon as the rate limit is no longer being exceeded. Simply put, there are two ways of handling requests once they exceed the rate limit: sending back an error, or throttling the request for later execution. Throttling lets users keep using your API while still protecting it.
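The queue-and-delay behaviour described above can be sketched like this (a simplified, in-memory model; the names are illustrative):

```python
from collections import deque

class ThrottlingLimiter:
    """Delay excess requests instead of rejecting them: at most
    `limit` requests are allowed to run per `window` seconds."""

    def __init__(self, limit=2, window=1.0):
        self.limit = limit
        self.window = window
        self.run_times = deque()  # scheduled run times of recent requests

    def schedule(self, now):
        """Return the time at which the incoming request may run (>= now)."""
        if len(self.run_times) >= self.limit:
            # wait until the limit-th most recent request leaves the window
            earliest = self.run_times[-self.limit] + self.window
        else:
            earliest = now
        run_at = max(now, earliest)
        self.run_times.append(run_at)
        if len(self.run_times) > self.limit:
            self.run_times.popleft()  # keep only the entries we still need
        return run_at
```

With `limit=2` and `window=1.0`, a burst of six simultaneous requests is spread out to run at 0, 0, 1, 1, 2, and 2 seconds - exactly two per second, with nothing rejected.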

Types of rate limits


As a developer, you can implement any of the three types of rate limits. You can even use any combination depending on your needs and the importance you place on each type.

User rate limiting

User rate limiting is the most common type. It tracks a user's API key, IP address, or session cookie to count the number of requests they make. If the user exceeds the allowed number of requests in a given time frame, they have to wait for the time frame to reset. The wait time may be communicated in a message along with a 'Retry-After' header. In exceptional cases, a user can agree with the developer on a higher limit so they can access the network without waiting.

Time rate limiting

Time-based rate limiting depends on the region or the time at which the user attempts to access the network. It ensures that stricter rate-limiting protocols are active during the periods when traffic is highest. For example, you can increase the number of requests allowed between 12 am and 8 am, because traffic is usually lowest during those hours.

Server rate limiting

Depending on the size of your API, you may have multiple servers monitoring and facilitating different requests. Server rate limiting means enforcing different limits on a server-by-server basis. For example, server rate limiting can be applied in an image-processing application that consumes many CPU cycles: you can give the server handling the processing a stricter limit than the normal web server, so that API requests are throttled sooner and capacity is shared fairly among users. Server rate limiting is beneficial in that it reduces the request limits for less-accessed servers, freeing up network capacity for the servers that handle more API requests.

How to enforce API rate limiting

Many applications that use REST APIs apply rate limiting to guard against overloaded servers and DoS attacks. Some APIs implement soft limits that allow the user to exceed the limit for a short while. Others take a hard-line approach immediately, sending an HTTP 429 error or timing out, which means the user has to retry with a new request.

There are various ways in which you can rate limit your API. Check them out below:

Request queues

Each programming language has its own request queue libraries, which simplifies your work as a developer. You can even browse queue library directories and search for pre-written code. There are many request queues out there awaiting use; one common setup, for example, enforces a rate limit of two requests per second and sends the rest to a request queue. Let us look at some of them:

  • Amazon Simple Queue Service (SQS) - a ready-made, managed request and messaging queue. The good thing about SQS is that it is maintained regularly, so you do not have to constantly debug your software or hardware to keep it working efficiently.

  • Android Volley - a common request queue library for Android developers, but not every Android project can use it. Be sure to check your specific Android library's documentation to determine whether it is compatible with Volley.

Throttling

Throttling is another way to enforce API rate limiting. It lets you monitor how your API is being used by establishing a temporary state in which the API assesses every request. Whenever the API is throttled, the user may have to cope with lower bandwidth or be disconnected altogether. Throttling is a common way of rate-limiting APIs at either the user or the application level, and there are numerous ready-made commercial products on the market for developers.

Rate limiting algorithms

You can use algorithms to implement an easily scalable, rate-limited API. Just like request queue libraries and throttling services, there are multiple rate-limiting algorithms available. Let us look at some of them.

  • Leaky bucket - processes requests in a first-in, first-out (FIFO) format, draining items from the queue at a regular rate. It is easy to enforce on a single server or load balancer and smooths out bursts of high traffic. With a limited queue size, it is also memory efficient.

  • Fixed window - these algorithms monitor the rate of requests using an incremental counter over a fixed window, for example, 3600 requests per hour. Whenever the counter exceeds the maximum limit for the specified period, additional requests are discarded. You can use fixed window algorithms if you do not want your API to be overwhelmed by old requests. Still, your API could be overloaded when using this method: if a burst of requests arrives just as the window refreshes, the API can still be stampeded.

  • Sliding log - this algorithm keeps a time-stamped log of every request. Whenever a new request arrives, the logs are summed to determine the request rate; if the request surpasses the limit threshold, it is queued, and time-stamped entries older than the window are discarded. Unlike fixed windows, sliding log algorithms do not suffer from stampeding at window boundaries. The downside is that keeping and summing a log entry per request is expensive, so it may not be the most suitable option for preventing DoS attacks or overload, or for implementing a highly scalable API.

  • Sliding window - this algorithm blends the best of the sliding log and fixed window approaches. It uses a cumulative counter for a set period, like the fixed window algorithm, but also weighs the previous window's count to smooth out bursts of traffic. Because it needs only a small number of data points to assess each request, it is the most suitable option for processing massive numbers of requests while running fast and smoothly.
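As a concrete illustration of the last approach, here is a minimal sliding-window counter in Python (an in-memory sketch; production systems typically keep these counters in a shared store such as Redis):

```python
class SlidingWindowLimiter:
    """Sliding-window counter: the previous fixed window's count is weighted
    by how much of it still overlaps the sliding window ending at `now`."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self.current_start = 0.0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now):
        # roll the fixed windows forward if time has moved past them
        elapsed = now - self.current_start
        if elapsed >= 2 * self.window:
            self.current_start = now - (now % self.window)
            self.current_count = 0
            self.previous_count = 0
        elif elapsed >= self.window:
            self.current_start += self.window
            self.previous_count = self.current_count
            self.current_count = 0
        # weight the previous window by its overlap with the sliding window
        overlap = 1.0 - (now - self.current_start) / self.window
        estimated = self.previous_count * overlap + self.current_count
        if estimated >= self.limit:
            return False
        self.current_count += 1
        return True
```

Because the previous window's count decays gradually instead of vanishing at the boundary, a burst at the end of one window cannot be immediately followed by a full burst at the start of the next - the stampeding problem of plain fixed windows.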

Dedicated API rate limiting tools

Sometimes an application arrives without built-in API rate-limiting checks, controls, or policies, and you need a workaround. Several kinds of tools can enforce API rate-limiting policies for you:

  • CDN - consider using a content delivery network serving as a proxy to manage access to your APIs. An example of such is CloudFlare.
  • API Gateways - you can also use API Gateways (like Kong) to set API quotas and enforce rate-limiting policies.
  • Anti-DDoS services (L7 layer) - some anti-DDoS providers tackle API-based DDoS pretty well. An example of such is QRator.
  • Reverse proxies and application gateways - if you already use NGINX you can use it to enforce your API controls as well.


When should you use rate-limiting?

Key-level limiting lets you limit the traffic from each individual source to ensure every user sticks to their prescribed limit. This is very valuable for monetized APIs, where you can allow more requests per second depending on the tier or subscription the user is paying for.

API-level rate limiting monitors all traffic hitting your API from every source to ensure it doesn't surpass an overall rate limit. You can use it as an additional layer of defence against attempted denial-of-service attacks.
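The two levels can also be combined, as in this simplified sketch (per-window resets are omitted for brevity, and the names are illustrative):

```python
class LayeredLimiter:
    """Combine a per-key quota (key-level) with a global cap (API-level)."""

    def __init__(self, per_key_limit, global_limit):
        self.per_key_limit = per_key_limit
        self.global_limit = global_limit
        self.key_counts = {}  # api_key -> requests used
        self.total = 0        # requests used across all keys

    def allow(self, api_key):
        # API-level check first: the whole service has a ceiling
        if self.total >= self.global_limit:
            return False, "global limit reached"
        # key-level check: each client has its own quota
        if self.key_counts.get(api_key, 0) >= self.per_key_limit:
            return False, "key quota exhausted"
        self.key_counts[api_key] = self.key_counts.get(api_key, 0) + 1
        self.total += 1
        return True, "ok"
```

In a monetized API, `per_key_limit` would be looked up from the user's subscription tier rather than fixed, and both counters would reset at the start of each window.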

There are many scenarios in which you might want to apply rate limiting for the efficiency of your API. The specific type of rate limiting to use depends on several factors, but you can simplify the choice by asking yourself the following questions.

  • Do you want to limit the number of requests a user can make to specific APIs they can access?

  • Do you want to limit the number of requests a user can make to all the APIs they can access?

  • Do you want defence against overwhelming traffic from all the API users or against denial of service attacks?

The bottom line

Ensure you take the time to implement an informed strategy for enforcing rate limits. Consider rate-limiting techniques that improve security, performance, and efficiency across your API stack.

If for any reason you need professional advice on how to protect your API - especially if you have to make it public (as fintechs must under the PSD2 directive) - contact us for professional assistance.
Alex Bodryk, Certified Information Systems Auditor
Cyberlands, Co-founder & managing director