Rate limiting approach on our services

Abhishek Mishra
Feb 5, 2021

In both my current and previous organizations, we faced the same issue: whenever there was a sudden surge in traffic, it impacted the services in a cascading way. The Horizontal Pod Autoscaler (HPA) sometimes takes time to kick in, or may not scale as expected if its scaling condition is not met.

We considered several ways to solve this. The popular workarounds are rate limiting and concurrency control, and both have their own pros and cons. For more detailed definitions, take a look at the link below.

Rate limiting techniques

We picked concurrency control for our systems. I will focus more on the process here. In my previous company, I implemented a fixed rate limit at the application level, using AtomicInteger, with Dropwizard as our framework. In my current one, I implemented an adaptive rate limit using a sidecar.
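To make the fixed, application-level idea concrete, here is a minimal sketch of a concurrency limiter built on AtomicInteger. It is not the code we shipped: the class name `FixedConcurrencyLimiter`, the `handle` helper, and the choice of returning a plain status code are all assumptions made for illustration, and the Dropwizard wiring (for example a servlet filter) is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical fixed concurrency limiter; all names here are illustrative. */
final class FixedConcurrencyLimiter {
    private final int maxInFlight;
    private final AtomicInteger inFlight = new AtomicInteger();

    FixedConcurrencyLimiter(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
    }

    /** Tries to reserve a slot; returns false when the limit is already reached. */
    boolean tryAcquire() {
        // Increment first, then roll back if we overshot the limit.
        if (inFlight.incrementAndGet() > maxInFlight) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Frees a slot; call exactly once per successful tryAcquire(). */
    void release() {
        inFlight.decrementAndGet();
    }

    int inFlight() {
        return inFlight.get();
    }

    /** Runs the handler if a slot is free, otherwise answers with status 429. */
    int handle(Supplier<Integer> handler) {
        if (!tryAcquire()) {
            return 429;
        }
        try {
            return handler.get();
        } finally {
            release(); // always free the slot, even if the handler throws
        }
    }
}
```

The limit counts requests in flight rather than requests per second, which is why a slow downstream automatically reduces how much new work is accepted.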

Production Environment

We have EKS with HPA enabled based on CPU usage.

Things to look out for during implementation:

  • Access to production system metrics (e.g. number of requests, CPU, response time, etc.) for the services
  • Handling of the 429 (or any other chosen) status code
  • How other services will retry or fail when they receive the above status code
  • Load testing scripts and tools should be ready
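For the retry point above, one common option on the calling side is to retry a 429 with exponential backoff. The sketch below assumes that policy; the class `RetryOn429`, its parameters, and the injected `sleeper` are hypothetical names for illustration and are not taken from our services.

```java
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;

/** Hypothetical client-side helper: retries a call while it answers 429. */
final class RetryOn429 {
    /** Delay before retry number `attempt` (0-based): baseMillis * 2^attempt, capped at maxMillis. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Math.min(delay, maxMillis);
    }

    /**
     * Invokes `call` (which returns an HTTP status code) and retries up to
     * maxRetries times while the status is 429. The sleeper is injected so
     * tests can run without real waiting.
     */
    static int callWithRetry(IntSupplier call, int maxRetries,
                             long baseMillis, long maxMillis, LongConsumer sleeper) {
        int status = call.getAsInt();
        for (int attempt = 0; status == 429 && attempt < maxRetries; attempt++) {
            sleeper.accept(backoffMillis(attempt, baseMillis, maxMillis));
            status = call.getAsInt();
        }
        return status; // still 429 if every retry was rejected
    }
}
```

In a real client the sleeper would wrap `Thread.sleep`, and adding some random jitter to the delay helps avoid many callers retrying at the same instant.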

Tools we used

  • OpenResty (a platform built on Nginx)
  • New Relic, Grafana, and Prometheus for observing production metrics
  • JMeter and Blitz for load testing

Post Production things to look out for:

  • Making the limits more configurable, e.g. per region, API endpoint, client, etc.
  • How many 429 status codes your service is returning
  • Timely updates of the latency properties (timeout, window size, concurrency limit)
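To show how the timeout, window size, and concurrency limit can relate to each other, here is a rough sketch of one adaptive scheme: additive increase, multiplicative decrease (AIMD) over a window of measured response times. This is an assumption chosen for illustration and not a description of what our sidecar does; the class `AimdLimit` and all its parameters are hypothetical.

```java
/** Hypothetical adaptive limit: AIMD driven by the average latency of each window. */
final class AimdLimit {
    private final int minLimit;
    private final int maxLimit;
    private final int windowSize;      // number of samples per adjustment
    private final long timeoutMillis;  // average latency above this counts as overload
    private int limit;
    private int samples;
    private long totalLatencyMillis;

    AimdLimit(int initialLimit, int minLimit, int maxLimit, int windowSize, long timeoutMillis) {
        if (minLimit < 1 || maxLimit < minLimit || windowSize < 1) {
            throw new IllegalArgumentException("invalid bounds or window size");
        }
        this.minLimit = minLimit;
        this.maxLimit = maxLimit;
        this.windowSize = windowSize;
        this.timeoutMillis = timeoutMillis;
        this.limit = Math.max(minLimit, Math.min(maxLimit, initialLimit));
    }

    /** Records one response time; the limit is recomputed once per full window. */
    synchronized void record(long latencyMillis) {
        totalLatencyMillis += latencyMillis;
        if (++samples < windowSize) {
            return;
        }
        long average = totalLatencyMillis / samples;
        if (average > timeoutMillis) {
            limit = Math.max(minLimit, limit / 2); // multiplicative decrease
        } else {
            limit = Math.min(maxLimit, limit + 1); // additive increase
        }
        samples = 0;
        totalLatencyMillis = 0;
    }

    synchronized int limit() {
        return limit;
    }
}
```

A smaller window reacts faster but is noisier, and a timeout set too low keeps the limit pinned at its minimum, which is why these properties need revisiting as traffic changes.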

Issues faced:

  • Memory issues, as the Lua package is included.
  • We were skeptical about the configured concurrency limit and the measured response time.

Thanks to Vikas Kumar for the guidance and the opportunity to work on this awesome project.

Important Links

Note:

The link given in the article covers most of the topics, so I am not repeating the same information in my blog. But do let me know if you think I should include the implementation part here; in that case I will update it.
