Effectiveness of local caching in a distributed environment

By Narayan Periwal

Narayan Periwal

Local caching is not as bad as you may think it is in distributed systems

Starting with an empty glass of water, if we fill the glass by 20% of the available space at each step, then the amount of water left in the glass at each step follows a geometric progression.

This holds true for any percentage that we choose.

We have all used caching

We have an online machine learning serving system that serves thousands of requests per second running on a Kubernetes cluster. In order to reduce the network latency and the cost of resources used, we decided to use caching.

We decided to use local caching instead of a single distributed cache to avoid the extra network hop.

System constraints

  • We have a distributed cluster of 5 nodes
  • Each node have a very large-sized local cache, and we cache all the request-response
  • The incoming request gets distributed uniformly across the nodes
  • Each request gets repeated periodically
  • In our system, the periodicity is 1 hour, which means all the requests are getting repeated every hour

We want to achieve the cache hit rate of 0.95, as we expect repetitive traffic in our system and we cache every request-response. If we had a cluster of just 1 node, it would have taken only 1 hour since that is the periodicity of requests. Hence extrapolating from this, the cluster of 5 nodes should have taken a bit more than 5 hrs.

To our surprise, it took us around 14 hours to achieve the required cache hit rate. Our first guess was we have messed up something in the code. However, we could not find anything wrong with it.

We tried to understand the mathematics behind this and guess what, we found some interesting patterns.

Let’s get into the Math

As stated before, the periodicity of the request in our system is 1 hour.

Each node receives 20% of the requests every hour since we have a cluster of size 5. This means the number of cache misses in each node becomes 80% of the cache miss of the previous hour. Hence the cache miss follows a geometric progression with a common ratio of 0.8

To understand this in simpler terms, let’s take an example where there are 10000 unique requests that are getting repeated every hour. Because there are 5 nodes in the system, each node will get 2000 requests.

In the first hour, because the cache is empty to start with, all the requests will result in cache misses causing the cache to get filled with new records

Hence at the end of 1st hour, cache miss count = 2000 and cache hit count = 0.

Now each node has 2000 records in its cache.

In the 2nd hour, another 2000 requests will come to each node. However, the situation is different this time around. Because the cache now already has some records in them, there will be cache hits. Assuming a random distribution of requests in both this and last hour, 400 out of 2000 existing records would overlap in the current hour.

Hence at the end of 2nd hour, cache miss count = 1600 and cache hit count = 400.

Therefore, this time only 1600 new records get added to the cache.

The same follows for consecutive hours as well:-

At the end of 3rd hour, cache miss count = 1280 and cache hit count = 720

At the end of 4th hour, cache miss count = 1024 and cache hit count = 976

And so on…

The cache miss count sequence is 2000, 1600, 1280, 1024, …..

We can see there the cache miss count for an hour is always 80% of the cache miss count of the previous hour.

It is a geometric sequence with the common ratio 1600/2000 = 0.8

Following the glass of water analogy given at the beginning,

If the cache fills up every hour by a fixed percentage of its remaining size, then the cache miss rate at the end of every hour follows a geometric progression

Now, let’s see how it took 14 hours to reach a cache hit rate of 0.95 in a cluster of 5 nodes.


For a cluster of n nodes, the common ratio r of the geometric progression for cache miss rate is

Thus, the cache miss rate m for a distributed cluster of n nodes and t units of time is

Hence, the cache hit rate h for a distributed cluster of n nodes and t units of time can be represented as

n → number of node in the distributed cluster, n > 2

h → cache hit rate, 0 < h < 1

t → the number of time unit elapsed, t > 1

Note: A time unit represents periodicity of requests

Another interesting observation is, as we increase the number of nodes in the cluster, the time taken to reach the desired cache hit rate increases almost linearly.

Although this is not clearly evident from the equation, it can be observed from the contour graph on the right-hand side, where each line in the graph represents a single h value

From the above equation, the time taken t to arrive at the desired cache hit rate h in a distributed cluster of n node can be expressed as:

n → the number of nodes in the distributed cluster, n > 2

h → cache hit rate, 0 < h < 1

t → the number of time unit elapsed, t > 1

Note: A time unit represents periodicity of requests


  • The effectiveness of local caching depends on how frequently the requests get repeated
  • The time taken to reach a particular cache hit rate is proportional to the number of nodes in the distributed cluster
  • If the time required to reach the desired hit rate is very small compared to the cache eviction/invalidation time, then local caching is helpful, else we should go for distributed caching
  • If only x% of requests repeats periodically, then we can replace h by x% of h