How to implement a caching mechanism in Python, and when not to use it?
When talking about improving Python execution performance, especially for data processing, there are many third-party libraries that can help us. If we look at their mechanisms, most of them achieve the improvement by optimising data structures or memory utilisation.
For example, Dask leverages parallel computing and memory optimisation, Pandas relies on vectorisation of the dataset, and Modin optimises the utilisation of multi-core CPUs and memory, too.
In this article, I won’t introduce any libraries. In fact, there is a native Python decorator that can improve performance significantly. We don’t need to install anything because it’s built into Python. Of course, it won’t suit every scenario, so in the last section I’ll also discuss when we should not use it.
Let’s start with a vanilla example that we are all familiar with, the Fibonacci sequence. Below is a normal implementation using recursion.
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
Like most other programming languages, Python needs to build a call stack frame for every recursive call, and this naive implementation recomputes the same Fibonacci values over and over, so the number of calls grows exponentially with n.
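To see just how much redundant work the naive version does, we can add a simple call counter (the counter itself is only for illustration and is not part of the original function):

```python
# Count how many times the naive recursive function is invoked
calls = 0

def fibonacci(n):
    global calls
    calls += 1
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

result = fibonacci(20)
print(result, calls)  # fibonacci(20) alone triggers tens of thousands of calls
```

Most of those calls recompute values that were already computed in another branch of the recursion, which is exactly the waste a cache eliminates.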
However, the “cache” decorator will improve the performance significantly, and it is not difficult to use. We just need to import it from the functools module and add the decorator to the function.
from functools import cache

@cache
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)
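Under the hood, cache (added in Python 3.9 as an unbounded variant of lru_cache) essentially stores each argument-to-result mapping in a dictionary. A rough hand-rolled equivalent, shown here only to illustrate the idea (memoize and fib_manual are hypothetical names, not part of the standard library), might look like this:

```python
def memoize(func):
    # Simplified stand-in for functools.cache: results keyed by the arguments
    store = {}
    def wrapper(*args):
        if args not in store:
            store[args] = func(*args)  # compute once, then reuse forever
        return store[args]
    return wrapper

@memoize
def fib_manual(n):
    if n < 2:
        return n
    return fib_manual(n-1) + fib_manual(n-2)

print(fib_manual(100))  # each n is computed only once, so even n=100 is instant
```

The real decorator is more robust (it handles keyword arguments and is thread-safe, and lru_cache can also bound the cache size), but the principle is the same.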
Here are the running results and the performance comparison.
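The exact figures depend on the machine, but a quick comparison with timeit, measuring one cold run of each version, might look like this sketch:

```python
from functools import cache
import timeit

def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

@cache
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n-1) + fibonacci_cached(n-2)

# One timed run each; absolute numbers will vary between machines
slow = timeit.timeit(lambda: fibonacci(30), number=1)
fibonacci_cached.cache_clear()  # start from a cold cache for a fair comparison
fast = timeit.timeit(lambda: fibonacci_cached(30), number=1)
print(f"naive: {slow:.4f}s, cached: {fast:.6f}s")
```

The cached version runs in linear time on the first call and in constant time on any repeated call, while the naive version is exponential, so the gap widens dramatically as n grows.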