FireDucks: An Accelerated Fully Compatible Pandas Library



Image by Author | Ideogram

 

Pandas is the data manipulation library that most Python data practitioners rely on. Many professionals learn it at the very start of their data science careers.

Although Pandas is easy to use, it can be slow: the larger the dataset and the more complex the analysis, the longer it takes to run. Many frameworks have been developed as alternatives to Pandas, but most of them use their own systems rather than building on Pandas.

That’s where FireDucks comes in: it accelerates Pandas instead of replacing it.

So, how does FireDucks work? Let’s explore it together.
 

FireDucks Introduction

 
FireDucks is a Python library that accelerates Pandas rather than replacing it. It keeps Pandas as the base and aims to improve the execution speed of the Pandas APIs we already use.

FireDucks accelerates Pandas execution through two methods: compiler optimization and multithreading.

The optimizing compiler converts the Python program into an intermediate language before execution. This conversion lets the program run faster without changing its output. The intermediate language used in FireDucks is designed specifically for DataFrames, so the optimizations are well suited to improving Pandas execution times.
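To make the idea of an output-preserving rewrite concrete, here is a minimal sketch in plain Pandas. It applies by hand one kind of rewrite a DataFrame optimizer could make: dropping an unused column before sorting. This is an illustrative assumption, not a description of the specific optimizations FireDucks performs.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, 1_000),
    "b": rng.random(1_000),
    "c": rng.random(1_000),  # never used in the final result
})

# As written: sort all three columns, then keep only "a" and "b".
naive = df.sort_values("a", kind="mergesort")[["a", "b"]]

# Rewritten: drop the unused column first, then sort the smaller frame.
optimized = df[["a", "b"]].sort_values("a", kind="mergesort")

# Both plans return the same rows in the same order.
print(naive.equals(optimized))
```

The second version does less work because column "c" is never reordered, yet the program output is unchanged.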

FireDucks also accelerates the process by using multithreading on the backend. This means FireDucks can spread work across multiple CPU cores, much as a GPU speeds up computation by running many operations in parallel.
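The general idea of splitting work across workers and combining the partial results can be sketched with the standard library. This is only a conceptual illustration under the assumption of four workers; it is not how FireDucks implements its backend.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

values = np.arange(1_000_000, dtype=np.int64)

# Split the data into four chunks, one per worker.
chunks = np.array_split(values, 4)

# Each worker sums its own chunk independently.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(np.sum, chunks))

# Combine the partial results into the final answer.
total = int(sum(partial_sums))
print(total == int(values.sum()))
```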

Additionally, FireDucks uses a lazy execution model. Operations are collected into a batch and executed only when their results are needed. With lazy execution, the main FireDucks methods do not process the DataFrames immediately; instead, they generate the intermediate language consumed by the compiler described above. When a result is required, all of the accumulated intermediate language is executed together.

That’s a simple introduction to how FireDucks improves the execution speed. Let’s try it out with actual Python code.
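Lazy execution is easier to see in a toy example. The sketch below uses a hypothetical `LazyList` class (not part of FireDucks) that records operations instead of running them and executes the whole batch only when `collect()` is called.

```python
class LazyList:
    """Toy stand-in for a lazily evaluated DataFrame (hypothetical)."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # recorded operations, not yet run

    def map(self, fn):
        # Record the operation and return a new lazy object.
        return LazyList(self._data, self._ops + (("map", fn),))

    def filter(self, fn):
        return LazyList(self._data, self._ops + (("filter", fn),))

    def collect(self):
        # Run every recorded operation, in order, in one batch.
        result = self._data
        for kind, fn in self._ops:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result


calls = []

def double(x):
    calls.append(x)  # track when the function actually runs
    return x * 2

pipeline = LazyList(range(5)).map(double).filter(lambda x: x > 2)
calls_before = list(calls)   # still empty: nothing has executed yet
result = pipeline.collect()  # the recorded operations run here
print(calls_before, result)
```

Because the full chain of operations is known before anything runs, an optimizer has the chance to rearrange or simplify it first.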

 

Code Implementation

 
To start, install the library with pip:

pip install fireducks

 

There are two ways to use FireDucks in place of the Pandas library: the import hook or an explicit import.

With the hook, we only need to enable FireDucks and can leave the existing Pandas import untouched. In a Jupyter notebook, we can do that using the following code.

%load_ext fireducks.pandas
import pandas as pd

 

By using the hook, we can swap Pandas for FireDucks without changing any of the API calls in our code.

If you prefer to make the switch visible in the code, import the library explicitly. You can do that using the following code.

import fireducks.pandas as pd
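If the same script also has to run on machines where FireDucks is not installed, one option is a guarded import that falls back to plain Pandas. This is a minimal sketch; the `BACKEND` variable is a hypothetical name used only for illustration.

```python
try:
    # Prefer the accelerated drop-in replacement when it is available.
    import fireducks.pandas as pd
    BACKEND = "fireducks"
except ImportError:
    # Fall back to regular Pandas; the rest of the code stays the same.
    import pandas as pd
    BACKEND = "pandas"

df = pd.DataFrame({"x": [3, 1, 2]})
sorted_x = list(df.sort_values("x")["x"])
print(BACKEND, sorted_x)
```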

 

With the library installed, let’s compare FireDucks with the Pandas library. The goal is to see whether FireDucks runs faster while still using the same APIs.

For example, we can generate sample data and compare how quickly each library sorts the values.

import time
import numpy as np
import pandas as pd
import fireducks.pandas as fpd

# Build one million rows of sample data
n = 1_000_000
np.random.seed(42)
data = {
    "x": np.random.randint(0, 100, n),
    "y": np.random.rand(n),
}

# Create the same DataFrame in both libraries
df_pandas = pd.DataFrame(data)
df_fireducks = fpd.DataFrame(data)

# Time the sort in Pandas
start_pd = time.time()
sorted_pd = df_pandas.sort_values("x")
time_pd = time.time() - start_pd

# Time the same sort in FireDucks
start_fd = time.time()
sorted_fd = df_fireducks.sort_values("x")
time_fd = time.time() - start_fd

print("Pandas sort time: {:.4f} sec".format(time_pd))
print("FireDucks sort time: {:.4f} sec".format(time_fd))

 

The result is shown below.

Pandas sort time: 0.0009 sec
FireDucks sort time: 0.0004 sec

 

In this run, FireDucks finished in less time than Pandas. The gap may not look like much here, but it should become more noticeable with larger datasets and more complex operations. Keep in mind that FireDucks executes lazily, so a single timing like this one may not capture all of the sorting work.

That covers the basics of FireDucks. Try it when you feel that Pandas is too slow.
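A single `time.time()` measurement can be noisy, and with lazy execution it may not include the actual computation. The sketch below is one way to make the comparison more careful: it repeats the run and forces evaluation where possible. The `best_of` helper is hypothetical, and the call to `_evaluate()` assumes FireDucks exposes that method for forcing execution; the helper checks for it at runtime, so the code also runs with plain Pandas.

```python
import time

import numpy as np
import pandas as pd


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over several runs of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        # Assumption: a lazy FireDucks frame offers _evaluate() to force
        # execution. Plain Pandas has no such method, so this is skipped.
        force = getattr(result, "_evaluate", None)
        if callable(force):
            force()
        timings.append(time.perf_counter() - start)
    return min(timings)


rng = np.random.default_rng(42)
df = pd.DataFrame({
    "x": rng.integers(0, 100, 100_000),
    "y": rng.random(100_000),
})

elapsed = best_of(lambda: df.sort_values("x"))
print("Best of 5 sort time: {:.4f} sec".format(elapsed))
```

Passing a FireDucks DataFrame to the same helper would time both libraries under identical conditions.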
 

Conclusion

 
FireDucks is a Python library designed to accelerate Pandas operations without switching to a new framework. By using compiler optimization and multithreading, FireDucks can significantly improve execution performance.

The library is easy to adopt because you don’t need to change the APIs you already use. FireDucks is especially useful when you have larger datasets and complex operations that would otherwise take too long to process.

I hope this has helped!
 
 

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
