Let’s learn how to use MultiIndex in Pandas for hierarchical data.
Preparation
We need the Pandas package, so make sure it is installed. If it is not, you can install it by running pip install pandas.
Then, let’s learn how to handle MultiIndex data in Pandas.
Using MultiIndex in Pandas
MultiIndex in Pandas refers to indexing a DataFrame or Series on multiple levels. It is helpful when we work with higher-dimensional data in a 2D tabular structure. With MultiIndex, we can index data with multiple keys and organize it better. Let’s use an example dataset to understand it better.
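import pandas as pd

# Build a two-level index from (Category, Number) tuples
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
# The column data must be passed as a dict of column name to values
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)
print(df)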
The output:
                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40
As you can see, the DataFrame above has a two-level index, with Category and Number as its levels.
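To confirm the structure of the index, we can inspect it directly. The following is a minimal sketch that rebuilds the same sample DataFrame; the variable names level_names, n_levels, and categories are only illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Names of the index levels, outermost first
level_names = list(df.index.names)
# How many levels the index has
n_levels = df.index.nlevels
# All labels of one level, one entry per row
categories = df.index.get_level_values('Category').tolist()

print(level_names)
print(n_levels)
print(categories)
```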
It’s also possible to set the MultiIndex with the existing columns in our DataFrame.
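# Start from a flat table held in a dict of columns
data = {
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
# Promote the two columns to index levels, modifying df in place
df.set_index(['Category', 'Number'], inplace=True)
print(df)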
The output:
                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40
Both methods produce the same result. That’s how we can create a MultiIndex in our DataFrame.
If you already have a MultiIndex DataFrame, it’s possible to swap the levels with the following code.
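A minimal sketch of the swap, assuming the DataFrame.swaplevel method is the intended call and reusing the same sample data (the variable name swapped is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# swaplevel returns a new DataFrame; df keeps its original level order
swapped = df.swaplevel('Category', 'Number')
print(swapped)
```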
The output:
                 Value
Number Category
1      A            10
2      A            20
1      B            30
2      B            40
Of course, we can return the MultiIndex to columns with the following code:
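A minimal sketch, assuming DataFrame.reset_index is the intended call and reusing the same sample data (the variable name flat is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# reset_index moves every index level back into ordinary columns
# and replaces the index with a default 0..n-1 range
flat = df.reset_index()
print(flat)
```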
The output:
  Category  Number  Value
0        A       1     10
1        A       2     20
2        B       1     30
3        B       2     40
So, how do we access MultiIndex data in a Pandas DataFrame? We can use the .loc indexer for that. For example, we can select all rows under one label of the first level.
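A minimal sketch of first-level access, assuming category 'A' is the label of interest and reusing the same sample data (the variable name subset is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# A single label selects on the first level; that level is dropped
# from the result, which stays indexed by Number
subset = df.loc['A']
print(subset)
```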
The output for category A:
        Value
Number
1          10
2          20
We can also access a single row by passing a tuple with one label per level.
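A minimal sketch of tuple-based access, assuming the row ('A', 1) is the one of interest and reusing the same sample data (the variable name row is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# A tuple with one label per level identifies exactly one row,
# which comes back as a Series named after that tuple
row = df.loc[('A', 1)]
print(row)
```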
The output:
Value    10
Name: (A, 1), dtype: int64
Lastly, we can perform statistical aggregation on a MultiIndex with the .groupby method by naming the level to group on.
print(df.groupby(level=['Category']).sum())
The output:
          Value
Category
A            30
B            70
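The same pattern works for any level and any aggregation. As a minimal sketch on the same sample data, the following groups by the Number level instead and also computes a per-category mean; the variable names by_number and category_mean are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# Sum across categories for each Number label
by_number = df.groupby(level='Number').sum()
# Average within each Category label
category_mean = df.groupby(level='Category').mean()

print(by_number)
print(category_mean)
```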
Mastering MultiIndex in Pandas will help you gain insight into hierarchical data.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.