When working on data science projects, you often deal with tabular data (organized in rows and columns). This data isn’t always in perfect form, so you need to perform various analyses and transformations, for which the Pandas library is commonly used. This tabular data is referred to as a DataFrame in Pandas.
Sometimes, instead of row-wise or column-wise analysis, you want to perform an operation on all elements of the data, known as an element-wise operation. These operations may include, but are not limited to, cleaning, normalizing, scaling, encoding, or transforming the data into the right form. This article will go through different examples to see how you can utilize the DataFrame.map() function for different data preprocessing tasks.
Before proceeding further, please note that in previous versions of Pandas, applymap() was the go-to method for element-wise operations on Pandas DataFrames. However, this method has been deprecated and renamed to DataFrame.map() from version 2.1.0 onwards.
Overview of the DataFrame.map() Function
Let’s take a look at its syntax:
DataFrame.map(func, na_action=None, **kwargs)
The syntax is very simple: it takes a function as an argument and applies it to each element of the DataFrame. The output is a transformed DataFrame with the same shape as the input.
Here:
- na_action: It can take either the value of ‘ignore’ or None (the default). By setting na_action='ignore', you can skip over NaN values instead of passing them through the mapping function.
- **kwargs: It allows you to pass additional keyword arguments to the mapping function.
Now that we have a basic understanding of the syntax, let’s move on to some practical examples of using DataFrame.map() for element-wise operations in Pandas. Note that several of the examples below call map() on a single column; that is the equivalent Series.map() method, which works element-wise in the same way.
1. Applying Custom Functions
Custom functions are user-defined functions that perform operations not pre-defined in the library. For example, if your DataFrame contains daily temperatures in Fahrenheit but you want to convert them to Celsius for your analysis, you can pass each element of the DataFrame through a conversion operation. Since this conversion isn’t already available in Pandas, you need to define the function yourself. Let’s take a look at an example to see how it works.
import pandas as pd

# Sample DataFrame with daily temperatures in Fahrenheit
df = pd.DataFrame({'temp_F': [85, 75, 80, 95, 90]})

# Custom function to convert temperature from Fahrenheit to Celsius
def convert_F_to_C(temp_F):
    return round((temp_F - 32) * 5/9, 2)

# Apply the custom function to the column using the map() function
df['temp_C'] = df['temp_F'].map(convert_F_to_C)

# Print the final DataFrame
print(df)
Output:
temp_F temp_C
0 85 29.44
1 75 23.89
2 80 26.67
3 95 35.00
4 90 32.22
2. Working with Dictionaries
DataFrame.map() also works smoothly with dictionaries. This is particularly useful when you want to convert numerical values in your DataFrame to categories based on some criteria. Let’s take an example of converting student marks to letter grades using a dictionary.
import pandas as pd
# Sample DataFrame with numerical grades
grades = {'Student': ['Qasim', 'Babar', 'Sonia'], 'Grade': [90, 85, 88]}
df = pd.DataFrame(grades)
# Dictionary to map numerical grades to letter grades
grade_map = {90: 'A', 85: 'B', 88: 'B+'}
# Applying the dictionary mapping to the DataFrame
df['Letter_Grade'] = df['Grade'].map(grade_map)
print(df)
Output:
Student Grade Letter_Grade
0 Qasim 90 A
1 Babar 85 B
2 Sonia 88 B+
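One thing to keep in mind with dictionary mapping: any value with no entry in the dictionary maps to NaN. A small sketch of a workaround uses a function with dict.get() to supply a fallback label (the 'N/A' label and the third grade here are illustrative assumptions, not from the example above):

```python
import pandas as pd

# A grade with no dictionary entry (72) is included on purpose
grades = pd.Series([90, 85, 72], name='Grade')
grade_map = {90: 'A', 85: 'B'}

# Direct dict mapping: 72 has no entry, so it becomes NaN
direct = grades.map(grade_map)

# Function mapping with dict.get() supplies a default for unmapped values
with_default = grades.map(lambda g: grade_map.get(g, 'N/A'))

print(direct.tolist())
print(with_default.tolist())
```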
3. Handling Missing Values
Handling missing values is crucial in data preprocessing. These missing values are typically denoted as NaN (Not a Number). As a responsible data scientist, it is essential to handle them effectively, as they can significantly impact your analysis. You can impute them with meaningful alternatives. For instance, if you are calculating the average BMI of a class and encounter a student whose weight is available but whose height is missing, instead of leaving it blank, you can substitute the average height of students in the same grade, thereby preserving the data point.
Recall the syntax of DataFrame.map() shown earlier, which includes the na_action parameter. This parameter allows you to control how missing values are handled. Let me help you understand this with an example.
Suppose we are running a grocery store and some prices are missing. In this case, we want to display “Unavailable” instead of NaN. You can do so as follows:
import pandas as pd
import numpy as np

# Sample DataFrame for a grocery store with some NaN values for price
df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry', 'Date'],
    'Price': [1.2, np.nan, 2.5, np.nan]
})

# Mapping function that formats the prices and handles missing values
def map_func(x):
    if pd.isna(x):
        return 'Unavailable'
    else:
        return f'${x:.2f}'

# With the default na_action=None
df['Price_mapped_default'] = df['Price'].map(map_func)

# With na_action='ignore'
df['Price_mapped_ignore'] = df['Price'].map(map_func, na_action='ignore')

# Print the resulting DataFrame
print(df)
Output:
Product Price Price_mapped_default Price_mapped_ignore
0 Apple 1.2 $1.20 $1.20
1 Banana NaN Unavailable NaN
2 Cherry 2.5 $2.50 $2.50
3 Date NaN Unavailable NaN
You can see that when na_action='ignore' is used, the NaN values are not passed through the custom function, so they remain NaN in the output column. On the other hand, when na_action=None is used (or not specified), the NaN values are passed through the custom function, which returns ‘Unavailable’ in this case.
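As an aside, the same result as the default behavior can be sketched a different way: skip NaNs during mapping with na_action='ignore', then fill the remaining gaps afterward with fillna(). This keeps the formatting function free of NaN checks; it is a stylistic alternative, not the approach shown above:

```python
import pandas as pd
import numpy as np

# Prices with a missing value in the middle
prices = pd.Series([1.2, np.nan, 2.5])

# Map only the non-NaN values, then fill the NaNs left behind
formatted = prices.map(lambda x: f'${x:.2f}', na_action='ignore').fillna('Unavailable')
print(formatted.tolist())
```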
4. Chaining DataFrame.map()
Another standout feature of DataFrame.map() is the ability to chain multiple operations together in a single expression. This allows you to perform complex transformations by dividing them into smaller, more manageable steps. Not only does this make your code easier to understand, but it also lets you streamline the process of applying transformations sequentially.
Let’s consider an example where we chain operations to preprocess a dataset containing sales data. Assume we want to format prices, calculate taxes, and apply discounts in a single transformation chain:
import pandas as pd

# DataFrame representing sales data
sales_data = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry'],
    'Price': ["1.2", "0.8", "2.5"]
})

# Functions for each transformation step
def format_price(price):
    return float(price)

def calculate_tax(price):
    tax_rate = 0.1
    return price * (1 + tax_rate)

def apply_discount(price):
    discount_rate = 0.2
    return price * (1 - discount_rate)

# Chain transformations using map()
sales_data['Formatted_Price'] = sales_data['Price'].map(format_price).map(calculate_tax).map(apply_discount)

# Print the resulting DataFrame
print(sales_data)
Output:
Product Price Formatted_Price
0 Apple 1.2 1.056
1 Banana 0.8 0.704
2 Cherry 2.5 2.200
The chained map() calls execute these transformations sequentially from left to right. In this example, the chain begins by converting each price to a float using format_price(). Next, it calculates the tax for each formatted price using calculate_tax(), and finally it applies a discount using apply_discount(). This chaining ensures that each transformation is applied in order, building on the previous one to produce the desired processed values in the Formatted_Price column of the sales_data DataFrame.
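If you prefer a single pass over the data, the three chained map() calls can also be folded into one composed function. Here is a sketch that is functionally equivalent for this example (the preprocess() name is my own choice, not from the article):

```python
import pandas as pd

sales_data = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry'],
    'Price': ["1.2", "0.8", "2.5"]
})

# One composed function: format, then tax, then discount, per element
def preprocess(price, tax_rate=0.1, discount_rate=0.2):
    price = float(price)                # format
    price = price * (1 + tax_rate)      # add tax
    return price * (1 - discount_rate)  # apply discount

sales_data['Formatted_Price'] = sales_data['Price'].map(preprocess)
print(sales_data)
```

Chaining keeps each step independently reusable, while the composed version visits each element only once; which reads better depends on how often the individual steps are reused elsewhere.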
Wrapping Up
That wraps up today’s article! If you have any other important use cases or examples where you apply the DataFrame.map() function, feel free to share them in the comments. Your experiences can help us all learn and explore together. For further exploration, see the official documentation. This article is part of the Pandas series; if you enjoyed this content, you may also find other relevant articles in my author profile worth checking out.
Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.