Time-based data becomes harder to work with when it spans different time zones: the same moment is written differently depending on where it was recorded, so timestamps are easy to misinterpret. This guide shows how to manage time zones and timestamps with the Pandas library in Python.
Preparation
In this tutorial, we’ll use the Pandas package. If it is not installed yet, we can install it with pip:
pip install pandas
Now, we’ll explore how to work with time-based data in Pandas with practical examples.
Handling Time Zones and Timestamps with Pandas
Time data gives each event a time-specific reference. Its most precise form is the timestamp, which records a point in time from the year down to fractions of a second (a pandas Timestamp supports nanosecond precision).
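To see how fine-grained a pandas timestamp is, we can parse a single string that has a fractional-second part and read its components back. This is a minimal sketch; the timestamp value is made up for illustration and is not part of the tutorial dataset.

```python
import pandas as pd

# Hypothetical timestamp with nine fractional-second digits (nanosecond precision)
ts = pd.Timestamp("2024-06-15 21:17:43.123456789")

print(ts.year, ts.month, ts.day)      # 2024 6 15
print(ts.hour, ts.minute, ts.second)  # 21 17 43
print(ts.microsecond, ts.nanosecond)  # 123456 789
```

The sub-second part is split between microsecond (the first six digits) and nanosecond (the remaining three).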
Let’s start by creating a sample dataset.
import pandas as pd
# Sample transactions with timestamps stored as plain strings
data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'amount': [100, 200, 150]
}
df = pd.DataFrame(data)
# Parse the strings into pandas datetime values
df['timestamp'] = pd.to_datetime(df['timestamp'])
The ‘timestamp’ column in the example above stores times as strings with second-level precision. To convert this column to the pandas datetime type, we use the pd.to_datetime function.
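If the timestamp strings always follow the same layout, passing an explicit format to pd.to_datetime is a reasonable safeguard: pandas does not have to infer the layout, and a string that does not match raises an error instead of being parsed in an unexpected way. The sketch below assumes the format string matches this sample data, and it also checks that the parsed column is still time-zone-naive, which is why a time zone still has to be attached.

```python
import pandas as pd

data = {
    "transaction_id": [1, 2, 3],
    "timestamp": ["2023-06-15 12:00:05", "2024-04-15 15:20:02", "2024-06-15 21:17:43"],
    "amount": [100, 200, 150],
}
df = pd.DataFrame(data)

# Assumed layout: year-month-day hour:minute:second
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")

print(df["timestamp"].dtype)  # a datetime64 dtype
print(df["timestamp"].dt.tz)  # None: the values carry no time zone yet
```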
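Once the column has a datetime type, we can make it time-zone-aware. The tz_localize method attaches a time zone to naive timestamps without shifting their clock times; here we declare that the values are in Coordinated Universal Time (UTC):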
df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)
Output>>>
transaction_id timestamp amount timestamp_utc
0 1 2023-06-15 12:00:05 100 2023-06-15 12:00:05+00:00
1 2 2024-04-15 15:20:02 200 2024-04-15 15:20:02+00:00
2 3 2024-06-15 21:17:43 150 2024-06-15 21:17:43+00:00
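Because tz_localize only labels the values, it matters which zone the raw times were actually recorded in. If they had been recorded in local time, a common pattern is to localize with that zone first and then use tz_convert to move to UTC. In this sketch, Asia/Jakarta (UTC+7) is a hypothetical source zone chosen for illustration; the original dataset does not say where the times were recorded.

```python
import pandas as pd

# Naive timestamps: clock times with no time zone attached
naive = pd.Series(pd.to_datetime(["2023-06-15 12:00:05", "2024-06-15 21:17:43"]))

# Hypothetical: the clock times were recorded in Jakarta (UTC+7)
local = naive.dt.tz_localize("Asia/Jakarta")  # same clock times, now labelled +07:00
as_utc = local.dt.tz_convert("UTC")           # same instants, clock times shifted back 7 hours

print(local)
print(as_utc)
```

Localizing the same strings as UTC instead would place every transaction seven hours later than it really happened, which is exactly the kind of misreading this step is meant to prevent.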
The ‘timestamp_utc’ values now store their time zone alongside the date and time, shown as the +00:00 offset. That makes it possible to convert them to another time zone. For example, the code below converts the UTC column to Japan time (Asia/Tokyo).
df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)
Output>>>
transaction_id timestamp amount timestamp_utc \
0 1 2023-06-15 12:00:05 100 2023-06-15 12:00:05+00:00
1 2 2024-04-15 15:20:02 200 2024-04-15 15:20:02+00:00
2 3 2024-06-15 21:17:43 150 2024-06-15 21:17:43+00:00
timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
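Converting between zones changes how each value is displayed, not which moment it refers to. The sketch below rebuilds the three columns from this section and checks that: the calendar day can differ between the UTC and Japan columns, yet each pair of values compares equal, and converting the Japan column back to UTC recovers the original values.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2023-06-15 12:00:05", "2024-04-15 15:20:02", "2024-06-15 21:17:43"])
})
df["timestamp_utc"] = df["timestamp"].dt.tz_localize("UTC")
df["timestamp_japan"] = df["timestamp_utc"].dt.tz_convert("Asia/Tokyo")

# The calendar day differs between zones for the second transaction...
print(df.loc[1, "timestamp_utc"].day, df.loc[1, "timestamp_japan"].day)  # 15 16

# ...but each pair of values refers to the same instant, so they compare equal
same_instant = [u == j for u, j in zip(df["timestamp_utc"], df["timestamp_japan"])]
print(all(same_instant))  # True

# Converting back to UTC recovers the original column
back_to_utc = df["timestamp_japan"].dt.tz_convert("UTC")
print((back_to_utc == df["timestamp_utc"]).all())  # True
```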
With the new column, we can filter the data using a time window defined in a particular time zone. For example, the code below keeps only the transactions that fall within a window expressed in Japan time.
start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')
filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]
print(filtered_df)
Output>>>
transaction_id timestamp amount timestamp_utc \
2 3 2024-06-15 21:17:43 150 2024-06-15 21:17:43+00:00
timestamp_japan
2 2024-06-16 06:17:43+09:00
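The same filter can be written with Series.between, which includes both endpoints by default. Because comparisons between time-zone-aware values use the underlying instant, the window can also be converted to UTC and applied to the UTC column with the same result. An ordering comparison (such as >=) between a time-zone-aware column and a naive timestamp, by contrast, raises a TypeError instead of guessing a zone. This sketch rebuilds a trimmed version of the DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "timestamp_utc": pd.to_datetime(
        ["2023-06-15 12:00:05", "2024-04-15 15:20:02", "2024-06-15 21:17:43"]
    ).tz_localize("UTC"),
})
df["timestamp_japan"] = df["timestamp_utc"].dt.tz_convert("Asia/Tokyo")

start = pd.Timestamp("2024-06-15 06:00:00", tz="Asia/Tokyo")
end = pd.Timestamp("2024-06-16 07:59:59", tz="Asia/Tokyo")

# between() is inclusive on both ends by default, like the >= / <= filter
tokyo_rows = df[df["timestamp_japan"].between(start, end)]

# The same window expressed in UTC selects the same rows from the UTC column
utc_rows = df[df["timestamp_utc"].between(start.tz_convert("UTC"), end.tz_convert("UTC"))]

print(tokyo_rows["transaction_id"].tolist())  # [3]
print(utc_rows["transaction_id"].tolist())    # [3]

# A naive bound has no zone, so pandas refuses the ordering comparison
try:
    df[df["timestamp_japan"] >= pd.Timestamp("2024-06-15 06:00:00")]
    naive_comparison_failed = False
except TypeError as err:
    naive_comparison_failed = True
    print("TypeError:", err)
```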
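Once the timestamps are time-zone-aware, we can also resample the data into regular time intervals. The following code sets the Japan-time column as the index and counts the non-null values of each remaining column per hour. Because the three transactions span a whole year, most hourly bins are empty (count 0), so we print only the hours that contain a transaction.
resampled_df = df.set_index('timestamp_japan').resample('h').count()  # 'h' = hourly; 'H' is deprecated since pandas 2.2
print(resampled_df[resampled_df['transaction_id'] > 0])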
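Resampling bins follow the time zone of the index, so the choice of zone decides where each day begins. The sketch below resamples the same transactions by day, once in UTC and once in Japan time; summing the amount column is an illustrative choice. Transactions 2 and 3 land on different calendar days depending on the zone.

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [100, 200, 150],
    "timestamp_utc": pd.to_datetime(
        ["2023-06-15 12:00:05", "2024-04-15 15:20:02", "2024-06-15 21:17:43"]
    ).tz_localize("UTC"),
})
df["timestamp_japan"] = df["timestamp_utc"].dt.tz_convert("Asia/Tokyo")

# Daily bins start at midnight in the index's own time zone
daily_utc = df.set_index("timestamp_utc")["amount"].resample("D").sum()
daily_tokyo = df.set_index("timestamp_japan")["amount"].resample("D").sum()

# Empty days sum to 0, so keep only the days that had transactions
busy_utc = daily_utc[daily_utc > 0]
busy_tokyo = daily_tokyo[daily_tokyo > 0]

print(busy_utc.index.strftime("%Y-%m-%d").tolist())
print(busy_tokyo.index.strftime("%Y-%m-%d").tolist())
```

If daily reports are meant for a team in Japan, resampling on the Japan-time column keeps each transaction on the day that team would expect.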
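By parsing timestamps with pd.to_datetime, attaching a time zone with tz_localize, and switching zones with tz_convert, you can filter and resample time-based data in Pandas without misreading when each event actually happened.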
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.