When coding in Python, you don’t usually have to wrap your head around the details of memory allocation. But tracing memory allocation can be helpful, especially if you’re working with memory-intensive operations and large datasets.
Python’s built-in tracemalloc module comes with functions that’ll help you understand memory usage and debug applications. With tracemalloc, you can see where and how many blocks of memory have been allocated, take snapshots of memory usage, compare differences between snapshots, and more.
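As a quick preview (this snippet is separate from the tutorial’s scripts), a minimal sketch of the basic workflow looks like this: start tracing, allocate something, take a snapshot, and print per-line statistics:
# Minimal tracemalloc sketch (not part of this tutorial's scripts)
import tracemalloc

tracemalloc.start()                          # begin tracing memory allocations
data = [str(i) for i in range(100_000)]      # allocate some memory to trace
snapshot = tracemalloc.take_snapshot()       # capture current allocations

# Print the three lines responsible for the most allocated memory
for stat in snapshot.statistics('lineno')[:3]:
    print(stat)

tracemalloc.stop()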
We’ll look at some of these in this tutorial. Let’s get started.
Before You Begin
We’ll use a simple Python script for data processing. For this, we’ll create a sample dataset and process it. Besides a recent version of Python, you also need pandas and NumPy in your working environment.
Create a virtual environment and activate it:
$ python3 -m venv v1
$ source v1/bin/activate
And install the required libraries:
$ pip3 install numpy pandas
You can find the code for this tutorial on GitHub.
Create a Sample Dataset with Order Details
We’ll generate a sample CSV file with order details. You can run the following script to create a CSV file with 100K order records:
# create_data.py
import pandas as pd
import numpy as np

# Create a sample dataset with order details
num_orders = 100000
data = {
    'OrderID': np.arange(1, num_orders + 1),
    'CustomerID': np.random.randint(1000, 5000, num_orders),
    'OrderAmount': np.random.uniform(10.0, 1000.0, num_orders).round(2),
    'OrderDate': pd.date_range(start="2023-01-01", periods=num_orders, freq='min')
}

df = pd.DataFrame(data)
df.to_csv('order_data.csv', index=False)
This script populates a pandas dataframe with 100K records containing the following four features, and exports the dataframe to a CSV file:
- OrderID: Unique identifier for each order
- CustomerID: ID for the customer
- OrderAmount: The amount of each order
- OrderDate: The date and time of the order
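If you want to sanity-check the generated file before moving on, a quick look at its shape and first few rows (this check isn’t part of the original script) should confirm the 100K rows and four columns:
# Optional check: preview the generated CSV (assumes create_data.py has been run)
import pandas as pd

df = pd.read_csv('order_data.csv')
print(df.shape)    # expected: (100000, 4)
print(df.head())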
Trace Memory Allocation with tracemalloc
Now we’ll create a Python script to load and process the dataset. We’ll also trace memory allocations.
First, we define the functions load_data and process_data to load and process records from the CSV file:
# main.py
import pandas as pd

def load_data(file_path):
    print("Loading data...")
    df = pd.read_csv(file_path)
    return df

def process_data(df):
    print("Processing data...")
    df['DiscountedAmount'] = df['OrderAmount'] * 0.9  # Apply a 10% discount
    df['OrderYear'] = pd.to_datetime(df['OrderDate']).dt.year  # Extract the order year
    return df
We can then go ahead with tracing memory allocation by doing the following:
- Initialize the memory tracing with tracemalloc.start().
- The load_data() function reads the CSV file into a dataframe. We take a snapshot of memory usage after this step.
- The process_data() function adds two new columns to the dataframe: ‘DiscountedAmount’ and ‘OrderYear’. We take another snapshot after processing.
- We compare the two snapshots to find memory usage differences and print out the top memory-consuming lines.
- And then print the current and peak memory usage to understand the overall impact.
Here’s the corresponding code:
import tracemalloc

def main():
    # Start tracing memory allocations
    tracemalloc.start()

    # Load data
    df = load_data('order_data.csv')

    # Take a snapshot
    snapshot1 = tracemalloc.take_snapshot()

    # Process data
    df = process_data(df)

    # Take another snapshot
    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top memory-consuming lines ]")
    for stat in top_stats[:10]:
        print(stat)

    # Current and peak memory usage
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory usage: {current / 1024 / 1024:.1f} MB")
    print(f"Peak usage: {peak / 1024 / 1024:.1f} MB")

    tracemalloc.stop()

if __name__ == "__main__":
    main()
Now run the Python script:
$ python3 main.py
This outputs the top memory-consuming lines as well as the current and peak memory usage:
Loading data...
Processing data...
[ Top 3 memory-consuming lines ]
/home/balapriya/trace_malloc/v1/lib/python3.11/site-packages/pandas/core/frame.py:12683: size=1172 KiB (+1172 KiB), count=4 (+4), average=293 KiB
/home/balapriya/trace_malloc/v1/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:2354: size=781 KiB (+781 KiB), count=3 (+3), average=260 KiB
:123: size=34.6 KiB (+15.3 KiB), count=399 (+180), average=89 B
Current memory usage: 10.8 MB
Peak usage: 13.6 MB
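The per-line statistics above point at lines deep inside pandas. If you want the full call path that led to an allocation, you can start tracemalloc with a deeper frame limit and group statistics by traceback; here’s a hedged sketch of that approach (it re-reads order_data.csv and is independent of main.py):
# Optional: capture deeper tracebacks to see the full call path to an allocation
import tracemalloc
import pandas as pd

tracemalloc.start(10)                  # store up to 10 frames per allocation
df = pd.read_csv('order_data.csv')     # assumes the CSV from create_data.py exists
snapshot = tracemalloc.take_snapshot()

# Largest allocation, grouped by full traceback instead of a single line
top_stat = snapshot.statistics('traceback')[0]
print(f"{top_stat.count} blocks, {top_stat.size / 1024:.1f} KiB")
for line in top_stat.traceback.format():
    print(line)

tracemalloc.stop()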
Wrapping Up
Using tracemalloc to trace memory allocation helps you identify memory-intensive operations, and the traces and statistics it returns give you a starting point for optimizing performance.
You can then check whether more efficient data structures or processing methods would reduce memory usage. For long-running applications, you can also use tracemalloc periodically to track memory usage over time (see the sketch below). That said, you can always use tracemalloc in conjunction with other profiling tools to get a more comprehensive view of memory usage.
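As an example of that periodic tracking, a long-running process could log current and peak usage on each pass through its main loop; the loop body and interval below are placeholders, not part of this tutorial’s scripts:
# Hypothetical sketch: periodically log memory usage in a long-running process
import time
import tracemalloc

tracemalloc.start()

for iteration in range(3):                   # stand-in for the application's main loop
    data = [n ** 2 for n in range(100_000)]  # stand-in for real work
    current, peak = tracemalloc.get_traced_memory()
    print(f"Iteration {iteration}: current={current / 1024 / 1024:.1f} MB, "
          f"peak={peak / 1024 / 1024:.1f} MB")
    time.sleep(1)

tracemalloc.stop()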
If you’re interested in learning memory profiling with memory-profiler, read Introduction to Memory Profiling in Python.
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.