How To Trace Memory Allocation in Python


When coding in Python, you don’t usually have to wrap your head around the details of memory allocation. But tracing memory allocation can be helpful, especially if you’re working with memory-intensive operations and large datasets.


Python’s built-in tracemalloc module comes with functions that help you understand memory usage and debug applications. With tracemalloc, you can see where and how many blocks of memory have been allocated, take snapshots of memory usage, compare snapshots to spot differences, and more.
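
As a quick orientation before the full example, here’s a minimal sketch of the core API (what gets printed depends entirely on what your code allocates):

import tracemalloc

tracemalloc.start()                               # begin tracing allocations

data = [str(i) for i in range(100_000)]           # allocate something worth measuring

snapshot = tracemalloc.take_snapshot()            # capture current allocations
for stat in snapshot.statistics('lineno')[:3]:    # top allocation sites by line
    print(stat)

current, peak = tracemalloc.get_traced_memory()   # bytes, measured since start()
print(f"current={current}, peak={peak}")

tracemalloc.stop()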

We’ll look at some of these in this tutorial. Let’s get started.

 

Before You Begin

 

We’ll use a simple Python script for data processing. For this, we’ll create a sample dataset and process it. Besides a recent version of Python, you also need pandas and NumPy in your working environment.

Create a virtual environment and activate it:

$ python3 -m venv v1
$ source v1/bin/activate

 

And install the required libraries:

$ pip3 install numpy pandas

 

You can find the code for this tutorial on GitHub.

 

Create a Sample Dataset with Order Details

 

We’ll generate a sample CSV file with order details. You can run the following script to create a CSV file with 100K order records:

# create_data.py
import pandas as pd
import numpy as np

# Create a sample dataset with order details
num_orders = 100000
data = {
	'OrderID': np.arange(1, num_orders + 1),
	'CustomerID': np.random.randint(1000, 5000, num_orders),
	'OrderAmount': np.random.uniform(10.0, 1000.0, num_orders).round(2),
	'OrderDate': pd.date_range(start="2023-01-01", periods=num_orders, freq='min')
}

df = pd.DataFrame(data)
df.to_csv('order_data.csv', index=False)

 

This script populates a pandas dataframe with 100K records containing the following four features, then exports the dataframe to order_data.csv:

  • OrderID: Unique identifier for each order
  • CustomerID: ID for the customer
  • OrderAmount: The amount of each order
  • OrderDate: The date and time of the order
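
If you want to confirm the file was written as expected, a quick check like the following should do (it assumes order_data.csv was created in the current directory):

# check_data.py
import pandas as pd

df = pd.read_csv('order_data.csv')
print(df.shape)    # expect (100000, 4)
print(df.dtypes)   # note: OrderDate is read back as a plain object (string) column
print(df.head())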

Trace Memory Allocation with tracemalloc

 

Now we’ll create a Python script to load and process the dataset. We’ll also trace memory allocations.

First, we define functions load_data and process_data to load and process records from the CSV file:

# main.py
import pandas as pd

def load_data(file_path):
    print("Loading data...")
    df = pd.read_csv(file_path)
    return df

def process_data(df):
    print("Processing data...")
    df['DiscountedAmount'] = df['OrderAmount'] * 0.9  # Apply a 10% discount
    df['OrderYear'] = pd.to_datetime(df['OrderDate']).dt.year  # Extract the order year
    return df

 

We can then go ahead with tracing memory allocation by doing the following:

  • We initialize memory tracing with tracemalloc.start().
  • The load_data() function reads the CSV file into a dataframe. We take a snapshot of memory usage after this step.
  • The process_data() function adds two new columns to the dataframe: 'DiscountedAmount' and 'OrderYear'. We take another snapshot after processing.
  • We compare the two snapshots to find memory usage differences and print the top memory-consuming lines.
  • Finally, we print the current and peak memory usage to understand the overall impact.

Here’s the corresponding code:

import tracemalloc

def main():
    # Start tracing memory allocations
    tracemalloc.start()

    # Load data
    df = load_data('order_data.csv')

    # Take a snapshot
    snapshot1 = tracemalloc.take_snapshot()

    # Process data
    df = process_data(df)

    # Take another snapshot
    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots
    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("[ Top memory-consuming lines ]")
    for stat in top_stats[:10]:
        print(stat)

    # Current and peak memory usage
    current, peak = tracemalloc.get_traced_memory()
    print(f"Current memory usage: {current / 1024 / 1024:.1f} MB")
    print(f"Peak usage: {peak / 1024 / 1024:.1f} MB")

    tracemalloc.stop()

if __name__ == "__main__":
    main()
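
The snapshot comparison can also pick up allocations made by Python’s import machinery and other internals. If that noise gets in the way, tracemalloc’s Filter lets you exclude it before comparing. A small sketch, which you could place inside main() just before the compare_to() call (which patterns to filter is up to you):

    # Drop allocations made by the import system before comparing snapshots
    snapshot2 = snapshot2.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))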

 

Now run the Python script:

$ python3 main.py

This outputs the top memory-consuming lines as well as the current and peak memory usage:

Loading data...
Processing data...
[ Top 3 memory-consuming lines ]
/home/balapriya/trace_malloc/v1/lib/python3.11/site-packages/pandas/core/frame.py:12683: size=1172 KiB (+1172 KiB), count=4 (+4), average=293 KiB
/home/balapriya/trace_malloc/v1/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:2354: size=781 KiB (+781 KiB), count=3 (+3), average=260 KiB
:123: size=34.6 KiB (+15.3 KiB), count=399 (+180), average=89 B
Current memory usage: 10.8 MB
Peak usage: 13.6 MB
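
If you want more context than a single line number, you can also group statistics by full traceback. Here’s a small sketch that reuses snapshot2 from the script above (it would go inside main(), before tracemalloc.stop()):

    # Show the full call chain behind the biggest allocation
    traceback_stats = snapshot2.statistics('traceback')
    biggest = traceback_stats[0]
    print(f"{biggest.count} blocks: {biggest.size / 1024:.1f} KiB")
    for line in biggest.traceback.format():
        print(line)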

 

Wrapping Up

 

Using tracemalloc to trace memory allocation helps you identify memory-intensive operations, and the traces and statistics it returns give you a starting point for optimization.

From there, you can check whether more efficient data structures or processing methods would reduce memory usage. For long-running applications, you can take snapshots or read the traced memory periodically to track how usage evolves, as sketched below. You can also use tracemalloc alongside other profiling tools to get a more complete picture of memory usage.
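
A minimal sketch of that periodic pattern (the work() function and the one-second interval are placeholders, not part of this tutorial’s script):

import time
import tracemalloc

def work():
    # Placeholder for your application's real work
    return [str(i) for i in range(50_000)]

tracemalloc.start()
results = []

for iteration in range(5):
    results.append(work())
    current, peak = tracemalloc.get_traced_memory()
    print(f"iteration {iteration}: "
          f"current={current / 1024 / 1024:.1f} MB, "
          f"peak={peak / 1024 / 1024:.1f} MB")
    time.sleep(1)  # replace with your application's own cadence

tracemalloc.stop()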

If you’re interested in learning memory profiling with memory-profiler, read Introduction to Memory Profiling in Python.
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


