Marshmallow: The Sweetest Python Library for Data Serialization and Validation



Image by Author | Leonardo AI & Canva

 

Data serialization is a fundamental programming concept that shows up in everyday programs. It refers to converting complex data objects into an intermediate format that can be saved and later converted back to the original form. However, Python’s common serialization modules, such as json and pickle, are limited: json handles only basic built-in types, and neither module validates the data being loaded. With structured, object-oriented programs, we need stronger support for handling custom classes.

Marshmallow is one of the most widely used data-handling libraries among Python developers building robust software applications. It supports data serialization and provides a strong abstraction for data validation in an object-oriented paradigm.

In this article, we use a running example given below to understand how to use Marshmallow in existing projects. The code shows three classes representing a simple e-commerce model: Product, Customer, and Order. Each class minimally defines its parameters. We’ll see how to save an instance of an object and ensure its correctness when we try to load it again in our code.

from typing import List

class Product:
    def __init__(self, _id: int, name: str, price: float):
        self._id = _id
        self.name = name
        self.price = price

class Customer:
    def __init__(self, _id: int, name: str):
        self._id = _id
        self.name = name

class Order:
    def __init__(self, _id: int, customer: Customer, products: List[Product]):
        self._id = _id
        self.customer = customer
        self.products = products

 

Getting Started with Marshmallow

 

Installation

Marshmallow is available as a Python library at PyPI and can be easily installed using pip. To install or upgrade the Marshmallow dependency, run the below command:

pip install -U marshmallow

 

This installs the latest stable version of Marshmallow in the active environment. If you want the development version of the library with all the latest functionality, you can install it using the command below:

pip install -U git+https://github.com/marshmallow-code/marshmallow.git@dev

 

Creating Schemas

Let’s start by adding Marshmallow functionality to the Product class. We need to create a new class representing the schema that an instance of the Product class must follow. Think of a schema as a blueprint that defines the variables in the Product class and the data type of each one.

Let’s break down and understand the basic code below:

from marshmallow import Schema, fields

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

 

We create a new class that inherits from the Schema class in Marshmallow. Then, we declare the same variable names as our Product class and define their field types. The fields module in Marshmallow supports various data types; here, we use the primitive types Int, Str, and Float.

 

Serialization

Now that we have a schema defined for our object, we can convert a Python class instance into a Python dictionary or a JSON string for serialization. Here’s the basic implementation:

product = Product(_id=4, name="Test Product", price=10.6)
schema = ProductSchema()

# For Python Dictionary object
result = schema.dump(product)

# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 10.6}

# For JSON-serializable string
result = schema.dumps(product)

# type(str) -> '{"_id": 4, "name": "Test Product", "price": 10.6}'

 

We create an object of our ProductSchema, which converts a Product object to a serializable format like JSON or dictionary.

 

Note the difference between the results of the dump and dumps methods. The first returns a Python dictionary object that can be saved using pickle, and the second returns a string object that follows the JSON format.

 

Deserialization

To reverse the serialization process, we use deserialization. An object is saved so it can be loaded and accessed later, and Marshmallow helps with that.

A Python dictionary can be validated using the load method, which verifies the variables and their associated data types. The code below shows how it works:

product_data = {
    "_id": 4,
    "name": "Test Product",
    "price": 50.4,
}
result = schema.load(product_data)
print(result)

# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 50.4}

faulty_data = {
    "_id": 5,
    "name": "Test Product",
    "price": "ABCD",  # Wrong input datatype
}
result = schema.load(faulty_data)

# Raises validation error

 

The schema validates that the dictionary has the correct parameters and data types. If validation fails, a ValidationError is raised, so it’s essential to wrap the load call in a try-except block. If it succeeds, the result is still a dictionary, just like the input. Not so helpful, right? What we generally want is to validate the dictionary and convert it back to the original object it was serialized from.

To achieve this, we use the post_load decorator provided by Marshmallow:

from marshmallow import Schema, fields, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    # Runs after successful validation; must be defined inside the schema class
    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

 

We create a method in the schema class with the post_load decorator. This method takes the validated dictionary and converts it back to a Product object. Including **kwargs is important because Marshmallow passes additional keyword arguments, such as many and partial, to the decorated method.

This modification to the load functionality ensures that after validation, the Python dictionary is passed to the post_load function, which creates a Product object from the dictionary. This makes it possible to deserialize an object using Marshmallow.

 

Validation

Often, we need additional validation specific to our use case. While data type validation is essential, it doesn’t cover all the validation we might need. Even in this simple example, extra validation is needed for our Product object. We need to ensure that the price is not below 0. We can also define more rules, such as ensuring that our product name is between 3 and 128 characters. These rules help ensure our codebase conforms to a defined database schema.

Let us now see how we can implement this validation using Marshmallow:

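from marshmallow import Schema, fields, validates, ValidationError, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

    # **kwargs absorbs any extra keyword arguments Marshmallow may pass to validators
    @validates('price')
    def validate_price(self, value, **kwargs):
        if value < 0:
            raise ValidationError('Price of Product must not be below 0.')

    @validates('name')
    def validate_name(self, value, **kwargs):
        if len(value) < 3 or len(value) > 128:
            raise ValidationError('Name of Product must be between 3 and 128 letters.')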

 

We modify the ProductSchema class to add two new methods. One validates the price parameter and the other validates the name parameter. We use the validates decorator and pass it the name of the field that the method is supposed to validate. The implementation of these methods is straightforward: if the value is incorrect, we raise a ValidationError.

 

Nested Schemas

With validation in place for the Product class, we have covered the basic functionality provided by the Marshmallow library. Let us now add complexity and see how the other two classes are validated.

The Customer class is fairly straightforward as it contains the basic attributes and primitive datatypes.

class CustomerSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)

 

However, defining the schema for the Order class requires a new concept: nested schemas. An order is associated with a specific customer, and the customer can order any number of products. This is defined in the class definition, and when we validate the Order schema, we also need to validate the Product and Customer objects passed to it.

Instead of redefining everything in the OrderSchema, we will avoid repetition and use nested schemas. The order schema is defined as follows:

class OrderSchema(Schema):
    _id = fields.Int(required=True)
    customer = fields.Nested(CustomerSchema, required=True)
    products = fields.List(fields.Nested(ProductSchema), required=True)

 

Within the Order schema, we include the ProductSchema and CustomerSchema definitions. This ensures that the defined validations for these schemas are automatically applied, following the DRY (Don’t Repeat Yourself) principle in programming, which allows the reuse of existing code.

 

Wrapping Up

 
In this article, we covered a quick start and the main use cases of the Marshmallow library, one of the most popular serialization and data validation libraries in Python. Although it is similar to Pydantic, many developers prefer Marshmallow because of its schema definition style, which resembles validation libraries in other languages like JavaScript.

Marshmallow is easy to integrate with Python backend frameworks like FastAPI and Flask, as well as with ORMs like SQLAlchemy, making it a popular choice for web development and data validation tasks.

 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
