Exploring Natural Sorting in Python


 

 

What Is Natural Sorting, And Why Do We Need It?

 

When working with Python iterables such as lists, sorting is a common operation. You can use the list method sort() to sort a list in place, or the sorted() function, which returns a new sorted list.

The sorted() function works fine when you have a list of numbers or strings containing letters. But what about strings containing alphanumeric characters, such as filenames, directory names, version numbers, and more? The sorted() function performs lexicographic sorting.

Look at this simple example:

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

sorted_filenames = sorted(filenames)
print(sorted_filenames)

 

You’ll get the following output:

Output >>> ['file1.txt', 'file10.txt', 'file2.txt']

 

Well, ‘file10.txt’ comes before ‘file2.txt’ in the output. Not the intuitive sorting order we’re hoping for. This is because the sorted() function compares strings character by character using their code points (the ASCII values, for these characters) and not the numeric values of the digits, so ‘1’ sorts before ‘2’ no matter what follows. Enter natural sorting.

Natural sorting is a sorting technique that arranges elements in a way that reflects their natural order, particularly for alphanumeric data. Unlike lexicographic sorting, natural sorting interprets the numerical value of digits within strings and arranges them accordingly, resulting in a more meaningful and expected sequence.
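To make the idea concrete, here is a minimal sketch of a natural sort key built with only the standard library. It is not how natsort is implemented, and the helper name natural_key is made up for illustration; it simply splits each string into digit and non-digit chunks and compares the digit chunks as integers:

```python
import re


def natural_key(text):
    """Hypothetical helper: split text into alternating text and number chunks."""
    # re.split with a capturing group keeps the digit runs, so the chunks
    # alternate: text, digits, text, digits, ... (odd positions are digits).
    chunks = re.split(r"(\d+)", text)
    return [int(chunk) if i % 2 else chunk for i, chunk in enumerate(chunks)]


filenames = ["file10.txt", "file2.txt", "file1.txt"]

print(natural_key("file10.txt"))  # ['file', 10, '.txt']

naturally_sorted = sorted(filenames, key=natural_key)
print(naturally_sorted)  # ['file1.txt', 'file2.txt', 'file10.txt']
```

Because 10 is compared with 2 as a number rather than character by character, ‘file10.txt’ lands after ‘file2.txt’. A library such as natsort covers many more cases than this sketch does.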

In this tutorial, we’ll explore natural sorting with the Python library natsort.

 

Getting Started

 

To get started, you can install the natsort library using pip:

pip install natsort

 

As a best practice, install the required package in a virtual environment for the project. Because natsort requires Python 3.7 or later, make sure you’re using a recent Python version, preferably Python 3.11 or later. To learn how to manage different Python versions, read Too Many Python Versions to Manage? Pyenv to the Rescue.

 

Natural Sorting Basic Examples

 
We’ll start with simple use cases where natural sorting is beneficial:

  • Sorting file names: When working with file names containing digits, natural sorting ensures that files are ordered in the natural intuitive order.
  • Version sorting: Natural sorting is also helpful for ordering strings of version numbers. It sorts versions by their numerical values rather than by the ASCII values of their characters, which might not reflect the desired versioning sequence.

Now let’s proceed to code these examples.

 

Sorting Filenames

 
Now that we’ve installed the natsort library, we can import it into our Python script and use the different functions that the library offers.

Let’s revisit the first example of sorting file names (the one we saw at the beginning of the tutorial), where lexicographic sorting with the sorted() function did not give the order we wanted.

Now let’s sort the same list using the natsorted() function like so:

import natsort

# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]

# Sort filenames naturally
sorted_filenames = natsort.natsorted(filenames)
print(sorted_filenames)

 

In this example, the natsorted() function from the natsort library is used to sort the list of file names naturally. As a result, the file names are arranged in the expected numerical order:

Output >>> ['file1.txt', 'file2.txt', 'file10.txt']

 

Sorting Version Numbers

 
Let’s take another similar example where we have strings denoting versions:

import natsort

# List of version numbers
versions = ["v-1.10", "v-1.2", "v-1.5"]

# Sort versions naturally
sorted_versions = natsort.natsorted(versions)

print(sorted_versions)

 

Here, the natsorted() function is applied to sort the list of version numbers naturally. The resulting sorted list maintains the correct numerical order of the versions:

Output >>> ['v-1.2', 'v-1.5', 'v-1.10']

 

Customizing Sorting with a Key

 

When using the built-in sorted() function, you might have used the key parameter to customize the sort order. Similarly, the natsorted() function also takes an optional key parameter, which you can use to sort based on specific criteria.

Let’s take an example: we have file_data, which is a list of tuples. The first element in each tuple (at index 0) is the file name and the second element (at index 1) is the size of the file.

Say we want to sort based on the file size in ascending order. So we set the key parameter to lambda x: x[1] so that the file size at index 1 is used as the sorting key:

import natsort

# List of tuples containing filename and size
file_data = [
("data_20230101_080000.csv", 100),
("data_20221231_235959.csv", 150),
("data_20230201_120000.csv", 120),
("data_20230115_093000.csv", 80)
]

# Sort file data based on file size
sorted_file_data = natsort.natsorted(file_data, key=lambda x:x[1])

# Print sorted file data
for filename, size in sorted_file_data:
    print(filename, size)

 

Here’s the output:

data_20230115_093000.csv 80
data_20230101_080000.csv 100
data_20230201_120000.csv 120
data_20221231_235959.csv 150

 

Case-Insensitive Sorting of Strings

 

Another use case where natural sorting is helpful is when you need case-insensitive sorting of strings. Again, lexicographic sorting based on ASCII values will not give the desired results, because every uppercase letter has a smaller ASCII value than any lowercase letter.

To perform case-insensitive sorting, we can set alg to natsort.ns.IGNORECASE, which ignores case when sorting. The alg parameter controls the algorithm that natsorted() uses:

import natsort

# List of strings with mixed case
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Sort words naturally with case-insensitivity
sorted_words = natsort.natsorted(words, alg=natsort.ns.IGNORECASE)

print(sorted_words)

 

Here, the list of words with mixed case is sorted naturally with case-insensitivity:

Output >>> ['apple', 'Banana', 'cat', 'Dog', 'Elephant']

 

Wrapping Up

 

And that’s a wrap! In this tutorial, we reviewed the limitations of lexicographic sorting and how natural sorting can be a good alternative when working with alphanumeric strings. You can find all the code on GitHub.

We started with simple examples and also looked at sorting based on custom keys and handling case-insensitive sorting in Python. Next, you may explore other capabilities of the natsort library. I’ll see you all soon in another Python tutorial. Until then, keep coding!

 

 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


