Mastering Python’s Built-in Statistics Module: A Complete Guide to Essential Functions

Practical Descriptive Statistics in Python with the statistics Module

Python’s built-in statistics module offers a handy set of tools for computing fundamental statistical measures, with no external installations required. It covers measures of central tendency (such as mean, median, and mode) and dispersion (such as standard deviation and variance), and it also provides covariance and simple linear regression, both added in Python 3.10.

Since these functions work seamlessly on lists or other iterables of real-valued numbers, the statistics module is an excellent choice for smaller datasets or quick, straightforward data analysis tasks. It’s included with every Python installation, ensuring statistical functions are always ready to go when you need them.

Let’s have a look at the different functions included within the statistics module, and point to more in-depth tutorials on each of them individually. In the overviews below you will find links to individual tutorials from our sister site Statology, each describing how to use a specific function from the module.

Function	Short Description	Syntax Example	Returns
mean()	Calculates the arithmetic mean	statistics.mean([1, 2, 3])	Number (e.g., 2)
fmean()	Faster mean using float conversion	statistics.fmean([1, 2, 3])	Float (e.g., 2.0)
geometric_mean()	Calculates the geometric mean	statistics.geometric_mean([1, 2, 3])	Float
harmonic_mean()	Calculates the harmonic mean	statistics.harmonic_mean([1, 2, 3])	Float
median()	Finds the middle value in a dataset	statistics.median([1, 2, 3])	Float or Int
median_low()	Returns the lower middle value	statistics.median_low([1, 2, 3, 4])	A value from the data (e.g., 2)
median_high()	Returns the higher middle value	statistics.median_high([1, 2, 3, 4])	A value from the data (e.g., 3)
median_grouped()	Finds the median for grouped data	statistics.median_grouped([1, 2, 2, 3, 4])	Float
mode()	Returns the most common value	statistics.mode([1, 2, 2, 3])	A value from the data (e.g., 2)
multimode()	Returns all modes in a list	statistics.multimode([1, 2, 2, 3, 3])	List
quantiles()	Returns the cut points that divide data into equal parts	statistics.quantiles([1, 2, 3, 4])	List of Floats
pstdev()	Population standard deviation	statistics.pstdev([1, 2, 3])	Float
pvariance()	Population variance	statistics.pvariance([1, 2, 3])	Float
stdev()	Sample standard deviation	statistics.stdev([1, 2, 3])	Float
variance()	Sample variance	statistics.variance([1, 2, 3])	Number (e.g., 1)
covariance()	Sample covariance of two datasets	statistics.covariance([1, 2], [3, 4])	Float
linear_regression()	Linear regression coefficients	statistics.linear_regression([1, 2], [3, 4])	Named tuple (slope, intercept)

Measures of Central Tendency

1. How to Use the Python statistics.mean() Function

What it does: Calculates the arithmetic mean by adding up all numeric values and dividing by the total count

Key points:

Accepts a sequence or an iterator of numbers
Accepts both integers and floats
Raises StatisticsError if the data is empty; non-numeric elements such as strings raise TypeError instead
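
As a minimal sketch of the behavior described above (the sample values are made up for illustration), the following shows mean() on integers, on exact Fraction values, and on two kinds of bad input:

```python
from fractions import Fraction
from statistics import StatisticsError, mean

scores = [88, 92, 79, 95]  # hypothetical sample data
average = mean(scores)     # (88 + 92 + 79 + 95) / 4
print(average)             # 88.5

# Exact numeric types are preserved where possible.
exact = mean([Fraction(1, 3), Fraction(2, 3)])
print(exact)               # 1/2

# Empty data raises StatisticsError.
try:
    mean([])
except StatisticsError as exc:
    empty_message = str(exc)
    print("empty data:", empty_message)

# Non-numeric data raises TypeError rather than StatisticsError.
try:
    mean(["a", "b"])
    got_type_error = False
except TypeError:
    got_type_error = True
print(got_type_error)      # True
```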

2. How to Use the Python statistics.fmean() Function

What it does: Computes the arithmetic mean like mean(), but first converts all inputs to floats

Key points:

Typically faster than mean() and always returns a float, so exact types such as Fraction or Decimal are not preserved
Great for performance-critical use cases with floating-point data
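
A short sketch of the difference in return types, using made-up integer readings: mean() keeps a whole-number result as an int, while fmean() always produces a float.

```python
from statistics import fmean, mean

readings = [3, 4, 5]     # hypothetical sample data

slow = mean(readings)    # exact arithmetic, result stays an int here
fast = fmean(readings)   # inputs converted to float first

print(slow, type(slow).__name__)   # 4 int
print(fast, type(fast).__name__)   # 4.0 float
```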

3. How to Use the Python statistics.geometric_mean() Function

What it does: Returns the geometric mean, which is the nth root of the product of the values

Key points:

Useful for analyzing growth rates or ratios
Supports only positive inputs, raising StatisticsError for zero or negative numbers
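
To illustrate the growth-rate use case, here is a small sketch with hypothetical yearly growth factors. The geometric mean gives the single constant factor that, compounded over the same number of periods, yields the same total growth.

```python
from statistics import geometric_mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]

avg_factor = geometric_mean(growth_factors)
print(round(avg_factor, 4))  # 1.0591

# Compounding the average factor three times reproduces the total growth.
total_growth = 1.10 * 1.20 * 0.90
print(abs(avg_factor ** 3 - total_growth) < 1e-9)  # True
```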

4. How to Use the Python statistics.harmonic_mean() Function

What it does: Calculates the harmonic mean, best suited for averaging rates or speeds

Key points:

Based on the reciprocal of the arithmetic mean of reciprocals
Negative values trigger StatisticsError; if any value is zero, the result is zero
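
The classic example is averaging speeds over equal distances. The sketch below uses made-up speeds; it also shows that a zero in the data gives a result of zero, while a negative value raises StatisticsError.

```python
from statistics import StatisticsError, harmonic_mean

# Drive 40 km/h out and 60 km/h back over the same distance.
speeds_kmh = [40, 60]
avg_speed = harmonic_mean(speeds_kmh)
print(round(avg_speed, 6))  # 48.0

# A zero in the data makes the harmonic mean zero.
with_zero = harmonic_mean([40, 0, 60])
print(with_zero)            # 0

# Negative values are rejected.
try:
    harmonic_mean([40, -60])
    negative_rejected = False
except StatisticsError:
    negative_rejected = True
print(negative_rejected)    # True
```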

5. How to Use the Python statistics.median() Function

What it does: Identifies the middle value in a sorted dataset

Key points:

If there’s an even number of values, it averages the two middle numbers
Sorts the data internally, so the input does not need to be pre-sorted

6. How to Use the Python statistics.median_low() Function

What it does: Returns the lower median in a dataset

Key points:

For an even number of values, it picks the lower of the two middle elements (instead of taking the average)
Always returns an actual member of the dataset, which is useful when the data is discrete and an averaged value would be meaningless

7. How to Use the Python statistics.median_high() Function

What it does: Returns the higher median in a dataset

Key points:

Similar to median_low(), but selects the higher of the two middle values for even-length data
Handy if you want to consistently round “up” to an actual data value when dealing with medians
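
The three median functions differ only for even-length data. A brief sketch with made-up values (deliberately unsorted, since the functions sort internally):

```python
from statistics import median, median_high, median_low

data = [7, 1, 5, 3]          # sorted: [1, 3, 5, 7]

mid = median(data)           # average of the two middle values
low = median_low(data)       # lower of the two middle values
high = median_high(data)     # higher of the two middle values
print(mid, low, high)        # 4.0 3 5

# With an odd number of values all three agree on the middle element.
odd = [7, 1, 5]
print(median(odd), median_low(odd), median_high(odd))  # 5 5 5
```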

8. How to Use the Python statistics.median_grouped() Function

What it does: Computes the median for grouped (binned) data under the assumption of uniform distribution across each interval

Key points:

Defaults to an interval size of 1
Ideal for datasets grouped into ranges instead of individual points
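
As a rough sketch of how the interval setting changes the result, the example below treats each value as the midpoint of a class of the given width. The data is made up for illustration.

```python
from statistics import median_grouped

data = [1, 3, 3, 5, 7]  # hypothetical class midpoints

# Default interval of 1: the median class is 2.5 to 3.5.
narrow = median_grouped(data)
print(narrow)  # 3.25

# Interval of 2: the median class is 2 to 4.
wide = median_grouped(data, interval=2)
print(wide)    # 3.5
```

The result is interpolated inside the median class, so it is usually not one of the original values.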

9. How to Use the Python statistics.mode() Function

What it does: Finds the most frequently occurring value in a dataset

Key points:

If multiple values tie, it returns the first one encountered in the data (Python 3.8 and later; earlier versions raised StatisticsError)
Raises StatisticsError if the dataset is empty

10. How to Use the Python statistics.multimode() Function

What it does: Identifies all values that share the highest frequency within the data

Key points:

Returns a list containing each “most common” value
If all values occur equally often, it returns each distinct value in the order first encountered; empty data gives an empty list
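
A small sketch contrasting mode() and multimode() on tied data (the values are illustrative, and the tie behavior shown for mode() assumes Python 3.8 or later):

```python
from statistics import mode, multimode

data = [1, 2, 2, 3, 3]        # 2 and 3 both appear twice

first_mode = mode(data)       # first of the tied values encountered
all_modes = multimode(data)   # every tied value, in order of appearance
print(first_mode)             # 2
print(all_modes)              # [2, 3]

# Both functions also work on non-numeric data such as strings.
print(mode(["red", "blue", "blue"]))  # blue

# No repeats: every distinct value is a mode. Empty data gives an empty list.
print(multimode([1, 2, 3]))   # [1, 2, 3]
print(multimode([]))          # []
```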

11. How to Use the Python statistics.quantiles() Function

What it does: Splits sorted data into segments (default is quartiles) and returns the cut points

Key points:

You can define how many segments to create by using the n parameter; the result is a list of n - 1 cut points
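
The sketch below computes quartiles and deciles for a small made-up dataset. It also shows the method parameter, which switches between the default "exclusive" method (treats the data as a sample) and "inclusive" (treats the data as the full population).

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample data

quartiles = quantiles(data)          # n=4 by default, three cut points
print(quartiles)                     # [2.5, 5.0, 7.5]

inclusive = quantiles(data, n=4, method="inclusive")
print(inclusive)                     # [3.0, 5.0, 7.0]

deciles = quantiles(data, n=10)      # ten segments, nine cut points
print(len(deciles))                  # 9
```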

Measures of Spread

12. How to Use the Python statistics.pstdev() Function

What it does: Calculates the population standard deviation, using the full dataset (with n as the denominator)

Key points:

Use this when you have an entire population rather than a sample
It differs from stdev(), which is geared toward sample data

13. How to Use the Python statistics.pvariance() Function

What it does: Computes the population variance, which is the average of the squared differences from the mean, using n as the denominator

Key points:

Indicates how spread out an entire population’s data is

14. How to Use the Python statistics.stdev() Function

What it does: Produces the sample standard deviation, using n-1 in the denominator

Key points:

Suitable for analyzing a sample instead of a full population

15. How to Use the Python statistics.variance() Function

What it does: Calculates the sample variance, measuring how spread out the values are from their mean (using n-1)

Key points:

Complements stdev() for sample-based analyses
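
To make the n versus n-1 distinction concrete, here is a minimal sketch that runs all four spread functions on the same made-up data. The squared deviations from the mean (5) sum to 32, so the population variance is 32 / 8 and the sample variance is 32 / 7.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean is 5

pop_var = pvariance(data)          # 32 / 8
pop_sd = pstdev(data)              # square root of the population variance
sample_var = variance(data)        # 32 / 7
sample_sd = stdev(data)            # square root of the sample variance

print(pop_var, pop_sd)             # 4 2.0
print(round(sample_var, 4))        # 4.5714
print(round(sample_sd, 4))         # 2.138
```

The sample versions are always slightly larger than the population versions on the same data, because dividing by n-1 compensates for estimating the mean from the sample itself.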

Relations Between Two Inputs

16. How to Use the Python statistics.covariance() Function

What it does: Determines the sample covariance of two equally sized datasets, revealing how two variables shift together

Key points:

Uses n-1 for sample-based calculations
A stepping stone to more advanced correlation analysis

17. How to Use the Python statistics.linear_regression() Function

What it does: Performs a simple linear regression on two equally sized data inputs, returning slope and intercept for the best-fit line

Key points:

Employs the least squares method
Handy for basic trend analysis and forecasting tasks

Conclusion

The built-in statistics module in Python covers a surprising range of functionality, from basic measures of central tendency and variability to more advanced calculations like covariance and regression. The tutorials above break down each function step by step, making it easier to integrate these capabilities into your own data workflows. Whether you’re working with a small dataset or just want quick, native statistical operations, statistics is a dependable first choice.

Use these guides to dive deeper into each function and unlock powerful yet accessible statistical methods in your Python code.

Matthew Mayo (@mattmayo13) holds a master’s degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.
