Mastering Python’s Built-in Statistics Module: A Complete Guide to Essential Functions



 

Python’s built-in statistics module offers a handy set of tools for computing fundamental statistical measures, with no external installations required. It covers measures of central tendency (like mean, median, and mode), dispersion (like standard deviation and variance), and even provides specialized functionality such as covariance and simple linear regression (both added in Python 3.10).

Since these functions work seamlessly on lists or other iterables of real-valued numbers, the statistics module is an excellent choice for smaller datasets or quick, straightforward data analysis tasks. It’s included with every Python installation, ensuring statistical functions are always ready to go when you need them.

Let’s look at the functions included in the statistics module. Each overview below corresponds to a more in-depth tutorial on our sister site Statology that describes how to use that specific function.

 

| Function | Short Description | Syntax Example | Returns |
|---|---|---|---|
| mean() | Calculates the arithmetic mean | statistics.mean([1, 2, 3]) | Int or float (e.g., 2) |
| fmean() | Faster mean using float conversion | statistics.fmean([1, 2, 3]) | Float (e.g., 2.0) |
| geometric_mean() | Calculates the geometric mean | statistics.geometric_mean([1, 2, 3]) | Float |
| harmonic_mean() | Calculates the harmonic mean | statistics.harmonic_mean([1, 2, 3]) | Float |
| median() | Finds the middle value in a dataset | statistics.median([1, 2, 3]) | Float or int |
| median_low() | Returns the lower middle value | statistics.median_low([1, 2, 3, 4]) | Int (e.g., 2) |
| median_high() | Returns the higher middle value | statistics.median_high([1, 2, 3, 4]) | Int (e.g., 3) |
| median_grouped() | Finds the median for grouped data | statistics.median_grouped([1, 2, 2, 3, 4]) | Float |
| mode() | Returns the most common value | statistics.mode([1, 2, 2, 3]) | Int or float |
| multimode() | Returns all modes in a list | statistics.multimode([1, 2, 2, 3, 3]) | List |
| quantiles() | Divides data into equal parts | statistics.quantiles([1, 2, 3, 4]) | List of floats |
| pstdev() | Population standard deviation | statistics.pstdev([1, 2, 3]) | Float |
| pvariance() | Population variance | statistics.pvariance([1, 2, 3]) | Float |
| stdev() | Sample standard deviation | statistics.stdev([1, 2, 3]) | Float |
| variance() | Sample variance | statistics.variance([1, 2, 3]) | Int or float (e.g., 1) |
| covariance() | Sample covariance of two datasets | statistics.covariance([1, 2], [3, 4]) | Float |
| linear_regression() | Linear regression coefficients | statistics.linear_regression([1, 2], [3, 4]) | Named tuple (slope, intercept) |

 

Measures of Central Tendency

 
1. How to Use the Python statistics.mean() Function

What it does: Calculates the arithmetic mean by adding up all numeric values and dividing by the total count

Key points:

  • Accepts a sequence or an iterator of numbers
  • Accepts both integers and floats
  • Raises StatisticsError if the data is empty; non-numeric elements such as strings raise TypeError instead
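
To make these points concrete, here is a minimal sketch. The sample values and variable names are invented for illustration.

```python
import statistics

# Hypothetical sample data: a list of ints and an iterator of floats
scores = [80, 90, 100]
readings = iter([1.5, 2.5, 3.5])

mean_scores = statistics.mean(scores)      # (80 + 90 + 100) / 3
mean_readings = statistics.mean(readings)  # iterators are accepted too

# Empty input raises StatisticsError
try:
    statistics.mean([])
    empty_raised = False
except statistics.StatisticsError:
    empty_raised = True

print(mean_scores, mean_readings, empty_raised)
```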

 
2. How to Use the Python statistics.fmean() Function

What it does: Computes the arithmetic mean like mean(), but first converts all inputs to floats

Key points:

  • Typically faster than mean(), and it always returns a float, even for integer input
  • Great for performance-critical use cases with floating-point data
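
A short comparison may help here. This sketch uses made-up integer data and simply shows that fmean() gives the same value as mean() while always producing a float; it does not attempt to measure speed.

```python
import statistics

values = [1, 2, 3, 4]  # hypothetical integer data

exact = statistics.mean(values)   # 10 / 4
fast = statistics.fmean(values)   # same value, computed in float arithmetic

print(exact, fast, type(fast).__name__)
```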

 
3. How to Use the Python statistics.geometric_mean() Function

What it does: Returns the geometric mean, which is the nth root of the product of the values

Key points:

  • Useful for analyzing growth rates or ratios
  • Supports only positive inputs, raising StatisticsError for zero or negative numbers
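
As an illustration of the growth-rate use case, the sketch below averages three hypothetical yearly growth factors and cross-checks the result against the nth-root definition. The data and names are assumptions made for this example.

```python
import math
import statistics

# Hypothetical yearly growth factors: +10%, +20%, -10%
growth_factors = [1.10, 1.20, 0.90]

avg_factor = statistics.geometric_mean(growth_factors)

# Cross-check against the definition: nth root of the product
by_definition = math.prod(growth_factors) ** (1 / len(growth_factors))

print(round(avg_factor, 4), math.isclose(avg_factor, by_definition))
```

The result is the single constant factor that, applied every year, would produce the same overall growth.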

 
4. How to Use the Python statistics.harmonic_mean() Function

What it does: Calculates the harmonic mean, best suited for averaging rates or speeds

Key points:

  • Based on the reciprocal of the arithmetic mean of reciprocals
  • Negative values trigger StatisticsError; if any value is zero, the result is zero
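
The classic case is averaging speeds over equal distances. The following sketch assumes a hypothetical trip driven at 40 km/h and then 60 km/h over the same distance, and checks the result against the reciprocal-of-mean-reciprocals definition.

```python
import math
import statistics

# Hypothetical trip: the same distance driven at 40 km/h, then at 60 km/h
speeds = [40, 60]

avg_speed = statistics.harmonic_mean(speeds)

# Cross-check: reciprocal of the arithmetic mean of the reciprocals
by_definition = 1 / statistics.mean(1 / s for s in speeds)

# The arithmetic mean (50) would overstate the true average speed
print(avg_speed, statistics.mean(speeds))
```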

 
5. How to Use the Python statistics.median() Function

What it does: Identifies the middle value of a dataset; the function sorts the data itself, so the input does not need to be in order

Key points:

  • If there’s an even number of values, it averages the two middle numbers
  • Works on sequences of odd or even length
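
A minimal sketch of both cases, using small made-up lists that are deliberately left unsorted:

```python
import statistics

odd_data = [7, 1, 3]       # sorted order: 1, 3, 7
even_data = [7, 1, 3, 5]   # sorted order: 1, 3, 5, 7

median_odd = statistics.median(odd_data)    # the single middle value
median_even = statistics.median(even_data)  # average of 3 and 5

print(median_odd, median_even)
```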

 
6. How to Use the Python statistics.median_low() Function

What it does: Returns the lower median in a dataset

Key points:

  • For an even number of values, it picks the lower of the two middle elements (instead of taking the average)
  • Always returns an actual value from the dataset, which is useful for discrete data where an averaged “middle” would not be a valid value

 
7. How to Use the Python statistics.median_high() Function

What it does: Returns the higher median in a dataset

Key points:

  • Similar to median_low(), but selects the higher of the two middle values for even-length data
  • Handy if you want to consistently pick the upper of the two middle values while still returning an actual data point
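
The difference between median(), median_low(), and median_high() only shows up for even-length data. The sketch below uses made-up lists to compare the three.

```python
import statistics

even_data = [1, 2, 3, 4]

low = statistics.median_low(even_data)    # lower of the two middle values
high = statistics.median_high(even_data)  # higher of the two middle values
mid = statistics.median(even_data)        # average of the two middle values

# For odd-length data all three functions return the same element
odd_data = [1, 2, 3]
same = {
    statistics.median_low(odd_data),
    statistics.median_high(odd_data),
    statistics.median(odd_data),
}

print(low, high, mid, same)
```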

 
8. How to Use the Python statistics.median_grouped() Function

What it does: Computes the median for grouped (binned) data under the assumption of uniform distribution across each interval

Key points:

  • Defaults to an interval size of 1
  • Ideal for datasets grouped into ranges instead of individual points
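
To show how the interval size affects the result, here is a small sketch. The data are hypothetical; each value is treated as the midpoint of a bin whose width is given by the interval parameter.

```python
import math
import statistics

# Hypothetical binned data: each value stands for the midpoint of its bin
binned = [1, 3, 3, 5, 7]

# Default interval of 1: the median bin for value 3 spans 2.5 to 3.5
narrow = statistics.median_grouped(binned)

# Interval of 2: the median bin for value 3 spans 2 to 4
wide = statistics.median_grouped(binned, interval=2)

print(narrow, wide)
```

The median is interpolated within the bin that contains the middle observation, so a wider bin shifts the estimate.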

 
9. How to Use the Python statistics.mode() Function

What it does: Finds the most frequently occurring value in a dataset

Key points:

  • If multiple values tie, it returns only the first one encountered in the data (Python 3.8 and later)
  • Raises StatisticsError if the dataset is empty
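
A brief sketch with invented data, covering the common case and the empty-data error:

```python
import statistics

votes = [3, 1, 3, 2, 1, 3]  # hypothetical data; 3 occurs most often

winner = statistics.mode(votes)

# An empty dataset has no mode and raises StatisticsError
try:
    statistics.mode([])
    empty_raised = False
except statistics.StatisticsError:
    empty_raised = True

print(winner, empty_raised)
```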

 
10. How to Use the Python statistics.multimode() Function

What it does: Identifies all values that share the highest frequency within the data

Key points:

  • Returns a list containing each “most common” value
  • If all values occur equally often, it returns every distinct value, in the order first encountered
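
The sketch below, using made-up ratings, shows a tie between two values and the single-mode case, which still comes back as a list.

```python
import statistics

ratings = [1, 2, 2, 3, 3]  # hypothetical data; 2 and 3 tie

tied = statistics.multimode(ratings)       # both tied values
single = statistics.multimode([4, 4, 5])   # a one-element list

print(tied, single)
```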

 
11. How to Use the Python statistics.quantiles() Function

What it does: Splits sorted data into segments (default is quartiles) and returns the cut points

Key points:

  • You can define how many segments to create by using the n parameter; the function returns n - 1 cut points
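
As a quick sketch with made-up data, the default call returns quartile cut points, and passing n=10 returns decile cut points:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

quartiles = statistics.quantiles(data)      # n=4 by default: 3 cut points
deciles = statistics.quantiles(data, n=10)  # 10 segments: 9 cut points

print(quartiles)
print(len(deciles))
```

The cut points are interpolated between neighboring data values, so they need not appear in the data themselves.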

 

Measures of Spread

 
12. How to Use the Python statistics.pstdev() Function

What it does: Calculates the population standard deviation, using the full dataset (with n as the denominator)

Key points:

  • Use this when you have an entire population rather than a sample
  • It differs from stdev(), which is geared toward sample data

 
13. How to Use the Python statistics.pvariance() Function

What it does: Computes the population variance, which is the average of the squared differences from the mean, using n as the denominator

Key points:

  • Indicates how spread out an entire population’s data is
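
The two population measures are easiest to see side by side. This sketch uses a small made-up dataset, treated as a complete population, whose mean is 5.

```python
import math
import statistics

# Hypothetical complete population of 8 values; mean is 5
population = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(population)  # sum of squared deviations / n
pop_std = statistics.pstdev(population)     # square root of the variance

print(pop_var, pop_std)
```

Here the squared deviations sum to 32, so dividing by n = 8 gives a variance of 4 and a standard deviation of 2.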

 
14. How to Use the Python statistics.stdev() Function

What it does: Produces the sample standard deviation, using n-1 in the denominator

Key points:

  • Suitable for analyzing a sample instead of a full population

 
15. How to Use the Python statistics.variance() Function

What it does: Calculates the sample variance, measuring how spread out the values are from their mean (using n-1)

Key points:

  • Complements stdev() for sample-based analyses
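
To see the effect of the n-1 denominator, the sketch below treats a made-up dataset as a sample and compares the result with the population variance of the same values.

```python
import math
import statistics

# Hypothetical sample of 8 values; squared deviations from the mean sum to 32
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(sample)  # 32 / (8 - 1)
sample_std = statistics.stdev(sample)     # square root of the sample variance

# Dividing by n - 1 instead of n makes the sample figure slightly larger
pop_var = statistics.pvariance(sample)

print(round(sample_var, 4), round(sample_std, 4), pop_var)
```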

 

Relations Between Two Inputs

 
16. How to Use the Python statistics.covariance() Function

What it does: Determines the sample covariance of two equally sized datasets, revealing how two variables shift together

Key points:

  • Uses n-1 as the denominator, as is standard for sample-based calculations
  • Requires Python 3.10 or later
  • A stepping stone to more advanced correlation analysis

 
17. How to Use the Python statistics.linear_regression() Function

What it does: Performs a simple linear regression on two equally sized data inputs, returning slope and intercept for the best-fit line

Key points:

  • Employs the ordinary least squares method
  • Requires Python 3.10 or later
  • Handy for basic trend analysis and forecasting tasks

 

Conclusion

 
The built-in statistics module in Python covers a surprising range of functionality, from basic measures of central tendency and variability to more advanced calculations like covariance and regression. The tutorials above break down each function step by step, making it easier to integrate these capabilities into your own data workflows. Whether you’re working with a small dataset or just want quick, native statistical operations, statistics is a dependable first choice.

Use these guides to dive deeper into each function and unlock powerful yet accessible statistical methods in your Python code.
 
 

Matthew Mayo (@mattmayo13) holds a master’s degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.


