Python’s built-in `statistics` module offers a handy set of tools for computing fundamental statistical measures, with no external installations required. It covers measures of central tendency (like mean, median, and mode), dispersion (like standard deviation and variance), and even provides specialized functionality such as calculating covariance and simple linear regression.
Since these functions work seamlessly on lists or other iterables of real-valued numbers, the `statistics` module is an excellent choice for smaller datasets or quick, straightforward data analysis tasks. It’s included with every Python installation, so statistical functions are always ready to go when you need them.
Let’s have a look at the different functions included in the `statistics` module, and point to more in-depth tutorials on each of them individually. In the overviews below, you will find links to individual tutorials from our sister site Statology, each describing how to use a specific function from the module.
| Function | Short Description | Syntax Example | Returns |
|---|---|---|---|
| `mean()` | Calculates the arithmetic mean | `statistics.mean([1, 2, 3])` | Number matching the input type where possible (e.g., `2`) |
| `fmean()` | Faster mean using float conversion | `statistics.fmean([1, 2, 3])` | Float (e.g., `2.0`) |
| `geometric_mean()` | Calculates the geometric mean | `statistics.geometric_mean([1, 2, 3])` | Float |
| `harmonic_mean()` | Calculates the harmonic mean | `statistics.harmonic_mean([1, 2, 3])` | Float |
| `median()` | Finds the middle value in a dataset | `statistics.median([1, 2, 3])` | Float or Int |
| `median_low()` | Returns the lower of the two middle values | `statistics.median_low([1, 2, 3, 4])` | A data point (e.g., `2`) |
| `median_high()` | Returns the higher of the two middle values | `statistics.median_high([1, 2, 3, 4])` | A data point (e.g., `3`) |
| `median_grouped()` | Finds the median for grouped data | `statistics.median_grouped([1, 2, 2, 3, 4])` | Float (e.g., `2.25`) |
| `mode()` | Returns the most common value | `statistics.mode([1, 2, 2, 3])` | A data point (e.g., `2`) |
| `multimode()` | Returns all modes as a list | `statistics.multimode([1, 2, 2, 3, 3])` | List (e.g., `[2, 3]`) |
| `quantiles()` | Returns cut points that divide data into equal-probability intervals | `statistics.quantiles([1, 2, 3, 4])` | List of floats (e.g., `[1.25, 2.5, 3.75]`) |
| `pstdev()` | Population standard deviation | `statistics.pstdev([1, 2, 3])` | Float |
| `pvariance()` | Population variance | `statistics.pvariance([1, 2, 3])` | Float |
| `stdev()` | Sample standard deviation | `statistics.stdev([1, 2, 3])` | Float |
| `variance()` | Sample variance | `statistics.variance([1, 2, 3])` | Number matching the input type where possible (e.g., `1`) |
| `covariance()` | Sample covariance of two datasets (Python 3.10+) | `statistics.covariance([1, 2], [3, 4])` | Float |
| `linear_regression()` | Simple linear regression coefficients (Python 3.10+) | `statistics.linear_regression([1, 2], [3, 4])` | Named tuple `LinearRegression(slope, intercept)` |
Measures of Central Tendency
1. How to Use the Python statistics.mean() Function
What it does: Calculates the arithmetic mean by adding up all numeric values and dividing by the total count
Key points:
- Accepts a sequence or an iterator of numbers
- Accepts integers, floats, `Fraction`s, and `Decimal`s
- Raises `StatisticsError` if the data is empty; non-numeric elements raise a `TypeError` instead
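A minimal sketch of `mean()` on a few made-up inputs. It shows that the result keeps the input's numeric type where it can (an int stays an int when the mean is integral, Fractions stay exact) and which exception each kind of bad input triggers.

```python
import statistics
from fractions import Fraction

# Integer input with an integral mean stays an int
int_mean = statistics.mean([1, 2, 3])          # 2

# A non-integral mean of ints comes back as a float
float_mean = statistics.mean([1, 2, 3, 4])     # 2.5

# Fractions are averaged exactly and stay Fractions
frac_mean = statistics.mean([Fraction(1, 3), Fraction(2, 3)])  # Fraction(1, 2)

# Empty data raises StatisticsError
empty_raises = False
try:
    statistics.mean([])
except statistics.StatisticsError:
    empty_raises = True

# Non-numeric data raises TypeError
text_raises = False
try:
    statistics.mean(["a", "b"])
except TypeError:
    text_raises = True
```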
2. How to Use the Python statistics.fmean() Function
What it does: Computes the arithmetic mean like `mean()`, but first converts all inputs to floats
Key points:
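- Typically faster than `mean()` and always returns a float, but it gives up the exact rational arithmetic `mean()` uses, so results can differ slightly due to floating-point rounding
- Great for performance-critical use cases with floating-point data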
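A quick side-by-side on toy data (requires Python 3.8+, where `fmean()` was added): `fmean()` returns a float even where `mean()` would return an int, and it happily consumes a one-shot iterator such as a generator expression.

```python
import statistics

values = [1, 2, 3]

exact = statistics.mean(values)   # 2   (int, exact rational arithmetic)
fast = statistics.fmean(values)   # 2.0 (always a float)

# fmean() also accepts a generator; here it averages the squares 1, 4, 9
squares_mean = statistics.fmean(x * x for x in range(1, 4))
```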
3. How to Use the Python statistics.geometric_mean() Function
What it does: Returns the geometric mean, which is the nth root of the product of the values
Key points:
- Useful for analyzing growth rates or ratios
- Supports only positive inputs, raising `StatisticsError` for zero or negative numbers
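To illustrate the growth-rate use case, the sketch below averages three hypothetical period-over-period growth factors (the numbers are invented for illustration; `math.prod` and `geometric_mean` need Python 3.8+). Compounding the geometric mean once per period reproduces the total growth, which is exactly why it suits ratios better than the arithmetic mean.

```python
import math
import statistics

# Hypothetical growth factors for three periods: +10%, -5%, +20%
growth = [1.10, 0.95, 1.20]

avg_factor = statistics.geometric_mean(growth)

# Compounding the average factor once per period reproduces the total growth
total_growth = math.prod(growth)
compounds_ok = math.isclose(avg_factor ** len(growth), total_growth)

# Negative inputs are rejected
negative_raises = False
try:
    statistics.geometric_mean([-1.0, 2.0])
except statistics.StatisticsError:
    negative_raises = True
```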
4. How to Use the Python statistics.harmonic_mean() Function
What it does: Calculates the harmonic mean, best suited for averaging rates or speeds
Key points:
- Equal to the reciprocal of the arithmetic mean of the reciprocals of the data
- Negative values raise `StatisticsError`; if any value is zero, the result is 0
5. How to Use the Python statistics.median() Function
What it does: Identifies the middle value of the data once it is sorted (the function sorts the input itself)
Key points:
- If there’s an even number of values, it averages the two middle numbers
- Works on sequences of odd or even length
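A small sketch with arbitrary values, showing the odd-length and even-length cases and that the input does not have to be pre-sorted.

```python
import statistics

# Odd count: the middle value; the input does not need to be sorted
odd_median = statistics.median([7, 1, 3])       # 3

# Even count: the mean of the two middle values, 2 and 3
even_median = statistics.median([4, 1, 3, 2])   # 2.5

# Empty data raises StatisticsError
empty_raises = False
try:
    statistics.median([])
except statistics.StatisticsError:
    empty_raises = True
```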
6. How to Use the Python statistics.median_low() Function
What it does: Returns the lower median in a dataset
Key points:
- For an even number of values, it picks the lower of the two middle elements (instead of taking the average)
- Useful when the median must be an actual data point (for example, with discrete data) rather than an interpolated average
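The contrast with `median()` is easiest to see on an even-length list of example values: `median()` interpolates a value that is not in the data, while `median_low()` returns one of the actual observations.

```python
import statistics

data = [1, 3, 5, 7]

interpolated = statistics.median(data)      # 4.0, not an actual data point
low = statistics.median_low(data)           # 3, the lower middle value

# With an odd count there is no tie, so median_low() is just the middle value
odd_low = statistics.median_low([1, 3, 5])  # 3
```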
7. How to Use the Python statistics.median_high() Function
What it does: Returns the higher median in a dataset
Key points:
- Similar to `median_low()`, but selects the higher of the two middle values for even-length data
- Handy if you want to consistently round “up” when dealing with medians
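Putting the three median variants next to each other on the same illustrative data shows where they differ (even length) and where they agree (odd length).

```python
import statistics

data = [1, 3, 5, 7]

low = statistics.median_low(data)     # 3
high = statistics.median_high(data)   # 5
mid = statistics.median(data)         # 4.0, halfway between the two

# For odd-length data all three functions return the same middle value
same = {
    statistics.median([2, 9, 4]),
    statistics.median_low([2, 9, 4]),
    statistics.median_high([2, 9, 4]),
}                                     # {4}
```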
8. How to Use the Python statistics.median_grouped() Function
What it does: Computes the median of grouped (binned) data, assuming the values are uniformly distributed within each interval
Key points:
- The interval width defaults to 1 and can be changed with the `interval` parameter
- Ideal for datasets grouped into ranges instead of individual points
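The sketch below treats each value as the midpoint of a class interval and recomputes the result by hand with the standard grouped-median interpolation, L + interval × (n/2 − cf) / f. The data are arbitrary example values; the manual line is only a cross-check, not the library's code.

```python
import statistics

# Each value is treated as the midpoint of a class interval of width 1
data = [1, 3, 3, 5, 7]
grouped = statistics.median_grouped(data)                    # 3.25

# The same figure by hand: L + interval * (n/2 - cf) / f, where
#   L  = lower boundary of the median class (3 - 0.5 = 2.5)
#   cf = count of values below that class (1)
#   f  = count of values in that class (2)
manual = 2.5 + 1 * (len(data) / 2 - 1) / 2                   # 3.25

# Wider classes change the interpolation
grouped_wide = statistics.median_grouped(data, interval=2)   # 3.5
```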
9. How to Use the Python statistics.mode() Function
What it does: Finds the most frequently occurring value in a dataset
Key points:
- If multiple values tie, it returns the first one encountered in the data (Python 3.8+; earlier versions raised `StatisticsError`)
- Raises `StatisticsError` if the dataset is empty
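A few illustrative calls, assuming Python 3.8 or later for the tie-breaking behavior. Note that `mode()` also works on non-numeric (nominal) data such as labels.

```python
import statistics

numeric_mode = statistics.mode([1, 2, 2, 3])              # 2
label_mode = statistics.mode(["red", "blue", "red"])      # 'red'
tied_mode = statistics.mode([1, 1, 2, 2])                 # 1, first of the tied values

empty_raises = False
try:
    statistics.mode([])
except statistics.StatisticsError:
    empty_raises = True
```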
10. How to Use the Python statistics.multimode() Function
What it does: Identifies all values that share the highest frequency within the data
Key points:
- Returns a list containing each “most common” value
- If every value occurs equally often, it returns each distinct value once; empty data gives an empty list
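Example calls on made-up data (Python 3.8+). Values are returned in the order they first appear, and unlike `mode()`, empty input does not raise.

```python
import statistics

two_modes = statistics.multimode([1, 2, 2, 3, 3])        # [2, 3]
all_tied = statistics.multimode([1, 1, 2, 2])            # [1, 2], each distinct value once
letters = statistics.multimode("aabbbbccddddeeffffgg")   # ['b', 'd', 'f']
nothing = statistics.multimode([])                       # []
```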
11. How to Use the Python statistics.quantiles() Function
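What it does: Divides the data into `n` intervals of equal probability (quartiles by default) and returns the `n - 1` cut points
Key points:
- You can define how many intervals to create with the `n` parameter
- The `method` parameter selects `"exclusive"` (the default) or `"inclusive"` interpolation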
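A short sketch on a four-element toy list (Python 3.8+; both `n` and `method` are keyword-only arguments). The two interpolation methods give different cut points on small samples, which is worth knowing when comparing results with other tools.

```python
import statistics

data = [1, 2, 3, 4]

quartiles = statistics.quantiles(data)                        # [1.25, 2.5, 3.75]
halves = statistics.quantiles(data, n=2)                      # [2.5]
inclusive = statistics.quantiles(data, method="inclusive")    # [1.75, 2.5, 3.25]
```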
Measures of Spread
12. How to Use the Python statistics.pstdev() Function
What it does: Calculates the population standard deviation, using the full dataset (with `n` as the denominator)
Key points:
- Use this when you have an entire population rather than a sample
- It differs from `stdev()`, which is geared toward sample data
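A minimal sketch, assuming the eight example values below are the entire population of interest. The manual line spells out the definition (square root of the mean squared deviation, dividing by `n`) as a cross-check.

```python
import math
import statistics

# Treat these eight values as the entire population of interest
population = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(population)       # 2.0

# By hand: square root of the mean squared deviation, dividing by n
mu = sum(population) / len(population)      # 5.0
manual = math.sqrt(sum((x - mu) ** 2 for x in population) / len(population))
```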
13. How to Use the Python statistics.pvariance() Function
What it does: Computes the population variance, which is the average of the squared differences from the mean, using `n` as the denominator
Key points:
- Indicates how spread out an entire population’s data is
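Using the same illustrative population, this sketch checks that the population variance is the square of `pstdev()` and shows the optional `mu` argument for a mean you already know.

```python
import statistics

population = [2, 4, 4, 4, 5, 5, 7, 9]

var_pop = statistics.pvariance(population)              # 4

# pvariance() is the square of pstdev()
sq_of_pstdev = statistics.pstdev(population) ** 2       # 4.0

# If the population mean is already known, it can be passed as mu
var_known_mu = statistics.pvariance(population, mu=5)   # 4
```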
14. How to Use the Python statistics.stdev() Function
What it does: Produces the sample standard deviation, using `n-1` in the denominator
Key points:
- Suitable for analyzing a sample instead of a full population
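Here the same eight example values are treated as a sample from a larger population. Dividing by `n-1` instead of `n` makes the estimate larger than `pstdev()`, and at least two data points are required.

```python
import statistics

# Treat the same eight values as a sample drawn from a larger population
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(sample)        # sqrt(32 / 7), about 2.138
sigma = statistics.pstdev(sample)   # 2.0; dividing by n gives a smaller value

# A single data point is not enough to estimate sample spread
too_small_raises = False
try:
    statistics.stdev([5])
except statistics.StatisticsError:
    too_small_raises = True
```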
15. How to Use the Python statistics.variance() Function
What it does: Calculates the sample variance, measuring how spread out the values are from their mean (using `n-1` in the denominator)
Key points:
- Complements `stdev()` for sample-based analyses
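A sketch on the same example sample: the sample variance is the square of `stdev()` (up to rounding), and a precomputed mean can be supplied through the optional `xbar` argument.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = statistics.variance(sample)        # 32 / 7, about 4.571
s = statistics.stdev(sample)

# variance() is the square of stdev(), up to floating-point rounding
consistent = abs(s2 - s ** 2) < 1e-9

# A precomputed sample mean can be passed as xbar
xbar = statistics.mean(sample)          # 5
s2_with_xbar = statistics.variance(sample, xbar)
```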
Relations Between Two Inputs
16. How to Use the Python statistics.covariance() Function
What it does: Determines the sample covariance of two equally sized datasets, revealing how two variables shift together
Key points:
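- Uses `n-1` in the denominator, as appropriate for sample-based calculations
- Requires two inputs of the same length with at least two data points each; available since Python 3.10
- A stepping stone to more advanced correlation analysis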
17. How to Use the Python statistics.linear_regression() Function
What it does: Performs a simple linear regression on two equally sized data inputs, returning slope and intercept for the best-fit line
Key points:
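- Employs the ordinary least squares method
- Available since Python 3.10; the result is a named tuple with `slope` and `intercept` fields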
- Handy for basic trend analysis and forecasting tasks
Conclusion
The built-in `statistics` module in Python covers a surprising range of functionality, from basic measures of central tendency and variability to more advanced calculations like covariance and regression. The tutorials above break down each function step by step, making it easier to integrate these capabilities into your own data workflows. Whether you’re working with a small dataset or just want quick, native statistical operations, `statistics` is a dependable first choice.
Use these guides to dive deeper into each function and unlock powerful yet accessible statistical methods in your Python code.
Matthew Mayo (@mattmayo13) holds a master’s degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.