Descriptive Statistics
Mean, median, mode, standard deviation, variance, and more from a dataset.
PhD Statistics, Fellow of the Royal Statistical Society
Statistician and data scientist with 15 years in applied statistics, probability theory and data visualisation across industry and academia.
Related calculators
About the Descriptive Statistics
Descriptive statistics are summary measures that capture the central tendency, spread, and shape of a dataset, reducing potentially thousands of data points to a handful of numbers that tell the essential story. They form the foundation of data analysis in every quantitative discipline โ from clinical trials measuring whether a drug reduces blood pressure, to A/B tests comparing conversion rates, to quality control monitoring manufacturing tolerances. Unlike inferential statistics (which draw conclusions about populations from samples), descriptive statistics simply describe what is in the data โ but doing so precisely and completely is often more informative than people expect.
Three measures of central tendency each capture something different: the mean is the arithmetic average and is sensitive to outliers (a billionaire in a village raises the mean income dramatically); the median is the middle value when sorted and is robust to outliers (it barely changes when the billionaire arrives); the mode is the most frequent value and is most useful for categorical or discrete data. The choice of which to report matters significantly. Income and house price statistics are typically reported as median rather than mean precisely because the distribution is right-skewed โ the mean overstates what a typical person earns.
Spread is equally important. Two classes could have the same mean exam score of 65%, but one might range from 40โ90% (high variance โ students are at very different levels) and another from 60โ70% (low variance โ students are clustered around the same ability). Standard deviation quantifies this spread as the average distance of data points from the mean; interquartile range (IQR) gives the range covering the middle 50% of data and is outlier-resistant. Understanding spread is essential for interpreting means โ a mean of 65% with an SD of 15 points tells a very different story than a mean of 65% with an SD of 3 points.
How it works
Mean (xฬ) = ฮฃx / n Median = middle value when sorted (average of two middles if n is even) Variance (sยฒ) = ฮฃ(x โ xฬ)ยฒ / (n โ 1) [sample] Standard Deviation (s) = โvariance IQR = Q3 โ Q1
Where
xฬSample mean โ arithmetic average of all valuesnNumber of data pointsฮฃSum across all valuessยฒSample variance โ average squared deviation from the mean (divided by nโ1 for unbiased estimate)Q1 / Q325th and 75th percentiles respectively; IQR = Q3 โ Q1 covers the middle 50% of dataWorked example
Dataset: exam scores for 9 students: 45, 52, 61, 63, 63, 70, 74, 82, 94.
n = 9, ฮฃx = 604, Mean = 604/9 = 67.1.
Sorted: 45, 52, 61, 63, 63, 70, 74, 82, 94. Median = 5th value = 63.
Mode = 63 (appears twice).
Variance: deviations from mean (67.1): โ22.1ยฒ, โ15.1ยฒ, โ6.1ยฒ, โ4.1ยฒ, โ4.1ยฒ, 2.9ยฒ, 6.9ยฒ, 14.9ยฒ, 26.9ยฒ = 488.41 + 228.01 + 37.21 + 16.81 + 16.81 + 8.41 + 47.61 + 222.01 + 723.61 = 1788.9.
Sample variance = 1788.9 / (9โ1) = 223.6. Standard deviation = โ223.6 = 14.95.
Q1 โ 58 (between 52 and 61), Q3 โ 74, IQR = 74 โ 58 = 16.
Interpretation: The class averages 67.1 (mean) or 63 (median), with two thirds of students within ~15 points of the mean.
Tips to improve your result
- 1.
When your data has outliers or is skewed (income, house prices, time-to-complete), use median and IQR rather than mean and standard deviation. Median is always more representative of the "typical" value in asymmetric distributions.
- 2.
The "nโ1" in sample standard deviation (Bessel's correction) is not an error โ it corrects for the fact that sample variance systematically underestimates population variance. Use nโ1 for sample data; n for population data. Most statistical software and this calculator use nโ1 by default.
- 3.
Outliers can be identified using the IQR rule: any value below Q1 โ 1.5รIQR or above Q3 + 1.5รIQR is considered a mild outlier; below Q1 โ 3รIQR or above Q3 + 3รIQR is an extreme outlier. This is the standard method used by box-and-whisker plots.
- 4.
Standard deviation has the same units as the original data. Variance (SDยฒ) has squared units, which is why SD is more interpretable in context. If exam scores have SD = 15 points, that's directly meaningful; "variance = 225 pointsยฒ" is not.
- 5.
The coefficient of variation (CV = SD / Mean ร 100) allows comparing variability across datasets with different scales or units. A height dataset with mean 170cm and SD 8cm has CV = 4.7%; a weight dataset with mean 75kg and SD 12kg has CV = 16% โ weight is relatively more variable despite the absolute SD being larger.