Understanding R’s `describe()` Function: A Complete Guide to Summary Statistics
R-bloggers 2026-04-30
Introduction to describe()
The describe() function from R’s psych package (Revelle, 2023) provides a comprehensive statistical summary of your dataset. Unlike R’s base summary() function, it includes additional metrics that are particularly useful for data exploration and assumption checking.
library(psych)
describe(your_data)
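To see what describe() adds over summary(), the two can be run side by side on a built-in dataset. This is a minimal sketch that assumes mtcars$mpg as the example variable (the same one used later in this guide). The requireNamespace() guard is only there so the snippet does not fail when psych is not installed.

```r
# Base R: minimum, quartiles, median, mean, maximum
base_summary <- summary(mtcars$mpg)
print(base_summary)

# psych: adds n, sd, trimmed mean, mad, range, skew, kurtosis, and se
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(mtcars$mpg))
} else {
  message("Install psych with install.packages('psych') to run describe().")
}
```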
Breaking Down the Output Columns
Here’s what each column in the output represents:
| Column | Description | Formula/Calculation | Ideal Use Case |
|---|---|---|---|
| vars | Variable index number | – | Tracking variable order |
| n | Number of non-missing values | `length(na.omit(x))` | Data completeness check |
| mean | Arithmetic average | `sum(x)/n` | Normally distributed data |
| sd | Standard deviation | `sqrt(var(x))` | Measuring spread |
| median | 50th percentile | `quantile(x, 0.5)` | Skewed distributions |
| trimmed | Mean after removing extremes | `mean(x, trim = 0.1)` | Robust central tendency |
| mad | Median absolute deviation, scaled by 1.4826 so it is comparable to the SD for normal data | `1.4826 * median(abs(x - median(x)))` | Outlier-resistant spread |
| min | Minimum value | `min(x)` | Range assessment |
| max | Maximum value | `max(x)` | Range assessment |
| range | Max – Min | `max(x) - min(x)` | Total spread |
| skew | Distribution asymmetry | `sum((x - mean(x))^3) / (n * sd(x)^3)` | Detecting skew direction |
| kurtosis | Tailedness (excess kurtosis) | `sum((x - mean(x))^4) / (n * sd(x)^4) - 3` | Outlier propensity |
| se | Standard error of the mean | `sd(x) / sqrt(n)` | Precision of mean estimate |
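The formulas above can be checked by hand in base R. The sketch below assumes describe()'s default settings (trim = 0.1, and the type 3 estimators for skew and kurtosis) and uses mtcars$mpg, so the rounded values should agree with the describe() row shown in Example 1. The name by_hand is just an illustrative choice.

```r
x <- mtcars$mpg
x <- x[!is.na(x)]   # work on non-missing values only
n <- length(x)

by_hand <- c(
  n        = n,
  mean     = mean(x),
  sd       = sd(x),
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),
  mad      = mad(x),  # stats::mad() applies the 1.4826 scaling by default
  min      = min(x),
  max      = max(x),
  range    = max(x) - min(x),
  skew     = sum((x - mean(x))^3) / (n * sd(x)^3),
  kurtosis = sum((x - mean(x))^4) / (n * sd(x)^4) - 3,
  se       = sd(x) / sqrt(n)
)

print(round(by_hand, 2))
```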
Key Statistics and Their Interpretation
Central Tendency
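- Mean vs. Median: A noticeable gap between the two suggests skewness (a mean above the median points to a right tail)
- Trimmed Mean: Reduces the influence of outliers (the default, trim = 0.1, drops the lowest and highest 10% of values before averaging)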
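A small made-up vector shows how the three measures of center react to a single extreme value. The data here are invented purely for illustration.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 100)  # made-up data; 100 is the outlier

center <- c(
  mean    = mean(x),              # 15.4, pulled toward the outlier
  median  = median(x),            # 6.5
  trimmed = mean(x, trim = 0.1)   # 6.5, lowest and highest value dropped
)
print(center)
```

With ten values and trim = 0.1, exactly one observation is removed from each end, so the outlier no longer affects the trimmed mean.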
Variability
- SD vs. MAD: Use MAD when outliers are present
- Range: Simple but outlier-sensitive
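The difference in outlier sensitivity is easy to see by computing each spread measure with and without one extreme value. This is a sketch on invented data, and the object names are illustrative.

```r
clean        <- c(2, 3, 4, 5, 6, 7, 8, 9, 10)  # made-up data
with_outlier <- c(clean, 100)                  # same data plus one extreme value

# Small helper: the three spread measures reported by describe()
spread_of <- function(v) c(sd = sd(v), mad = mad(v), range = max(v) - min(v))

spread <- rbind(clean = spread_of(clean), with_outlier = spread_of(with_outlier))
print(round(spread, 2))
```

The SD and range grow roughly tenfold once the outlier is added, while the MAD changes only slightly.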
Distribution Shape
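- Skewness:
  - >0: Right-tailed (long tail toward larger values)
  - <0: Left-tailed (long tail toward smaller values)
  - ≈0: Approximately symmetric
- Kurtosis (Excess):
  - >0: Heavy-tailed (more extreme values than a normal distribution)
  - ≈0: Tails similar to a normal distribution
  - <0: Light-tailed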
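The sign conventions can be illustrated on simulated data. The helpers below are hypothetical names written for this sketch. They implement the skew and kurtosis formulas listed earlier in base R rather than calling psych, and the seed and sample sizes are arbitrary.

```r
# Hand-written estimators following the formulas listed earlier
skew_by_hand <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}
kurtosis_by_hand <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^4) / (length(x) * sd(x)^4) - 3
}

set.seed(42)                   # arbitrary seed, for reproducibility only
right_tailed <- rexp(1000)     # exponential: long right tail
left_tailed  <- -right_tailed  # mirror image: long left tail
flat         <- runif(1000)    # uniform: lighter tails than the normal

skew_by_hand(right_tailed)      # positive
skew_by_hand(left_tailed)       # negative, same magnitude
kurtosis_by_hand(right_tailed)  # positive: heavy tail
kurtosis_by_hand(flat)          # negative: light tails
```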
Practical Examples
Example 1: MPG from mtcars
describe(mtcars$mpg)
Output Interpretation:
  vars  n  mean   sd median trimmed  mad  min  max range skew kurtosis   se
1    1 32 20.09 6.03   19.2   19.70 5.41 10.4 33.9  23.5 0.61    -0.37 1.07
- Right-skewed (mean > median, positive skew)
- Light-tailed (negative kurtosis)
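- SD (6.03) slightly exceeds MAD (5.41): because mad is scaled to match the SD for normal data, the small gap suggests only mild influence from extreme values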
When to Use Which Statistic
| Scenario | Recommended Statistics |
|---|---|
| Normal Distribution | Mean, SD |
| Skewed Data | Median, IQR, MAD |
| Outlier Detection | MAD, trimmed mean, kurtosis |
| Parametric Testing | Mean, SE |
| Nonparametric Analysis | Median, IQR |

Extending the Functionality
Adding IQR
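By default, describe() doesn’t show the IQR. You can request it with the IQR = TRUE argument, or compute it yourself and append it as a column:
library(psych)
library(dplyr)
# Built-in option: ask describe() for an IQR column
describe(mtcars, IQR = TRUE)
# Manual alternative: compute each column's IQR and append it
describe(mtcars) %>%
  mutate(IQR = apply(mtcars, 2, IQR, na.rm = TRUE))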
Comparing Groups
Use describeBy() for grouped statistics:
describeBy(mtcars$mpg, group = mtcars$cyl)
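To make clear what "grouped statistics" means here, the sketch below computes a few of the same columns per cylinder group using only base R. It is a cross-check under that assumption, not a replacement for describeBy(), and by_cyl is an illustrative name.

```r
# Split mpg by number of cylinders, then summarise each group
by_cyl <- do.call(rbind, lapply(split(mtcars$mpg, mtcars$cyl), function(v) {
  c(n = length(v), mean = mean(v), sd = sd(v), median = median(v))
}))

print(round(by_cyl, 2))  # one row per cyl value (4, 6, 8)
```

Each row corresponds to one block of describeBy() output, so the n, mean, sd, and median values can be compared directly.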
Conclusion
R’s describe() function provides a powerful starting point for exploratory data analysis. By understanding each statistic it provides, you can:
- Detect data quality issues
- Choose appropriate analysis methods
- Understand your variables’ distributions
- Make informed decisions about data transformations
For formal reporting, consider supplementing these metrics with visualization and statistical tests.
Pro Tip: Always visualize your data alongside these statistics – numbers tell part of the story, but plots reveal the full picture!
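One way to pair the numbers with a plot and a formal test, sketched in base R on mtcars$mpg. The choice of a histogram, a normal Q-Q plot, and the Shapiro-Wilk test is an assumption made for this example; other plots and tests may suit your data better.

```r
x <- mtcars$mpg

# Histogram and normal Q-Q plot side by side
old_par <- par(mfrow = c(1, 2))
hist(x, main = "mpg", xlab = "Miles per gallon")
qqnorm(x)
qqline(x)
par(old_par)  # restore the previous plotting layout

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(x)
print(sw)
```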
Happy coding!
Reference: Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University.
Understanding R’s `describe()` Function: A Complete Guide to Summary Statistics was first posted on April 29, 2026 at 6:09 am.