Understanding R’s `describe()` Function: A Complete Guide to Summary Statistics

R-bloggers 2026-04-30

This article was first published on R-posts.com and kindly contributed to R-bloggers.



Introduction to `describe()`

The describe() function from R's psych package (Revelle, 2023) provides a comprehensive statistical summary of your dataset. Base R's summary() reports the minimum, quartiles, median, mean, and maximum; describe() adds metrics such as the standard deviation, trimmed mean, MAD, skewness, kurtosis, and standard error, which are particularly useful for data exploration and assumption checking.

library(psych)      # install.packages("psych") if it is not installed
describe(mtcars)    # accepts a data frame, matrix, or numeric vector

Breaking Down the Output Columns

Here’s what each column in the output represents:

| Column | Description | Formula/Calculation | Ideal Use Case |
|---|---|---|---|
| vars | Variable index number | (none) | Tracking variable order |
| n | Number of non-missing values | length(na.omit(x)) | Data completeness check |
| mean | Arithmetic average | sum(x) / n | Normally distributed data |
| sd | Standard deviation | sqrt(var(x)) | Measuring spread |
| median | 50th percentile | quantile(x, 0.5) | Skewed distributions |
| trimmed | Mean after removing extremes (10% from each end by default) | mean(x, trim = 0.1) | Robust central tendency |
| mad | Median absolute deviation, scaled by 1.4826 to be comparable to the SD for normal data | mad(x), i.e. 1.4826 * median(abs(x - median(x))) | Outlier-resistant spread |
| min | Minimum value | min(x) | Range assessment |
| max | Maximum value | max(x) | Range assessment |
| range | Max minus min | max(x) - min(x) | Total spread |
| skew | Distribution asymmetry | sum((x - mean(x))^3) / (n * sd(x)^3) | Detecting skew direction |
| kurtosis | Tailedness (excess kurtosis) | sum((x - mean(x))^4) / (n * sd(x)^4) - 3 | Outlier propensity |
| se | Standard error of the mean | sd(x) / sqrt(n) | Precision of mean estimate |
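To make the formulas concrete, here is a minimal base-R sketch that recomputes the describe() columns for a single numeric vector. It assumes psych's defaults (trim = 0.1, and type = 3 for skew and kurtosis); manual_describe() is a hypothetical helper written for this illustration, not part of psych. If psych happens to be installed, the last lines compare both results.

```r
# Hypothetical helper (not part of psych): recompute describe()'s columns in
# base R, assuming psych's defaults trim = 0.1 and type = 3.
manual_describe <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]        # drop missing values, as describe() does by default
  n <- length(x)
  s <- sd(x)
  dev <- x - mean(x)
  data.frame(
    n        = n,
    mean     = mean(x),
    sd       = s,
    median   = median(x),
    trimmed  = mean(x, trim = trim),
    mad      = mad(x),                        # 1.4826 * median(abs(x - median(x)))
    min      = min(x),
    max      = max(x),
    range    = max(x) - min(x),
    skew     = sum(dev^3) / (n * s^3),
    kurtosis = sum(dev^4) / (n * s^4) - 3,    # excess kurtosis
    se       = s / sqrt(n)
  )
}

res <- manual_describe(mtcars$mpg)
print(round(res, 2))

# Optional cross-check against psych (runs only if psych is installed)
if (requireNamespace("psych", quietly = TRUE)) {
  ref <- psych::describe(mtcars$mpg)
  print(all.equal(unlist(res), unlist(ref[, names(res)]),
                  check.attributes = FALSE))
}
```

The rounded values should line up with the mtcars example output shown later (mean 20.09, sd 6.03, mad 5.41, skew 0.61, kurtosis -0.37).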

Key Statistics and Their Interpretation

Central Tendency

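Mean vs. Median: A large gap between them suggests skewness, because the mean is pulled toward the longer tail
Trimmed Mean: Reduces the influence of outliers (by default, describe() drops the top and bottom 10% of values)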
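A quick way to see these differences is to add one extreme value to otherwise well-behaved data. The vector below is made-up toy data; with 10 values and trim = 0.1, mean() discards one value from each end before averaging.

```r
# Toy data (made up for illustration): 1 through 9 plus one extreme value
x <- c(1:9, 100)

mean(x)               # 14.5: pulled upward by the single extreme value
median(x)             # 5.5: unaffected
mean(x, trim = 0.1)   # 5.5: drops 10% (one value) from each end first
```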

Variability

SD vs. MAD: Use MAD when outliers are present
Range: Simple but outlier-sensitive
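The same kind of toy comparison shows why MAD is preferred when outliers are present. Both vectors below are made up, and spread() is a hypothetical helper that simply bundles the three measures.

```r
# Toy data (made up): identical values except that the largest one is extreme
x_clean   <- c(1:9, 10)
x_outlier <- c(1:9, 100)

# Hypothetical helper returning SD, MAD, and range side by side
spread <- function(x) c(sd = sd(x), mad = mad(x), range = diff(range(x)))

round(spread(x_clean), 2)     # sd 3.03, mad 3.71, range 9
round(spread(x_outlier), 2)   # sd 30.15, mad 3.71, range 99
```

One extreme value multiplies the SD roughly tenfold and inflates the range, while the MAD does not change at all.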

Distribution Shape

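Skewness:
- > 0: Right-skewed (longer right tail)
- < 0: Left-skewed (longer left tail)
- ≈ 0: Roughly symmetric
Kurtosis (excess, so a normal distribution scores about 0):
- > 0: Heavy-tailed (more extreme values than a normal distribution)
- < 0: Light-tailed (fewer extreme values than a normal distribution)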
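These sign conventions can be checked on small made-up vectors. skew3() and kurt3() are hypothetical helpers that mirror the skew and kurtosis formulas in the column table (psych's type = 3 default); psych itself exports skew() and kurtosi() for the same job.

```r
# Hypothetical helpers mirroring the type-3 formulas from the column table
skew3 <- function(x) { d <- x - mean(x); sum(d^3) / (length(x) * sd(x)^3) }
kurt3 <- function(x) { d <- x - mean(x); sum(d^4) / (length(x) * sd(x)^4) - 3 }

right_tail <- c(1, 1, 1, 2, 2, 3, 10)   # made-up data with a long right tail

skew3(right_tail) > 0              # TRUE: right-skewed
skew3(-right_tail) < 0             # TRUE: the mirror image is left-skewed
skew3(1:5)                         # 0: perfectly symmetric
kurt3(1:5) < 0                     # TRUE: flat, light-tailed
kurt3(c(rep(0, 8), -10, 10)) > 0   # TRUE: peaked center plus two extreme values
```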

Practical Examples

Example 1: MPG from mtcars

describe(mtcars$mpg)

Output Interpretation:

   vars  n   mean    sd median trimmed   mad min  max range skew kurtosis   se
1     1 32 20.09 6.03   19.2   19.70 5.41 10.4 33.9  23.5 0.61    -0.37 1.07

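Right-skewed: the mean (20.09) exceeds the median (19.2), and skew is positive (0.61)
Slightly light-tailed: kurtosis is mildly negative (-0.37)
SD (6.03) > MAD (5.41): mad() is scaled to match the SD for normally distributed data, so a somewhat larger SD suggests that values far from the center, consistent with the right skew, inflate it; the negative kurtosis argues against extreme outliers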
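To see which cars drive the gap between SD and MAD, you can list values that sit far from the median in MAD units. The 2-MAD cutoff below is an arbitrary illustrative choice, not a psych default or a formal outlier test.

```r
x <- mtcars$mpg
center    <- median(x)   # 19.2
scale_mad <- mad(x)      # about 5.41

# Flag values more than 2 MADs from the median (illustrative cutoff only)
far <- abs(x - center) > 2 * scale_mad
mtcars[far, "mpg", drop = FALSE]   # Fiat 128, Honda Civic, Toyota Corolla, Lotus Europa
```

All four flagged cars lie above the median, which fits the positive skew rather than pointing to scattered outliers.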

When to Use Which Statistic

| Scenario | Recommended Statistics |
|---|---|
| Normal distribution | Mean, SD |
| Skewed data | Median, IQR, MAD |
| Outlier detection | MAD, trimmed mean, kurtosis |
| Parametric testing | Mean, SE |
| Nonparametric analysis | Median, IQR |
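One way to turn the table into code is a small helper that reports the mean and SD for roughly symmetric data and the median, IQR, and MAD otherwise. summarise_center() and its |skew| > 1 cutoff are illustrative assumptions, not rules from psych or any standard.

```r
# Hypothetical helper: pick summary statistics following the table above.
# The skew_cutoff of 1 is an assumed rule of thumb, not a psych default.
summarise_center <- function(x, skew_cutoff = 1) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  skew <- sum(d^3) / (length(x) * sd(x)^3)   # type-3 skew, as in describe()
  if (abs(skew) > skew_cutoff) {
    c(median = median(x), IQR = IQR(x), MAD = mad(x))   # skewed: robust stats
  } else {
    c(mean = mean(x), SD = sd(x))                       # roughly symmetric
  }
}

summarise_center(mtcars$mpg)    # skew 0.61, so mean and SD
summarise_center(c(1:9, 100))   # made-up data with skew above 1: median, IQR, MAD
```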

Extending the Functionality

Adding IQR

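describe() omits the IQR by default. You can request it with the IQR argument, or compute it and append it as a new column:

library(psych)
library(dplyr)
describe(mtcars, IQR = TRUE)   # built-in option
describe(mtcars) %>%           # or append the column yourself
  mutate(IQR = sapply(mtcars, IQR, na.rm = TRUE))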
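To confirm what the added column contains: the IQR is the distance between the first and third quartiles. This base-R sketch uses R's default quantile type 7, which is also what IQR() uses, and vapply() produces the per-column values without dplyr.

```r
x <- mtcars$mpg
q <- quantile(x, probs = c(0.25, 0.75))   # type 7, R's default
q                                         # 25%: 15.425, 75%: 22.800
unname(q[2] - q[1])                       # 7.375
IQR(x)                                    # 7.375, the same value

# Per-column IQRs for the whole data frame, base R only
iqrs <- vapply(mtcars, IQR, numeric(1), na.rm = TRUE)
round(iqrs, 2)
```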

Comparing Groups

Use describeBy() for grouped statistics:

describeBy(mtcars$mpg, group = mtcars$cyl)
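describeBy() essentially applies describe() to each group. A rough base-R analogue uses split() and sapply(); the statistics chosen here are only an illustrative subset of describeBy()'s columns.

```r
# Base-R sketch of grouped summaries (a subset of describeBy()'s columns)
stats_by_cyl <- sapply(split(mtcars$mpg, mtcars$cyl), function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x), mad = mad(x))
})
round(stats_by_cyl, 2)   # one column per cylinder group: 4, 6, 8
```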

Conclusion

The psych package's describe() function is a powerful starting point for exploratory data analysis. Understanding each statistic it reports helps you:

Detect data quality issues
Choose appropriate analysis methods
Understand your variables’ distributions
Make informed decisions about data transformations
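For data-quality checks, the n column is a quick indicator: any variable whose n is lower than the number of rows has missing values. A base-R sketch of the same idea, using R's built-in airquality dataset (chosen here because it contains NAs):

```r
# Count non-missing values per column, like describe()'s n column
n_valid <- colSums(!is.na(airquality))
n_valid                                       # Ozone 116, Solar.R 146, the rest 153
nrow(airquality) - n_valid                    # missing values per column
names(n_valid)[n_valid < nrow(airquality)]    # "Ozone" "Solar.R"
```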

For formal reporting, consider supplementing these metrics with visualization and statistical tests.

Pro Tip: Always visualize your data alongside these statistics – numbers tell part of the story, but plots reveal the full picture!

Happy coding!

Reference: Revelle, W. (2023). psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University.

Understanding R’s `describe()` Function: A Complete Guide to Summary Statistics was first posted on April 29, 2026 at 6:09 am.

