Working with Ordinal Ranks in {marginaleffects}
R-bloggers 2025-05-02
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Given an ordinal regression model, it is relatively easy to get class-wise predictions – the conditional predicted probability of each level of the outcome. However, often, one might be interested in summarizing effects not on the class-wise probability scale (nor on the latent scale), but instead on the rank scale – the expected ordinal level, expressed as a single number.
The {emmeans}
package has the mode = "mean.class"
that does just this.
Let’s see how we can do this in {marginaleffects}
by utilizing the powerful hypothesis=
argument.
Fit a model
library(tidyverse)library(marginaleffects)bfi <- psych::bfi |> mutate( gender = factor(gender), A1 = ordered(A1) ) |> drop_na(A1, age, gender)nrow(bfi)#> [1] 2784fit <- MASS::polr(A1 ~ age + gender, data = bfi, Hess = TRUE)
Class-Wise Predictions
pr1 <- predictions(fit, variables = "gender")pr1#> #> Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> 1 0.1800 0.01136 15.85 <0.001 185.5 0.15777 0.2023#> 1 0.1886 0.01128 16.71 <0.001 205.9 0.16649 0.2107#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065#> 1 0.1843 0.01132 16.28 <0.001 195.6 0.16210 0.2065#> --- 33398 rows omitted. See ?print.marginaleffects --- #> 6 0.0289 0.00335 8.64 <0.001 57.3 0.02234 0.0355#> 6 0.0231 0.00264 8.78 <0.001 59.1 0.01798 0.0283#> 6 0.0219 0.00250 8.76 <0.001 58.8 0.01699 0.0268#> 6 0.0207 0.00238 8.71 <0.001 58.1 0.01604 0.0254#> 6 0.0121 0.00165 7.35 <0.001 42.2 0.00891 0.0154#> Type: probsnrow(pr1) # original data * 2 (counterfactual gender) * 6 (levels of A1)#> [1] 33408
For each observation in the original data frame we now have 12 predictions: - 2 counterfactual predictions for the two levels of gender, times - 6 (arguably also counterfactual) predictions for each of the possible outcome levels.
Sum-scores (mean rank)
We can average the class ranks by their predicted probability to obtain “sum scores”:
And we can do this for each row in the counterfactual data frame (marked by the
rowidcf
variable):
sum_score <- function(x) { x |> arrange(rowidcf, group, gender) |> # for each counterfactual row and level of gender summarise( term = "sum score", estimate = sum(estimate * (1:6)), .by = c(rowidcf, gender) )}pr2 <- predictions(fit, variables = "gender", hypothesis = sum_score)pr2#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> 3.02 0.0583 51.8 <0.001 Inf 2.90 3.13#> 2.51 0.0451 55.6 <0.001 Inf 2.42 2.60#> 2.97 0.0553 53.7 <0.001 Inf 2.86 3.08#> 2.46 0.0415 59.4 <0.001 Inf 2.38 2.55#> 3.00 0.0568 52.8 <0.001 Inf 2.88 3.11#> --- 5558 rows omitted. See ?print.marginaleffects --- #> 2.24 0.0305 73.4 <0.001 Inf 2.18 2.30#> 2.67 0.0473 56.4 <0.001 Inf 2.57 2.76#> 2.20 0.0304 72.2 <0.001 Inf 2.14 2.26#> 2.26 0.0645 35.0 <0.001 890.8 2.13 2.38#> 1.85 0.0457 40.6 <0.001 Inf 1.77 1.94#> Term: sum score#> Type: probsnrow(pr2) # original data * 2 (counterfactual gender)#> [1] 5568
As expected, we now have 2 predictions for each observation in the original data frame: - 2 counterfactual predictions for the two levels of gender
Note that we could have also computed the mean rank for the two levels of gender, or literally anything else. But here I am trying to mimic the basic behavior of the avg_/comparisons()
functions by keeping with the workflow:
- Compute observation wise counterfactual predictions
- Contrast them for each observation
- (Possibly) average them
So let’s get those contrasts!
Contrast the ranks
sum_score.contr <- function(x) { # same as before, and... sum_score(x) |> # compute a single contrast for each counterfactual row summarise( term = "gender (2-1)", estimate = estimate[gender == "2"] - estimate[gender == "1"], .by = rowidcf )}
This function will compute a difference between the rank for each gender for each observation.
pr3 <- predictions(fit, variables = "gender", hypothesis = sum_score.contr)pr3#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> --- 2774 rows omitted. See ?print.marginaleffects --- #> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307#> Term: gender (2-1)#> Type: probsnrow(pr3) # original data#> [1] 2784
Average Contrast
Finally, we can get the average difference:
# average contrast avg_sum_score.contr <- function(x) { sum_score.contr(x) |> summarise( term = "avg. gender (2-1)", estimate = mean(estimate) )}predictions(fit, variables = "gender", hypothesis = avg_sum_score.contr)#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> -0.474 0.0551 -8.61 <0.001 56.9 -0.582 -0.366#> #> Term: avg. gender (2-1)#> Type: probs
POMP
POMP (percent of maximum possible) are a for of standardized units for Likert-type items (see Solomon Kurz’s excellent blog post for more – better – info).
These are linear transformations of the ranks described above:
We can obtain POMPs for each row in the counterfactual data frame by further processing the sum-scores:
POMP <- function(x) { sum_score(x) |> mutate( estimate = 100 * (estimate - 1) / (6 - 1) )}predictions(fit, variables = "gender", hypothesis = POMP)#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> 40.4 1.166 34.6 <0.001 870.6 38.1 42.7#> 30.2 0.902 33.5 <0.001 812.8 28.4 31.9#> 39.4 1.107 35.6 <0.001 920.9 37.3 41.6#> 29.3 0.829 35.3 <0.001 905.5 27.7 30.9#> 39.9 1.135 35.1 <0.001 896.3 37.7 42.1#> --- 5558 rows omitted. See ?print.marginaleffects --- #> 24.7 0.609 40.6 <0.001 Inf 23.5 25.9#> 33.3 0.945 35.3 <0.001 902.9 31.5 35.2#> 23.9 0.608 39.3 <0.001 Inf 22.7 25.1#> 25.2 1.289 19.5 <0.001 279.4 22.6 27.7#> 17.1 0.914 18.7 <0.001 256.9 15.3 18.9#> Term: sum score#> Type: probs
And likewise get contrasts -
POMP.contr <- function(x) { POMP(x) |> summarise( term = "gender (2-1)", estimate = estimate[gender == "2"] - estimate[gender == "1"], .by = rowidcf )}predictions(fit, variables = "gender", hypothesis = sum_score.contr)#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> -0.511 0.0588 -8.68 <0.001 57.8 -0.626 -0.395#> -0.506 0.0584 -8.67 <0.001 57.7 -0.621 -0.392#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> -0.509 0.0586 -8.68 <0.001 57.8 -0.623 -0.394#> --- 2774 rows omitted. See ?print.marginaleffects --- #> -0.504 0.0582 -8.67 <0.001 57.6 -0.618 -0.390#> -0.483 0.0560 -8.62 <0.001 57.1 -0.593 -0.373#> -0.477 0.0554 -8.61 <0.001 56.9 -0.586 -0.368#> -0.471 0.0548 -8.59 <0.001 56.7 -0.578 -0.363#> -0.403 0.0488 -8.26 <0.001 52.6 -0.499 -0.307#> Term: gender (2-1)#> Type: probs
And average contrasts -
# average contrast avg_POMP.contr <- function(x) { POMP.contr(x) |> summarise( term = "avg. gender (2-1)", estimate = mean(estimate) )}predictions(fit, variables = "gender", hypothesis = avg_POMP.contr)#> #> Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %#> -9.49 1.1 -8.61 <0.001 56.9 -11.6 -7.33#> #> Term: avg. gender (2-1)#> Type: probs
A note for Bayesians
Hey, Bayesians, how are you doing?
I have some bad news: {marginaleffects}
does not support hypothesis=<function>
.
But I also have some good news! You can basically do all of this by getting the posterior draws in an rvar
format, and then directly manipulating the posterior(s) - so the examples above should basically work out of the box.
For example:
library(brms)library(posterior)fit_b <- brm(A1 ~ age + gender, data = bfi, family = cumulative(), prior = NULL) # obviously this is a bad priorpr1_b <- predictions(fit_b, variables = "gender")
We just need to adapt the function(s) written above to work properly with rvar
s. I’ve marked
avg_POMP.contr_b <- function(x) { x |> arrange(rowidcf, group, gender) |> # for each counterfactual row and level of gender summarise( term = "sum score", # estimate = sum(estimate * (1:6)), 1 estimate = rvar_sum(rvar * (1:6)), .by = c(rowidcf, gender) ) |> # POMP mutate( estimate = 100 * (estimate - 1) / (6 - 1) ) |> # contrasts summarise( term = "gender (2-1)", estimate = estimate[gender == "2"] - estimate[gender == "1"], .by = rowidcf ) |> # average summarise( term = "avg. gender (2-1)", # estimate = mean(estimate)2 estimate = rvar_mean(estimate) )}get_draws(pr1_b, shape = "rvar") |> avg_POMP.contr_b()#> term estimate#> 1 avg. gender (2-1) -9.4 ± 1.1
- 1
- Use the
rvar
column and thervar_sum()
function (instead ofsum()
) - 2
- Use the
rvar_mean()
function (instead ofmean()
)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.