Function Generators vs Partial Application in R

R-bloggers 2025-04-25

[This article was first published on rstats on Irregularly Scheduled Programming, and kindly contributed to R-bloggers.]

In which I confront the way I read code in different languages, and end up wishing that R had a feature that it doesn’t.

This is a bit of a thought-dump as I consider some code - please don’t take it as a criticism of any design choices; the tidyverse team have written magnitudes more code than I have and have certainly considered their approach more than I will. I believe it’s useful to challenge our own assumptions and dig in to how we react to reading code.

The blog post describing the latest updates to the tidyverse {scales} package neatly demonstrates the usage of the new functionality, but because the examples are written outside of actual plotting code, one feature stuck out to me in particular…

label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin

Here, label_glue is a function that takes a {glue} string as an argument and returns a “labelling” function. That function is then passed the vector of penguin species, which is used in the {glue} string to produce the output.

Note

For those coming to this post from a Python background, {glue} is R’s answer to f-strings, and is used in almost exactly the same way for simple cases:

  ## R:
  name <- "Jonathan"
  glue::glue("My name is {name}")
  # My name is Jonathan
  ## Python:
  >>> name = 'Jonathan'
  >>> f"My name is {name}"
  # 'My name is Jonathan'

There’s nothing magic going on with the label_glue()() call - functions are being applied to arguments - but it’s always useful to interrogate surprise when reading some code.

Spelling out an example might be a bit clearer. A simplified version of label_glue might look like this

tmp_label_glue <- function(pattern = "{x}") {
  function(x) {
    glue::glue_data(list(x = x), pattern)
  }
}

This returns a function which takes one argument, so if we evaluate it we get

tmp_label_glue("The {x} penguin")
# function(x) {
#   glue::glue_data(list(x = x), pattern)
# }
# <environment: 0x1137a72a8>

This has the benefit that we can store this result as a new named function

penguin_label <- tmp_label_glue("The {x} penguin")
penguin_label
# function(x) {
#    glue::glue_data(list(x = x), pattern)
# }
# <bytecode: 0x113914e48>
# <environment: 0x113ed4000>
penguin_label(c("Gentoo", "Chinstrap", "Adelie"))
# The Gentoo penguin
# The Chinstrap penguin
# The Adelie penguin
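What makes this work is that the returned function is a closure: it carries the environment it was created in, and that is where pattern lives. The sketch below is a base-R stand-in rather than the {scales} code; the made-up make_labeller() uses sub() instead of {glue}, purely so it runs without any packages. It shows that each call to the generator gives back an independent function with its own captured pattern.

```r
# Hypothetical base-R stand-in for a label generator; not the {scales} code.
make_labeller <- function(pattern = "{x}") {
  force(pattern)  # evaluate the argument now rather than lazily on first use
  function(x) {
    # Substitute each element of x for the literal "{x}" in the pattern
    vapply(
      as.character(x),
      function(value) sub("{x}", value, pattern, fixed = TRUE),
      character(1),
      USE.NAMES = FALSE
    )
  }
}

penguin_label <- make_labeller("The {x} penguin")
island_label  <- make_labeller("{x} Island")

# The captured pattern is stored in each function's enclosing environment
environment(penguin_label)$pattern
# [1] "The {x} penguin"
environment(island_label)$pattern
# [1] "{x} Island"

penguin_label(c("Gentoo", "Chinstrap", "Adelie"))
island_label(c("Biscoe", "Dream"))
```

The two generated functions share the same body and differ only in the environment attached to them, which is why the printed output earlier includes an environment line.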

This is versatile, because different {glue} strings can produce different functions - it’s a function generator. That’s neat if you want different functions, but if you’re only working with that one pattern, it can seem odd to call it inline without naming it, as in the earlier example

label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))

It looks like we should be able to have all of these arguments in the same function

label_glue("The {x} penguin", c("Gentoo", "Chinstrap", "Adelie"))

but, apart from the fact that label_glue doesn’t take the labels as an argument, that call wouldn’t return a function, and the place where this will be used takes a function as the argument.

So, why do the functions from {scales} take functions as arguments? The reason would seem to be that this enables them to work lazily - we don’t necessarily know the values we want to pass to the generated function at the call site; maybe those are computed as part of the plotting process.

We also don’t want to have to extract these labels ourselves and compute on them; it’s convenient to let the scale_* function do that for us, if we just provide a function for it to use when the time is right.

But what is passed to that generated function? That depends on where it’s used… if I used it in scale_y_discrete then it might look like this

library(ggplot2)
library(palmerpenguins)
p <- ggplot(penguins[complete.cases(penguins), ]) +
  aes(bill_length_mm, species) +
  geom_point()
p + scale_y_discrete(labels = penguin_label)

since the labels argument takes a function, and penguin_label is a function created above.

I could equivalently write that as

p + scale_y_discrete(labels = label_glue("The {x} penguin"))

and not need the “temporary” function variable.

So what gets passed in here? That’s a bit hard to dig out of the source, but one could reasonably expect that at some point the supplied function will be called with the available labels as an argument.
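A toy model may help here. The following is not how ggplot2 implements scales; toy_scale() and its arguments are invented for illustration. It only shows the general shape of the idea: the caller hands over a function, and the scale calls it later with values it has worked out for itself.

```r
# Hypothetical sketch only; ggplot2's real scale machinery is more involved.
toy_scale <- function(data, labels = identity) {
  # The "breaks" are only known once the scale has seen the data
  breaks <- sort(unique(as.character(data)))
  if (is.function(labels)) {
    # A function is called at this point, with the breaks as its argument
    labels(breaks)
  } else {
    # Anything else is assumed to already be the finished labels
    labels
  }
}

species <- c("Gentoo", "Adelie", "Chinstrap", "Adelie", "Gentoo")
shout <- function(x) toupper(x)

toy_scale(species, labels = shout)
# [1] "ADELIE"    "CHINSTRAP" "GENTOO"
toy_scale(species, labels = c("A", "C", "G"))
# [1] "A" "C" "G"
```

The caller of toy_scale() never sees the breaks; it only supplies the rule for turning them into labels, which is the laziness described above.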

I have a suspicion that the “external” use of this function, as

label_glue("The {x} penguin")(c("Gentoo", "Chinstrap", "Adelie"))

is clashing with my (much more recent) understanding of Haskell and the way that partial application works. In Haskell, all functions take exactly 1 argument, even if they look like they take more. This function

ghci> do_thing x y z = x + y + z

looks like it takes 3 arguments, and it looks like you can use it that way

ghci> do_thing 2 3 4
9

but really, each “layer” of arguments is a function with 1 argument, i.e. an honest R equivalent would be

do_thing <- function(x) {
  function(y) {
    function(z) {
      x + y + z
    }
  }
}
do_thing(2)(3)(4)
# [1] 9

What’s important here is that we can “peel off” some of the layers, and we get back a function that takes the remaining argument(s)

do_thing(2)(3)
# function(z) {
#    x + y + z
# }
# <bytecode: 0x116b72ba0>
# <environment: 0x116ab2778>
partial <- do_thing(2)(3)
partial(4)
# [1] 9

In Haskell, that looks like this

ghci> partial = do_thing 2 3
ghci> partial 4
9

Requesting the type signature of this function shows

ghci> :type do_thing
do_thing :: Num a => a -> a -> a -> a

so it’s a function that takes some value of type a (which needs to be a Num because we’re using + for addition; this is inferred by the compiler) and then we have

a -> a -> a -> a

This can be read as “a function that takes 3 values of a type a and returns 1 value of that same type” but equivalently (literally; this is all just syntactic sugar) we can write it as

a -> (a -> (a -> a))

which is “takes a value of type a and returns a function that takes a value of type a, which itself returns a function that takes a value of type a and returns a value of type a”. With a bit of ASCII art…

a -> (a -> (a -> a))
|     |     |    |
|     |     |_z__|
|     |_y________|
|_x______________|

If we ask for the type signature when some of the arguments are provided

ghci> :type do_thing 2 3
do_thing 2 3 :: Num a => a -> a

we see that now it is a function of a single variable (a -> a).
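R has no built-in partial application for ordinary multi-argument functions, but it can be imitated. As a rough sketch, the helper partial_apply() below (a made-up name, not a base R function) fixes the leading arguments of a plain three-argument function and returns a function waiting for the rest.

```r
# An ordinary (uncurried) three-argument function
do_thing3 <- function(x, y, z) x + y + z

# Hypothetical helper: fix some leading arguments now, supply the rest later
partial_apply <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(fixed, list(...)))
}

waiting_for_z <- partial_apply(do_thing3, 2, 3)
is.function(waiting_for_z)
# [1] TRUE
waiting_for_z(4)
# [1] 9
```

The {purrr} package provides purrr::partial() for the same purpose; either way, the partial application has to be requested explicitly, whereas in Haskell it falls out of simply leaving arguments off.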

With that in mind, the labelling functions look like a great candidate for partially applied functions! If we had

label_glue(pattern, labels)

then

label_glue(pattern)

would be a function “waiting” for a labels argument. Isn’t that the same as what we have? Almost, but not quite. label_glue doesn’t take a labels argument, it returns a function which will use them, so the lack of the labels argument isn’t a signal for this. label_glue(pattern) still returns a function, but that’s not obvious, especially when used inline as

scale_y_discrete(labels = label_glue("The {x} penguin"))

When I read R code like that I see the parentheses at the end of label_glue and read it as “this is a function invocation; the return value will be used here”. That’s correct, but in this case the return value is another function. There’s nothing here that says “this will return a function”. There’s no convention in R for signalling this (and being dynamically typed, all one can do is read the documentation) but one could imagine one, e.g. label_glue_F in a similar fashion to how Julia uses an exclamation mark to signify an in-place mutating function; sort! vs sort.
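Short of a naming convention, the one thing R will report at runtime is whether a value is a function, via is.function(). A small sketch, with a made-up generator make_suffixer() standing in for something like label_glue():

```r
# Hypothetical function generator standing in for a label_* function
make_suffixer <- function(suffix) {
  function(x) paste(x, suffix)
}

is.function(make_suffixer)                       # the generator itself
# [1] TRUE
is.function(make_suffixer("penguin"))            # its return value is also a function
# [1] TRUE
is.function(make_suffixer("penguin")("Gentoo"))  # only this is a plain value
# [1] FALSE
```

That check only helps once the code is running, of course; it does nothing for someone reading the call on the page.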

Passing around functions is all the rage in functional programming, and it’s how you can do things like this

sapply(mtcars[, 1:4], mean)
#      mpg       cyl      disp        hp
# 20.09062   6.18750 230.72188 146.68750

Here I’m passing a list (the first four columns of the mtcars dataset) and a function (mean, by name) to sapply which essentially does a map(l, f) and produces the mean of each of these columns, returning a named vector of the means.

That becomes very powerful where partial application is allowed, enabling things like

ghci> add_5 = (+5)
ghci> map add_5 [1..10]
[6,7,8,9,10,11,12,13,14,15]

In R, we would need to create a new function more explicitly, i.e. referring to an arbitrary argument

add_5 <- \(x) x + 5
sapply(1:10, add_5)
# [1]  6  7  8  9 10 11 12 13 14 15
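One thing that comes close in base R: sapply() forwards any extra arguments on to the function it is given, so the 5 can be supplied alongside the function rather than baked into a new one. A small sketch, using the backtick-quoted + operator as the function:

```r
# Extra arguments to sapply() are passed on to FUN after each element
sapply(1:10, `+`, 5)
# [1]  6  7  8  9 10 11 12 13 14 15

# The same result as the explicit single-argument function
add_5 <- function(x) x + 5
identical(sapply(1:10, `+`, 5), sapply(1:10, add_5))
# [1] TRUE
```

This is argument forwarding rather than partial application: no new function is created, and it only works because sapply() happens to accept and pass along extra arguments.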

Maybe my pattern-recognition has become a bit too overfitted on the idea that in R “no parentheses = function, not result; parentheses = result”.

This reads weirdly to me

calc_mean <- function() {
  function(x) {
    mean(x)
  }
}
sapply(mtcars[, 1:4], calc_mean())

but it’s exactly the same as the earlier example, since calc_mean() essentially returns a mean function

calc_mean()(1:10)
# [1] 5.5

For that reason, I like the idea of naming the labelling function, since I read this

p + scale_y_discrete(labels = penguin_label)

as passing a function. The parentheses get used in the right place - where the function has been called.

Now, having to define that variable just to use it in the scale_y_discrete call is probably a bit much, so yeah, inlining it makes sense, with the caveat that you have to know it’s a function.

None of this was meant to say that the {scales} approach is wrong in any way - I just wanted to address my own perceptions of the arg = fun() design. It does make sense, but it looks different. Am I alone on this?

Let me know on Mastodon and/or the comment section below.
