Prefer = for assignment in R
Win-Vector Blog 2013-04-24
We share our opinion that = should be preferred to the more standard <- for assignment in R. This is from a draft of the appendix of our upcoming book. This risks becoming an R version of JavaScript’s semicolon controversy, but here you have it.
R has five common assignment operators: “=”, “<-”, “->”, “<<-” and “->>”. Traditionally in R, <- is the preferred assignment operator and = is thought of as an amateurish alias for it.
The <- notation is preferred by some for the very good reason that <- always means assignment, whereas = can mean assignment, function argument binding or a case statement, depending on context. However, in our opinion, R lets you type <- in too many places (such as inside expressions), and it is usually an easier bug to find when you typed = where you meant <- than the other way around.
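As a minimal sketch of both halves of that claim (the helper square and the variable names are hypothetical, chosen only for illustration): <- is accepted in the middle of a larger expression, while a stray = inside a call is read as argument binding and tends to fail loudly when no such argument exists.

```r
# <- is accepted inside a larger expression: this assigns 3 to y as a side effect.
total = sum(c(1, 2, (y <- 3)))
print(total)  # 6
print(y)      # 3

# Hypothetical helper, defined here only for illustration.
square = function(x) { x * x }

# Inside a call, = is read as argument binding, not assignment.
# square() has no argument named z, so R signals an error immediately.
# tryCatch captures the error message instead of stopping the script.
result = tryCatch(square(z = 4), error = function(e) conditionMessage(e))
print(result)
```

The first call runs silently and leaves y behind in the workspace. The second stops with an error that names the offending argument, which is the kind of mistake that is easy to track down.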
We prefer to get into the habit of never typing <-, because accidentally typing <- instead of = in a function call can cause an unreported error. Consider the following code fragment demonstrating how we can use = to bind values to function arguments:
> divide = function(numerator,denominator) { numerator/denominator }
> divide(1,2)
[1] 0.5
> divide(2,1)
[1] 2
> divide(denominator=2,numerator=1)
[1] 0.5
Now consider the following (deliberate) error, where by habit we typed <- instead of =:
> divide(denominator<-2,numerator<-1)
[1] 2
> denominator
[1] 2
We quietly get the wrong answer and contaminate the values of numerator and denominator in the global namespace. This is a simple example of where typing <- where = was intended causes a non-signaling bug. We don’t know of any simple example (other than examples built to rely on side effects) where typing = where you meant <- is an error. So we prefer =.
The -> operator is just a left-to-right assignment that lets you write things like 5 -> x. It is cute, but not game changing. The <<- and ->> operators are to be avoided unless you actually need their special abilities. They undo one of the important safety points about functions. When a variable is assigned inside a function, the assignment is local to the function. That is, nobody outside of the function ever sees the effect, so the function can safely use variables to store intermediate calculations without clobbering same-named outside variables. The <<- and ->> operators reach outside of this protected scope and cause outside side effects. Side effects seem great when you need them, but on balance they make code maintenance, debugging and documentation much harder.
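The scoping difference can be sketched as follows. The names counter, bump_local and bump_global are hypothetical, introduced only for this illustration.

```r
# -> assigns left to right: the value is on the left, the name on the right.
5 -> x

counter = 0

# Ordinary assignment inside a function creates a local variable;
# the outer counter is left untouched.
bump_local = function() {
  counter = counter + 1
  counter
}

# <<- searches the enclosing environments and modifies the outer counter.
bump_global = function() {
  counter <<- counter + 1
  counter
}

local_result = bump_local()    # 1, computed on a local copy
after_local = counter          # still 0
global_result = bump_global()  # 1
after_global = counter         # now 1: the outer variable was changed
```

Both functions return the same value, so the only sign that bump_global did something different is the changed outer counter, which is why such side effects are hard to spot when maintaining code.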