R Tip: Introduce Indices to Avoid for() Class Loss Issues
Win-Vector Blog 2018-03-08
Here is an R tip. Use loop indices to avoid for()-loops damaging classes.
Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop.
    d <- c(Sys.time(), Sys.time())
    print(d)
    #> [1] "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST"

    for(di in d) {
      print(di)
    }
    #> [1] 1518977777
    #> [1] 1518977777
Notice we printed numbers, not dates/times. To avoid this problem introduce an index, and loop over that, not over the vector contents.
    for(ii in seq_along(d)) {
      di <- d[[ii]]
      print(di)
    }
    #> [1] "2018-02-18 10:16:16 PST"
    #> [1] "2018-02-18 10:16:16 PST"
seq_along() is a handy function similar to what we discussed in R Tip: Use seq_len() to Avoid The Backwards List Trap.
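As a brief illustration of why seq_along() is the safer way to build the index, here is a minimal sketch (the variable names are illustrative and not from the original tip). On an empty vector, 1:length(x) counts backwards from 1 to 0, while seq_along(x) yields no indices at all.

```r
empty <- numeric(0)

# 1:length(empty) is 1:0, which counts down and produces two bogus indices.
bad_idx <- 1:length(empty)
print(bad_idx)

# seq_along() returns a zero-length index, so the loop body never runs.
good_idx <- seq_along(empty)
n_iterations <- 0
for (ii in good_idx) {
  n_iterations <- n_iterations + 1
}
print(n_iterations)
```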
The introduction of indices is ugly, as index-free iteration is generally superior. Also, as we have mentioned before, for-loops should not be considered anathema in R: they are a useful tool when used correctly.
Note base::ifelse() also loses class attributes, though dplyr::if_else() avoids the problem. Also base::lapply() and base::vapply() do not have the problem (for example try: vapply(d, as.character, character(1)) and lapply(d, class)).
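The contrast can be checked directly. The following is a small sketch using base R only (so dplyr::if_else() is not shown); the result names r_ifelse, r_lapply, and r_vapply are made up for this illustration.

```r
d <- c(Sys.time(), Sys.time())

# base::ifelse() fills a copy of the logical test vector with values,
# so the POSIXct class is dropped and bare numbers come back.
r_ifelse <- ifelse(c(TRUE, FALSE), d, d)
print(class(r_ifelse))

# lapply() hands each element to the function with its class intact.
r_lapply <- lapply(d, class)
print(r_lapply)

# vapply() behaves the same way: each element still inherits from POSIXct.
r_vapply <- vapply(d, function(di) inherits(di, "POSIXct"), logical(1))
print(r_vapply)
```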
In both cases (for() and ifelse()) R is implementing a richer class on top of a plain vector of numbers by attaching a class attribute to the vector. This means the vector is a single object holding multiple times, not a list of individual time objects. Any operation that strips attributes loses the class information, and the derived vector reverts to its underlying type (in this case double).
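To see this structure for yourself, the following sketch inspects the vector from the example above (the names raw and seen are introduced here only for illustration).

```r
d <- c(Sys.time(), Sys.time())

# The storage type is plain double; the date-time behavior comes from the class attribute.
print(typeof(d))
print(class(d))

# Removing the class attribute leaves the underlying seconds-since-epoch numbers.
raw <- unclass(d)
print(is.numeric(raw))

# for() iterates over the bare values, so each di is just a number.
seen <- character(0)
for (di in d) {
  seen <- c(seen, class(di))
}
print(seen)

# `[[` on a POSIXct vector keeps the class, which is why the indexed loop works.
print(class(d[[1]]))
```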
For pre-allocation ideas (an important complement to for-loops) please see R Tip: Use vector() to Pre-Allocate Lists (also includes some discussion of for-loops).
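As a rough sketch of how pre-allocation combines with the index loop from this tip (this example is not taken from the referenced article; the name res and the formatting step are assumptions made for illustration):

```r
d <- c(Sys.time(), Sys.time())

# Pre-allocate a list with one slot per element of d.
res <- vector(mode = "list", length = length(d))

for (ii in seq_along(d)) {
  # d[[ii]] keeps the POSIXct class, so format() produces a date string.
  res[[ii]] <- format(d[[ii]], "%Y-%m-%d")
}
print(res)
```

Each slot is filled in place by index, so the result list never has to grow inside the loop.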