checkglobals: an(other) R-package for static code analysis

R-bloggers 2025-03-25

[This article was first published on Open Analytics, and kindly contributed to R-bloggers].

Introduction

An important aspect of writing an R-script or an R-package is ensuring reproducibility and maintainability of the developed code, not only for others, but also for our future selves. The modern R ecosystem provides various tools and packages to help organize and validate written R code. Some widely used packages include roxygen2 (for function documentation), renv (for dependency management and environment isolation), and testthat, tinytest and RUnit for unit testing[1].

When it comes to package development, it is good practice to run R CMD check to perform a series of automated checks identifying possible issues with the R-package. Among the checks performed by R CMD check is a static inspection of the internal syntax trees of the code through the use of the codetools package. This code analysis discovers undefined functions and variables without executing the code itself, leading to the following (perhaps familiar) notifications:

❯ checking R code for possible problems ... NOTE
  my_fun: no visible binding for global variable ‘g’

The undefined global variables returned by R CMD check may be false positives caused by functions that use data-masking or non-standard evaluation, such as subset(), transform() or with(). In these cases, a common solution is to suppress the notifications by including the variable names inside a call to utils::globalVariables().
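As a minimal sketch of such a false positive and its suppression, the snippet below defines a hypothetical my_fun() (the name is borrowed from the NOTE above; its body is an assumption made for illustration) that filters a data frame with subset(). codetools lists the column name g among the global variables, even though subset() resolves it inside the data frame.

```r
library(codetools)

# Hypothetical function: `g` is a column of `df`, resolved by subset()
# through data-masking, not a global variable
my_fun <- function(df) subset(df, g > 1)

# Static analysis cannot know this and lists `g` among the variables
globals <- findGlobals(my_fun, merge = FALSE)
globals$variables

# In a package, the NOTE is typically silenced by placing a call like
# this in one of the files under R/ (shown as a comment here, since it
# is meant to run inside a package namespace, not at the console):
# utils::globalVariables("g")
```

Note that globalVariables() silences the name for the whole package, so a genuinely undefined g elsewhere in the package would no longer be reported either.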

Most importantly, we wish to detect variable names that are truly undefined as soon as possible, as these could point to a mistake in the code or signal a missing function or package import.

In this context, this post introduces a minimal R-package checkglobals aimed at serving as an efficient alternative to the static code analysis provided by codetools to check R-packages and R-scripts for missing function imports and variable names on-the-fly. The code inspection procedures are implemented using R’s internal C API for efficiency, and no external R-package dependencies are strictly required (only cli and knitr are suggested, for interactive use and for checking Rmd documents respectively).

Example usage

The checkglobals-package contains a single wrapper function checkglobals() to inspect R-scripts, Rmd-documents, folders, R-code strings or R-packages. As an example, consider the following R-script containing a demo Shiny application (source: https://raw.githubusercontent.com/rstudio/shiny-examples/main/004-mpg/app.R).

# scripts/app.R
library(shiny)
library(datasets)

# Data pre-processing ----
mpgData <- mtcars
mpgData$am <- factor(mpgData$am, labels = c("Automatic", "Manual"))

# Define UI for miles per gallon app ----
ui <- fluidPage(
  titlePanel("Miles Per Gallon"),
  sidebarLayout(
    sidebarPanel(
      selectInput("variable", "Variable:",
                  c("Cylinders" = "cyl",
                    "Transmission" = "am",
                    "Gears" = "gear")),
      checkboxInput("outliers", "Show outliers", TRUE)
    ),
    mainPanel(
      h3(textOutput("caption")),
      plotOutput("mpgPlot")
    )
  )
)

# Define server logic to plot various variables against mpg ----
server <- function(input, output) {
  formulaText <- reactive({
    paste("mpg ~", input$variable)
  })
  output$caption <- renderText({
    formulaText()
  })
  output$mpgPlot <- renderPlot({
    boxplot(as.formula(formulaText()),
            data = mpgData,
            outline = input$outliers,
            col = "#75AADB", pch = 19)
  })
}

# Create Shiny app ----
shinyApp(ui, server)

Calling checkglobals() with the argument file on the R-script saved as a local file returns as output:

Looking at the printed output of the object returned by checkglobals(), it lists the following information:

  1. the name and location of all unrecognized global variables;
  2. the name and location of all detected imported functions grouped by R-package.

The location app.R#36 lists the R-file name (app.R) and line number (36) of the detected variable or function. If cli is installed and cli-hyperlinks are supported, clicking the location links opens the source file pointing to the given line number. The bars and counts behind the imported package names highlight the number of function calls detected from each package.
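For readers who want to try the file argument on something small, here is a hedged sketch. The three-line script and the name undefined_fun are made up for illustration, and the check itself only runs if checkglobals is installed; no output is shown because it depends on the installed version and terminal.

```r
# Write a small, made-up script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c(
  "library(stats)",
  "y <- median(c(1, 2, 3))",
  "z <- undefined_fun(y)"   # deliberately undefined (hypothetical name)
), script)

# Run the check only if the checkglobals package is available
if (requireNamespace("checkglobals", quietly = TRUE)) {
  chk <- checkglobals::checkglobals(file = script)
  print(chk)
}
```

Based on the description above, one would expect undefined_fun to be listed as an unrecognized global and median() to be grouped under the stats package.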

More detailed information can be obtained by calling print() directly. For instance, we can print the referenced source code lines of the unrecognized global variables with:

The detection of imported functions and packages is an important motivation for the checkglobals-package. First, this allows us to validate the NAMESPACE file of a development R-package or check R-scripts for any additional packages that require installation before execution of the code. Second, this information can be used to get a better sense of the importance of an imported package, for instance to determine how much effort it would take to remove or replace it as a dependency.

This is different from e.g. the codetools package, where findGlobals() or checkUsage() return an undefined variable name if a function import is not recognized, but do not return variable names that have been recognized as imports. The same is true for the convenience packages lintr (with object_usage_linter()) or globals, which provide codetools wrappers producing results similar to those returned by R CMD check. More similar is renv::dependencies(), which scans for all loaded and/or imported packages in an R project folder by analyzing the DESCRIPTION and NAMESPACE files of an R-package or by detecting calls to library(), require(), etc. in an R-script. Note that renv::dependencies() returns package names, but not the functions called from these packages.
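To make the idea of grouping detected functions by package concrete, here is a rough base-R sketch that only recognizes explicit pkg::fun references in the parser's token table. It illustrates the concept and is not how checkglobals is implemented (the package, as described above, works through R's C API and covers far more than explicit :: references); the analysed code string is made up.

```r
# Made-up code to analyse; only explicit `pkg::fun` references are handled
code <- "
x <- utils::head(1:10)
y <- stats::median(x)
z <- utils::tail(1:10)
"

# Token table from R's parser (keep.source = TRUE is needed for parse data)
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
tok <- pd[pd$terminal, ]

# A `pkg::fun` reference is tokenised as SYMBOL_PACKAGE, `::`, then the name
idx <- which(tok$token == "SYMBOL_PACKAGE")
imports <- split(tok$text[idx + 2L], tok$text[idx])

imports           # function names grouped by package
lengths(imports)  # number of references per package
```

The per-package counts are the kind of information that helps judge how heavily a script leans on a given dependency.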

An additional benefit of a minimal and efficient code analysis package is that we can significantly reduce the runtime required to inspect large R-packages or codebases, which makes it possible to check the code interactively during development:

library(lintr)
library(checkglobals)

## absolute timings (seconds) for inspecting the shiny package
## (roughly 100-fold relative time difference)
bench::mark(
  lint_package = lint_package("~/git/shiny", linters = list(object_usage_linter())),
  checkglobals = checkglobals(pkg = "~/git/shiny/"),
  iterations = 10,
  check = FALSE,
  time_unit = "s"
)
#> # A tibble: 2 × 6
#>   expression      min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 lint_package 18.8   19.5      0.0508    1.33GB     2.42
#> 2 checkglobals  0.157  0.162    5.96     15.69MB     1.19

More examples

R Markdown files

The file argument also accepts R Markdown (.Rmd or .Rmarkdown) file locations. For R Markdown files, the R code chunks are first extracted into a temporary R-script with knitr::purl(), which is then analyzed by checkglobals(). Instead of a local file, the file argument in checkglobals() can also be a remote file location (e.g. a server or the web), in which case the remote file is first downloaded as a temporary file with download.file(). Below, we scan one of tidyr’s package vignettes (source: https://raw.githubusercontent.com/tidyverse/tidyr/main/vignettes/tidy-data.Rmd):

R-packages that are imported or loaded, but have no detected function imports, are displayed with an n/a reference. This can happen when checkglobals() falsely ignores one or more imported functions from the given package or when the package is not actually needed as a dependency. In both cases this is useful information to have. In the above example, tibble is loaded in order to use tribble(), but the tribble() function is also exported by dplyr, so it shows up under the dplyr imports instead.
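The chunk-extraction step mentioned in this section can be made visible with a small sketch. checkglobals() performs this step itself when given an Rmd file, so the two-step version below is only for illustration. The R Markdown document is made up, and each step runs only if the corresponding package (knitr, checkglobals) is installed.

```r
# A tiny, made-up R Markdown document written to a temporary file
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "x <- stats::median(c(1, 2, 3))",
  fence
), rmd)

if (requireNamespace("knitr", quietly = TRUE)) {
  script <- tempfile(fileext = ".R")
  # documentation = 0 keeps only the code from the chunks
  knitr::purl(rmd, output = script, quiet = TRUE, documentation = 0)
  print(readLines(script))

  # The extracted script can then be checked like any other R file
  if (requireNamespace("checkglobals", quietly = TRUE)) {
    print(checkglobals::checkglobals(file = script))
  }
}
```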

Folders

Folders containing R-scripts can be scanned with the dir argument, which inspects all R-scripts present in dir (and any of its subdirectories). The following example scans an R-Shiny app folder containing a ui.R and server.R file (source: https://github.com/rstudio/shiny-examples/tree/main/018-datatable-options):

If imports are detected from an R-package not installed in the current R-session, an alert is printed (as with the DT package above). Function calls accessing the missing R-package explicitly, using e.g. :: or :::, can still be fully identified as imported function names. Function calls with no reference to the missing R-package will be listed as unrecognized global variables.
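A self-contained way to experiment with the dir argument is sketched below. The folder name demo-app and the contents of the two files are made up, and the check only runs if checkglobals is installed. Because all shiny functions are referenced explicitly with ::, the text above suggests they should be identified as imports even on a machine where shiny itself is not installed.

```r
# Made-up app folder containing a ui.R and a server.R file
app_dir <- file.path(tempdir(), "demo-app")
dir.create(app_dir, showWarnings = FALSE)

writeLines(
  "ui <- shiny::fluidPage(shiny::textOutput('txt'))",
  file.path(app_dir, "ui.R")
)
writeLines(c(
  "server <- function(input, output) {",
  "  output$txt <- shiny::renderText('hello')",
  "}"
), file.path(app_dir, "server.R"))

# Scan every R-script in the folder (only if checkglobals is available)
if (requireNamespace("checkglobals", quietly = TRUE)) {
  print(checkglobals::checkglobals(dir = app_dir))
}
```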

R-packages

R-package folders can be scanned with the pkg argument. Conceptually, checkglobals() scans all files in the /R folder of the package and contrasts the detected (unrecognized) globals and imports against the imports listed in the NAMESPACE file of the package. R-scripts present elsewhere in the package (e.g. in the /inst folder) are not analyzed, as these are not covered by the package NAMESPACE file. To illustrate, we can run checkglobals() on its own package folder:

Bundled packages

Besides local R-package folders, the pkg argument also accepts file paths to bundled source R-packages (tar.gz). This can either be a tar.gz package on the local filesystem, or a remote file location, such as the web (similar to the file argument).

Local filesystem:

Remote file location:

Known limitations

To conclude, we discuss some of the limitations of static code analysis with codetools and checkglobals. When using codetools (or R CMD check) there are several scenarios where the code inspection is known to skip undefined names that could potentially be detected. First, a variable that is evaluated before it is defined may be missed, as codetools does not track the order in which assignment and evaluation happen inside a local scope. Here is a minimal example using codetools::findGlobals():

## findGlobals requires a function as input
test1 <- function() {
  print(x)
  x <- 1
}
## calling this function generates an error, unless a global x
## exists (as in the session below, where x was NA)
test1()
#> [1] NA
library(codetools)
## x is not recognized as an undefined
## variable at the moment of evaluation
findGlobals(test1)
#> [1] "{"     "<-"    "print"

Another quite common situation is the use of a character function name inside a functional, e.g. Reduce(), Filter(), Map() or the apply-family of functions. These function names are viewed by codetools as ordinary character strings:

test2 <- function() {
  do.call("foo", 1)
}
## foo is not recognized as an undefined
## variable since it is defined as a string
findGlobals(test2)
#> [1] "{"       "do.call"

Finally, more complex assignment statements may not always be handled as expected:

test3 <- function() {
  assign(x = "x1", value = 1)
  assign(value = 2, x = "x2")
  c(x1, x2)
}
## assignment to x1 is recognized correctly,
## but assignment to x2 is not
findGlobals(test3)
#> [1] "{"      "assign" "c"      "x2"

x <- NA
test4 <- function() {
  x <<- 1
  x
}
## x is assigned in a different scope
## but is available when evaluated
findGlobals(test4)
#> [1] "{"   "<<-" "x"

The checkglobals-package tries to address some of these use-cases, but due to R’s flexibility as a language, there are a number of use-cases we can think of that are either too ambiguous or complex to be analyzed without evaluation of the code itself. Below we list some of these cases, where checkglobals() fails to recognize a variable name (false negative) or falsely detects a global variable when it should not (false positive).

Character variable/function names

## this works (character arguments are recognized as functions)
checkglobals(text = 'do.call(args = list(1), what = "median")')
checkglobals(text = 'Map("g", 1, n = 1)')
checkglobals(text = 'stats::aggregate(x ~ ., data = y, FUN = "g")')

## this doesn't work (evaluation is required)
checkglobals(text = 'g <- "f"; Map(g, 1, n = 1)')
checkglobals(text = "eval(substitute(g))")  ## same for ~, expression, quote, bquote, Quote, etc.

## this works (calling a function in an exotic way)
checkglobals(text = '"head"(1:10)')
checkglobals(text = '`::`("utils", "head")(1:10)')
checkglobals(text = 'list("function" = utils::head)$`function`(1:10)')

## this doesn't work (evaluation is required)
checkglobals(text = 'get("head")(1:10)')
checkglobals(text = 'methods::getMethod("f", signature = "ANY")')

Package loading

## this works (simple evaluation of package names)
checkglobals(text = 'attachNamespace("utils"); head(1:10)')
checkglobals(text = 'pkg <- "utils"; library(pkg, character.only = TRUE); head(1:10)')

## this doesn't work (more complex evaluation is required)
checkglobals(text = 'pkg <- function() "utils"; library(pkg(), character.only = TRUE); head(1:10)')
checkglobals(text = 'loadPkg <- library; loadPkg(utils)')
checkglobals(text = 'box::use(utils[...])')

Unknown symbols

## this works (special functions self, private, super are recognized)
checkglobals(text = 'R6::R6Class("cl",
  public = list(
    initialize = function(...) self$f(...),
    f = function(...) private$p
  ),
  private = list(
    p = list()
  )
)')

## this doesn't work (data masking)
checkglobals(text = 'transform(mtcars, mpg2 = mpg^2)')
checkglobals(text = 'attach(iris); print(Sepal.Width)')

Lazy evaluation

## this works (basic lazy evaluation)
checkglobals(text = '{
  addy <- function(y) x + y
  x <- 0
  addy(1)
}')
checkglobals(text = 'function() {
  on.exit(rm(x))
  x <- 0
}')

## this doesn't work (lazy evaluation in external functions)
checkglobals(text = 'server <- function(input, output) {
  add1x <- shiny::reactive({
    add1(input$x)
  })
  add1 <- function(x) x + 1
}')

Useful references

  • checkglobals, CRAN webpage of the checkglobals package including links to additional documentation.
  • codetools::findGlobals(), detects global variables from R-scripts via static code analysis. This and other codetools functions are used in the source code checks run by R CMD check.
  • globals, R-package by H. Bengtsson providing a re-implementation of the functions in codetools to identify global variables using various strategies for export in parallel computations.
  • renv::dependencies(), detects R-package dependencies by scanning all R-files in a project for imported functions or packages via static code analysis.
  • lintr, R-package by J. Hester and others to perform general static code analysis in R projects. lintr::object_usage_linter() provides a wrapper of codetools::checkUsage() to detect global variables similar to R CMD check.

[1] Unit testing with R CMD check does not require the use of external packages, but many package developers rely on packages such as testthat or tinytest for convenience and due to common practice.