Please Avoid detectCores() in your R Packages

R-bloggers 2022-12-06

[This article was first published on JottR on R, and kindly contributed to R-bloggers.]

The detectCores() function of the parallel package is probably one of the most used functions when it comes to setting the number of parallel workers to use in R. In this blog post, I'll try to explain why using it is not always a good idea. Already now, I am going to make a bold request and ask you to:

Please avoid using parallel::detectCores() in your package!

By reading this blog post, I hope you become more aware of the different problems that arise from using detectCores() and how they might affect you and the users of your code.

Figure 1: Using detectCores() risks overloading the machine where R runs, even more so if there are other things already running. The machine seen at the left is heavily loaded, because too many parallel processes compete for the 24 CPU cores available, which results in an extensive amount of kernel context switching (red), which wastes precious CPU cycles. The machine to the right is near-perfectly loaded at 100%, where none of the processes use more than they may use (mostly green).

TL;DR

If you don't have time to read everything, but will take my word that we should avoid detectCores(), then the quick summary is that you basically have two choices for the number of parallel workers to use by default:

  1. Have your code run with a single core by default (i.e. sequentially), or

  2. replace all parallel::detectCores() with parallelly::availableCores().

I'm in the conservative camp and recommend the first alternative. Using sequential processing by default, where the user has to make an explicit choice to run in parallel, significantly lowers the risk of clogging up the CPUs (left panel in Figure 1), especially when there are other things running on the same machine.

The second alternative is useful if you're not ready to make the move to run sequentially by default. The availableCores() function of the parallelly package is fully backward compatible with detectCores(), while it avoids the most common problems that come with detectCores(). It is also agile to a lot more CPU-related settings, including settings that the end user, the systems administrator, job schedulers, and Linux containers control. It is designed to take care of common overuse issues so that you do not have to spend time worrying about them.
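As a minimal sketch, assuming the parallelly package is installed, the replacement in a script is a one-line change:

ncores <- parallelly::availableCores()  ## instead of parallel::detectCores()
cl <- parallel::makeCluster(ncores)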

Background

There are several problems with using detectCores() from the parallel package for deciding how many parallel workers to use. But before we get there, I want you to know that this function is commonly used in R scripts and R packages, and frequently suggested in tutorials. So, do not feel ashamed if you use it.

If we scan the code of the R packages on CRAN (e.g. by searching GitHub¹), or on Bioconductor (e.g. by searching Bioc::CodeSearch), we find many cases where detectCores() is used. Here are some variants we see in the wild:

cl <- makeCluster(detectCores())
cl <- makeCluster(detectCores() - 1)
y <- mclapply(..., mc.cores = detectCores())
registerDoParallel(detectCores())

We also find functions that let the user choose the number of workers via some argument, which defaults to detectCores(). Sometimes the default is explicit, as in:

fast_fcn <- function(x, ncores = parallel::detectCores()) {
  if (ncores > 1) {
    cl <- makeCluster(ncores)
    ...
  }
}

and sometimes it’s implicit, as in:

fast_fcn <- function(x, ncores = NULL) {
  if (is.null(ncores))
    ncores <- parallel::detectCores() - 1
  if (ncores > 1) {
    cl <- makeCluster(ncores)
    ...
  }
}

As we will see next, all the above examples are potentially buggy and might result in run-time errors.

Common mistakes when using detectCores()

Issue 1: detectCores() may return a missing value

A small, but important, detail about detectCores() that is often missed is the following section in help("detectCores", package = "parallel"):

Value

An integer, NA if the answer is unknown.

Because of this, we cannot rely on:

ncores <- detectCores()

to always work, i.e. we might end up with errors like:

ncores <- detectCores()
workers <- parallel::makeCluster(ncores)
Error in makePSOCKcluster(names = spec, ...) : 
  numeric 'names' must be >= 1

We need to account for this, especially as package developers. One way to handle it is simply by using:

ncores <- detectCores()
if (is.na(ncores)) ncores <- 1L

or, by using the following shorter, but also harder to understand, one-liner:

## na.rm = TRUE drops an NA returned by detectCores(), so max() falls back to 1L
ncores <- max(1L, detectCores(), na.rm = TRUE)

This construct is guaranteed to always return at least one core.

Shameless advertisement for the parallelly package: In contrast to detectCores(), parallelly::availableCores() handles the above case automatically, and it guarantees to always return at least one core.

Issue 2: detectCores() may return one

Although it's rare to run into hardware with single-core CPUs these days, you might run into a virtual machine (VM) configured to have a single core. Because of this, you cannot reliably use:

ncores <- detectCores() - 1L

or

ncores <- detectCores() - 2L

in your code. If you use these constructs, a user of your code might end up with zero or a negative number of cores, which is another way we can end up with an error downstream. A real-world example of this problem can be found in continuous integration (CI) services, e.g. detectCores() returns 2 in GitHub Actions jobs. So, we need to account also for this case, which we can do by using the above max() solution, e.g.

ncores <- max(1L, detectCores() - 2L, na.rm = TRUE)

This is guaranteed to always return at least one.

Shameless advertisement for the parallelly package: In contrast, parallelly::availableCores() handles this case via argument omit, which makes it easier to understand the code, e.g.

ncores <- availableCores(omit = 2L)

This construct is guaranteed to return at least one core, e.g. if there are one, two, or three CPU cores on this machine, ncores will be one in all three cases.

Issue 3: detectCores() does not give the number of “allowed” cores

There's a note in help("detectCores", package = "parallel") that touches on the above problems, but also on other important limitations that we should know of:

Note

This [= detectCores()] is not suitable for use directly for the mc.cores argument of mclapply nor specifying the number of cores in makeCluster. First because it may return NA, second because it does not give the number of allowed cores, and third because on Sparc Solaris and some Windows boxes it is not reasonable to try to use all the logical CPUs at once.

When is this relevant? The answer is: always! This is because, as package developers, we cannot really know when this occurs, because we never know what type of hardware and system our code will run on. So, we have to account for these unknowns too.

Let's look at some real-world cases where using detectCores() can become a real issue.

A personal computer

A user might want to run other software tools at the same time while running the R analysis. A very common pattern we find in R code is to save one core for other purposes, say, browsing the web, e.g.

ncores <- detectCores() - 1L

This is a good start. It is the first step toward your software tool acknowledging that there might be other things running on the same machine. However, unlike end users, we as package developers cannot know how many cores the user needs, or wishes, to set aside. Because of this, it is better to let the user make this decision.

A related scenario is when the user wants to run two concurrent R sessions on the same machine, both using your code. If your code assumes it can use all cores on the machine (i.e. detectCores() cores), the user will end up running the machine at 200% of its capacity. Whenever we use over 100% of the available CPU resources, we get penalized and waste our computational cycles on overhead from context switching, sub-optimal memory access, and more. This is where we end up with the situation illustrated in the left part of Figure 1.

Note also that users might not know that they use an R function that runs on all cores by default. They might not even be aware that this is a problem. Now, imagine if the user runs three or four such R sessions, resulting in a 300-400% CPU load. This is when things start to run slowly. The computer will be sluggish, maybe unresponsive, and most likely going to get very hot ("we're frying the computer"). By the time the four concurrent R processes complete, the user might have been able to finish six to eight similar processes if they had not been fighting each other for the limited CPU resources.

A shared computer

In academia and industry, it is common that several users share the same compute server or set of compute nodes. It might be as simple as several people SSHing into a shared machine with many cores and large amounts of memory to run their analyses there. On such setups, load balancing between users is often based on an honor system, where each user checks how many resources are available before launching an analysis. This helps to make sure they don't end up using too many cores, or too much memory, slowing down the computer for everyone else.

Figure 2: The left-hand graph from Figure 1 (mostly red bars, near 100% load on all 24 CPU cores). Overusing the CPU cores brings everything to a halt.

Now, imagine they run a software tool that uses all CPU cores by default. In that case, there is a significant risk they will step on the other users' processes, slowing everything down for everyone, especially if there is already a big load on the machine. From my experience in academia, this happens frequently. The user causing the problem is often not aware, because they just launch the problematic software with the default settings, leave it running, and plan to come back to it a few hours or a few days later. In the meantime, other users might wonder why their command-line prompts become sluggish or even non-responsive, and why their analyses suddenly take forever to complete. Eventually, someone or something alerts the systems administrators to the problem, and they end up having to drop everything else and start troubleshooting. This often results in them terminating the wild-running processes and reaching out to the user who runs the problematic software, which leads to a large amount of time and resources being wasted among users and administrators. All this happens only because we designed our R package to use all cores by default. This is not a made-up toy story; it is a very likely scenario that plays out on shared servers if you make detectCores() the default in your R code.

Shameless advertisement for the parallelly package: In contrast to detectCores(), if you use parallelly::availableCores(), the user, or the systems administrator, can limit the default number of CPU cores returned by setting the environment variable R_PARALLELLY_AVAILABLECORES_FALLBACK. For instance, by setting R_PARALLELLY_AVAILABLECORES_FALLBACK=2 centrally, availableCores() will, unless there are other settings that allow the process to use more, return two cores regardless of how many CPU cores the machine has. This lowers the damage any single process can inflict on the system. It would take many such processes running at the same time for them to have an overall negative impact, and the risk of that happening by mistake is much lower than when using detectCores() by default.
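To illustrate, here is a rough, hypothetical sketch of that fallback in action, assuming a 24-core machine where nothing else (scheduler variables, cgroups, mc.cores, and so on) constrains the R process:

## Hypothetical: the sysadmin has set R_PARALLELLY_AVAILABLECORES_FALLBACK=2 system-wide
parallel::detectCores()       ## 24 - all cores of the machine
parallelly::availableCores()  ## 2  - the conservative fallback kicks in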

A shared compute cluster with many machines

Other, larger compute systems, often referred to as high-performance compute (HPC) clusters, have a job scheduler for running scripts in batches distributed across multiple machines. When users submit their scripts to the scheduler's job queue, they request how many cores and how much memory each job requires. For example, a user on a Slurm cluster can request that their run_my_rscript.sh script gets to run with 48 CPU cores and 256 GiB of RAM by submitting it to the scheduler as:

sbatch --cpus-per-task=48 --mem=256G run_my_rscript.sh

The scheduler keeps track of all running and queued jobs, and when enough compute slots are freed up, it launches the next job in the queue, giving it the compute resources it requested. This is a very convenient and efficient way to batch process a large number of analyses coming from many users.

However, just like with a shared server, it is important that the software tools running this way respect the compute resources that the job scheduler allotted to the job. The detectCores() function does not know about job schedulers - all it does is return the number of CPU cores on the current machine, regardless of how many cores the job has been allotted by the scheduler. So, if your R package uses detectCores() cores by default, then it will overuse the CPUs and slow things down for everyone running on the same compute node. Again, when this happens, it often slows everything down and triggers lots of wasted user and admin effort spent on troubleshooting and communication back and forth.

Shameless advertisement for the parallelly package: In contrast, parallelly::availableCores() respects the number of CPU slots that the job scheduler has given to the job. It recognizes environment variables set by our most common HPC schedulers, including Fujitsu Technical Computing Suite (PJM), Grid Engine (SGE), Load Sharing Facility (LSF), PBS/Torque, and Simple Linux Utility for Resource Management (Slurm).
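For example, here is a hypothetical sketch of what the two functions might report inside a Slurm job submitted with --cpus-per-task=48 on a 128-core compute node (the numbers are made up for illustration):

## Slurm sets SLURM_CPUS_PER_TASK=48 for this job
parallel::detectCores()       ## 128 - every core on the compute node
parallelly::availableCores()  ## 48  - the CPU slots allotted by the scheduler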

Running R via CGroups or in a Linux container

This far, we have been concerned about the overuse of the CPU cores affecting other processes and other users running on the same machine. Some systems are configured to prevent misbehaving software from affecting other users. In Linux, this can be done with so-called control groups ("cgroups"), where a process gets allotted a certain amount of CPU cores. If the process uses too many parallel workers, they cannot break out from the sandbox set up by cgroups. From the outside, it will look like the process never uses more than its allocated CPU cores. Some HPC job schedulers have this feature enabled, but not all of them. You find the same feature for Linux containers, e.g. we can limit the number of CPU cores, or throttle the CPU load, using command-line options when launching a Docker container, e.g. docker run --cpuset-cpus=0-2,8 … or docker run --cpus=3.4 ….

So, if you are a user on a system where compute resources are compartmentalized this way, you run a much lower risk of wreaking havoc on a shared system. That is good news, but if you run too many parallel workers, that is, try to use more cores than are available to you, then you will clog up your own analysis. The behavior would be the same as if you request 96 parallel workers on your local eight-core notebook (the scenario in the left panel of Figure 1), with the exception that you will not overheat the computer.

The problem with detectCores() is that it returns the number of CPU cores on the hardware, regardless of the cgroups settings. So, if your R process is limited to eight cores by cgroups, and you use ncores = detectCores() on a 96-core machine, you will end up running 96 parallel workers fighting for the resources on eight cores. A real-world example of this happens for those of you who have a free account on RStudio Cloud. In that case, you are given only a single CPU core to run your R code on, but the underlying machine typically has 16 cores. If you use detectCores() there, you will end up creating 16 parallel workers, all running on the same CPU core, which is a very inefficient way to run the code.

Shameless advertisement for the parallelly package: In contrast to detectCores(), parallelly::availableCores() respects cgroups, and will return eight cores instead of 96 in the above example, and a single core on a free RStudio Cloud account.
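As a rough, hypothetical illustration, inside a container launched with, say, docker run --cpus=2 on a 96-core host, you might see something like:

## The container runtime caps the CPU quota at 2 cores via cgroups
parallel::detectCores()       ## 96 - the cores of the underlying hardware
parallelly::availableCores()  ## 2  - the cgroups CPU quota of the container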

My opinionated recommendation

Figure 3: The right-hand graph from Figure 1 (mostly green bars, near 100% load on all 24 CPU cores). If we avoid overusing the CPU cores, everything runs much smoother and much faster.

As developers, I think we should at least be aware of these problems, acknowledge that they exist, and accept that they are indeed real problems that people run into "out there". We should also accept that we cannot predict what type of compute environment our R code will run on. Unfortunately, I don't have a magic solution that addresses all the problems reported here. That said, I think the best we can do is to be conservative and not make hard-coded decisions on parallelization in our R packages and R scripts.

Because of this, I argue that the safest is to design your R package to run sequentially by default (e.g. ncores = 1L), and leave it to the user to decide on the number of parallel workers to use.
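A minimal sketch of that design could look like the following, where fast_fcn() and the per-element helper slow_step() are made-up names for illustration:

fast_fcn <- function(x, ncores = 1L) {
  if (ncores > 1L) {
    ## Parallelize only when the user explicitly asks for it
    cl <- parallel::makeCluster(ncores)
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, x, slow_step)
  } else {
    ## Default: run sequentially
    lapply(x, slow_step)
  }
}

With this design, the machine is never oversubscribed unless the user deliberately asks for it, e.g. fast_fcn(x, ncores = 4).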

The second-best alternative that I can come up with is to replace detectCores() with availableCores(), e.g. ncores = parallelly::availableCores(). It is designed to respect common system and R settings that control the number of allowed CPU cores. It also respects R options and environment variables commonly used to limit CPU usage, including those set by our most common HPC job schedulers. In addition, it is possible to control the fallback behavior so that it uses only a few cores when nothing else is set. For example, if the environment variable R_PARALLELLY_AVAILABLECORES_FALLBACK is set to 2, then availableCores() returns two cores by default, unless other settings allowing more are available. A conservative systems administrator may want to set export R_PARALLELLY_AVAILABLECORES_FALLBACK=1 in /etc/profile.d/single-core-by-default.sh. To see other benefits from using availableCores(), see https://parallelly.futureverse.org.
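Applied to the same made-up fast_fcn() example from above, this alternative amounts to swapping the default value of the argument:

fast_fcn <- function(x, ncores = parallelly::availableCores()) {
  ...
}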

Believe it or not, there's actually more to be said on this topic, but I think this is already more than a mouthful, so I will save that for another blog post. If you made it this far, I applaud you and thank you for your interest. If you agree, disagree, or have additional thoughts around this, please feel free to reach out on the Future Discussions Forum.

Over and out,

Henrik

¹ Searching code on GitHub requires you to log in to GitHub.
