You Don’t Need to Learn All the Weights on tabular data: The Case for rvflnet (a nonlinear expressive glmnet) on regression, classification and survival analysis

R-bloggers 2026-05-02

[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers].


Introduction

Random Vector Functional Link (RVFL) networks offer a simple yet powerful alternative to traditional neural networks for tabular data. Instead of learning hidden layers through backpropagation, RVFL generates them randomly (or not, if using a deterministic sequence of quasi-random numbers) and focuses all learning effort on a final, regularized linear model.

Formally, let

\[X \in \mathbb{R}^{n \times p}\]

be the input data. RVFL networks (the ones described in this blog post) construct a set of nonlinear features by projecting \(X\) onto a random matrix

\[W \in \mathbb{R}^{p \times m},\]

and applying an activation function \(g(\cdot)\):

\[H = g\left( \frac{X - \mu}{\sigma} ; W \right).\]

These random nonlinear features are then concatenated with the original inputs to form an augmented design matrix:

\[Z = [X | H].\]
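The construction above can be sketched in a few lines of base R. This is a minimal illustration and not the internals of rvflnet: the helper name `make_rvfl_features`, the sigmoid activation, the Gaussian draw for \(W\), and the reading of \(\mu\) and \(\sigma\) as column means and standard deviations are all assumptions made for the example.

```r
# Hypothetical helper (not part of rvflnet): build Z = [X | H] from scratch.
make_rvfl_features <- function(X, n_hidden, seed = 1) {
  set.seed(seed)
  p <- ncol(X)
  # Center and scale each column: (X - mu) / sigma (assumed column-wise)
  X_std <- scale(X)
  # Random projection matrix W (p x m), drawn once and never trained
  W <- matrix(rnorm(p * n_hidden), nrow = p, ncol = n_hidden)
  # Sigmoid activation applied elementwise to the projected inputs
  H <- 1 / (1 + exp(-(X_std %*% W)))
  colnames(H) <- paste0("h", seq_len(n_hidden))
  # Augmented design matrix: original inputs next to the random features
  cbind(X, H)
}

X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
Z <- make_rvfl_features(X, n_hidden = 5)
dim(Z)  # 32 rows, 3 original + 5 random columns
```

Only the coefficients of the linear model fitted on `Z` are learned; `W` stays fixed after it is drawn.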

The model prediction is obtained by fitting a linear model on this expanded space (hence, a nonlinear GLM):

\[\hat{y} = Z \beta.\]

Because \(Z\) can be high-dimensional and highly redundant, RVFL networks (the ones described in this blog post) rely on Elastic Net regularization (glmnet) to estimate the coefficients:

\[\hat{\beta} = \arg\min_{\beta}\mathcal{L}(y, Z\beta) + \lambda \left(\alpha ||\beta||_1 + (1-\alpha)||\beta||_2^2\right).\]

In this framework, randomness creates a rich pool of nonlinear transformations, while regularization selects and stabilizes the most useful ones. The result is a nonlinear model that combines the flexibility of neural networks with the efficiency and robustness of linear methods.
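To make the "regularization selects" part concrete, here is a rough sketch that fits an Elastic Net path with glmnet on a hand-built augmented matrix and counts the non-zero coefficients at each lambda. The dataset (mtcars), the number of hidden features and the sigmoid/Gaussian construction are assumptions for illustration; rvflnet itself may build its features differently.

```r
library(glmnet)

set.seed(123)
X <- as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")])
y <- mtcars$mpg

# Random sigmoid features on standardized inputs (assumed construction)
W <- matrix(rnorm(ncol(X) * 20), nrow = ncol(X))
H <- 1 / (1 + exp(-(scale(X) %*% W)))
colnames(H) <- paste0("h", seq_len(ncol(H)))
Z <- cbind(X, H)

# Elastic Net path on the augmented matrix; alpha mixes L1 and L2 penalties
fit <- glmnet(Z, y, alpha = 0.5)

# Number of non-zero coefficients (intercept excluded) at each lambda
n_nonzero <- colSums(as.matrix(coef(fit))[-1, , drop = FALSE] != 0)
head(cbind(lambda = fit$lambda, n_nonzero))
```

As lambda decreases along the path, more of the random features typically enter the model, which is the selection effect described above.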

Of course, this blog post is not a proof of the title; it is about the R package rvflnet. Still, you can appreciate the high performance of RVFLs on regression, classification and survival analysis, and notably on the controversial Boston dataset, where the model performs on par with Random Forest or Gradient Boosting.

0 – Install package

install.packages("survival", repos = "https://cran.r-project.org") # survival analysis
install.packages("remotes", repos = "https://cran.r-project.org")
remotes::install_github('thierrymoudiki/rvflnet') # Nonlinear glm (RVFL networks)

1 – Regression

set.seed(123)
library(glmnet)
data(Boston, package = "MASS")

# -------------------------
# Data
# -------------------------
X <- as.matrix(Boston[, -14])
y <- Boston$medv
n <- nrow(X)
idx <- sample(1:n, size = round(0.8 * n))
X_train <- X[idx, ]
y_train <- y[idx]
X_test <- X[-idx, ]
y_test <- y[-idx]

# -------------------------
# Grid
# -------------------------
grid <- expand.grid(
  n_hidden = c(175, 200, 225, 250),
  alpha = seq(0.1, 0.5, by=0.2),
  include_original = c(TRUE, FALSE),
  seed = 1,
  stringsAsFactors = FALSE
)
results <- vector("list", nrow(grid))

# -------------------------
# Loop
# -------------------------
for (i in seq_len(nrow(grid))) {
  params <- grid[i, ]
  #cat("\n========================================\n")
  #cat(sprintf("Run %d / %d\n", i, nrow(grid)))
  #print(params)

  # -------------------------
  # Fit model
  # -------------------------
  fit <- rvflnet::rvflnet(
    X_train, y_train,
    n_hidden = params$n_hidden,
    activation = "sigmoid",
    W_type = "gaussian",
    seed = params$seed,
    include_original = params$include_original, # direct link, skip connection or not
    alpha = params$alpha
  )

  # -------------------------
  # Evaluate full lambda path
  # -------------------------
  lambdas <- fit$fit$lambda
  preds <- predict(fit, newx = X_test, s = lambdas)
  rmse_path <- sqrt(colMeans((preds - y_test)^2))
  best_idx <- which.min(rmse_path)
  best_rmse <- rmse_path[best_idx]
  best_lambda <- lambdas[best_idx]

  # -------------------------
  # Sparsity
  # -------------------------
  coef_mat <- coef(fit, s = best_lambda)
  nonzero <- sum(coef_mat[-1, 1] != 0)

  # -------------------------
  # Verbose output
  # -------------------------
  #cat(sprintf("Best RMSE: %.4f\n", best_rmse))
  #cat(sprintf("Best lambda: %.6f\n", best_lambda))
  #cat(sprintf("Non-zero coeffs: %d\n", nonzero))

  # -------------------------
  # Store
  # -------------------------
  results[[i]] <- data.frame(
    n_hidden = params$n_hidden,
    alpha = params$alpha,
    include_original = params$include_original,
    seed = params$seed,
    rmse = best_rmse,
    lambda = best_lambda,
    nonzero = nonzero
  )
}

# -------------------------
# Aggregate
# -------------------------
results_df <- do.call(rbind, results)
results_df <- results_df[order(results_df$rmse), ]
print(head(results_df))

Loading required package: Matrix
Loaded glmnet 4.1-10
               n_hidden alpha include_original seed     rmse     lambda nonzero
s= 0.027561759      200   0.1             TRUE    1 2.881935 0.02756176     190
s= 0.017620327      200   0.3             TRUE    1 2.884739 0.01762033     167
s= 0.012734248      200   0.5             TRUE    1 2.889339 0.01273425     158
s= 0.036435024      175   0.1             TRUE    1 2.920012 0.03643502     165
s= 0.016833926      175   0.5             TRUE    1 2.938472 0.01683393     136
s= 0.023293035      175   0.3             TRUE    1 2.941267 0.02329304     144

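An RMSE of 2.88 is on par with Random Forest or Gradient Boosting, with a significantly faster computation time. Keep in mind that the best lambda is picked on the test set in this loop, so the figure is on the optimistic side.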
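The `rmse_path` line in the regression code relies on R's column-major recycling: `preds` has one column per lambda, and subtracting the length-n vector `y_test` subtracts it from every column. A toy check with made-up numbers (not taken from the Boston run) shows the mechanics:

```r
# Toy stand-ins: 4 test observations, predictions for 3 lambda values
y_test <- c(1, 2, 3, 4)
preds <- cbind(
  s0 = c(2.5, 2.5, 2.5, 2.5),   # constant prediction (heavy shrinkage)
  s1 = c(1.5, 2.5, 3.5, 4.5),   # off by 0.5 everywhere
  s2 = c(1, 2, 3, 4)            # perfect fit
)

# y_test is recycled down each column, so this yields one RMSE per lambda
rmse_path <- sqrt(colMeans((preds - y_test)^2))
best_idx <- which.min(rmse_path)
rmse_path
best_idx
```

This only works because the number of rows of `preds` equals the length of `y_test`; with a mismatched length R would recycle silently and the result would be meaningless.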

2 - Classification

2 - 1 Binary Classification

set.seed(123)
data(iris)

# Binary classification: setosa vs others
y <- ifelse(iris$Species == "setosa", 1, 0)
X <- as.matrix(iris[, 1:4])

# Train/test split
n <- nrow(X)
idx <- sample(1:n, size = round(0.8 * n))
X_train <- X[idx, ]
y_train <- y[idx]
X_test <- X[-idx, ]
y_test <- y[-idx]

# -------------------------
# Fit model
# -------------------------
cv_model <- rvflnet::cv.rvflnet(
  X_train, y_train,
  n_hidden = 50,
  activation = "relu",
  W_type = "gaussian",
  family = "binomial",
  nfolds = 5
)

# -------------------------
# Predictions (probabilities)
# -------------------------
(probs <- predict(cv_model, X_test, type = "response"))

# Convert to class
y_pred <- ifelse(probs > 0.5, 1, 0)
all.equal(as.numeric(y_pred), as.numeric(predict(cv_model, X_test, type="class")))

# -------------------------
# Diagnostics
# -------------------------
# Accuracy
acc <- mean(drop(y_pred) == y_test)
cat("Accuracy:", acc, "\n")

# Confusion matrix
table(Predicted = y_pred, Actual = y_test)

A matrix: 30 × 1 of type dbl
  lambda.min
0.9997617002
0.9992267955
0.9997120678
0.9997524867
0.9996600481
0.9992472082
0.9996101744
0.9999356520
0.9998139568
0.9995418762
0.0003328885
0.0003328885
0.0003328885
0.0019937012
0.0003328885
0.0005459970
0.0003328885
0.0005035848
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885
0.0003328885

TRUE

Accuracy: 1
         Actual
Predicted  0  1
        0 20  0
        1  0 10

2 - 2 Multiclass Classification

set.seed(123)
data(iris)
y <- as.numeric(iris$Species)
X <- as.matrix(iris[, 1:4])

# Train/test split
n <- nrow(X)
idx <- sample(1:n, size = round(0.8 * n))
X_train <- X[idx, ]
y_train <- y[idx]
X_test <- X[-idx, ]
y_test <- y[-idx]

# -------------------------
# Fit model
# -------------------------
cv_model <- rvflnet::rvflnet(
  X_train, y_train,
  n_hidden = 50,
  activation = "relu",
  W_type = "gaussian",
  family = "multinomial",
  nlambda = 25,
  nfolds = 5
)

# -------------------------
# Diagnostics
# -------------------------
# Accuracy (one value per lambda on the regularization path)
acc <- colMeans(predict(cv_model, X_test, type="class") == y_test)
cat("Accuracies:", acc, "\n") # consider other metrics

Accuracies: 0.1666667 0.7666667 0.9333333 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667 0.9666667
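The code comment in the multiclass example suggests considering other metrics. One possibility, sketched here in base R with made-up labels (not the iris predictions), is to derive per-class recall, precision and a macro-averaged F1 from a confusion table. All variable names are illustrative.

```r
# Made-up labels for three classes, for illustration only
actual    <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3), levels = 1:3)
predicted <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 3, 1), levels = 1:3)

# Rows = predicted class, columns = actual class
cm <- table(Predicted = predicted, Actual = actual)

accuracy  <- sum(diag(cm)) / sum(cm)
recall    <- diag(cm) / colSums(cm)  # share of each actual class recovered
precision <- diag(cm) / rowSums(cm)  # share of each predicted class that is right
f1        <- 2 * precision * recall / (precision + recall)
macro_f1  <- mean(f1)

cm
round(c(accuracy = accuracy, macro_f1 = macro_f1), 3)
```

Fixing the factor levels keeps the table square even when a class is never predicted; in that case the corresponding precision is `NaN` and needs separate handling.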

3 - Nonlinear Cox survival analysis

3 - 1 Example 1

library(survival)
library(rvflnet)
# data(ovarian) can warn "data set not found" with recent versions of survival;
# the code still runs (see the C-index below) because the dataset is lazy-loaded
data(ovarian)
X <- as.matrix(ovarian[, c("age", "resid.ds", "rx", "ecog.ps")])
y <- Surv(ovarian$futime, ovarian$fustat)

set.seed(123)
n <- nrow(X)
train_idx <- sample(1:n, size = round(0.8 * n))
X_train <- X[train_idx, ]
X_test  <- X[-train_idx, ]
y_train <- y[train_idx]
y_test  <- y[-train_idx]

# -------------------------
# Fit model
# -------------------------
cv_fit <- rvflnet::cv.rvflnet(
  X_train, y_train,
  family = "cox",
  nfolds = 5,
  type.measure = "C"
)
plot(cv_fit)

# Out-of-sample C-index
print(glmnet::Cindex(pred = predict(cv_fit, X_test), y = y_test))

Warning message in data(ovarian):
“data set ‘ovarian’ not found”
[1] 0.8571429

[Figure: output of plot(cv_fit)]
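The survival examples are scored with the concordance index (C-index). As a rough illustration of what that number measures, here is a simplified version of Harrell's C in base R. The function name `harrell_c` and the toy data are made up for this sketch, pairs with tied times are simply ignored, and it is not the implementation behind `glmnet::Cindex`.

```r
# Simplified Harrell's C: among comparable pairs, how often does the subject
# who failed first also have the higher predicted risk?
harrell_c <- function(time, status, risk) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (status[i] != 1) next          # only observed events can anchor a pair
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {        # subject i failed before j was last seen
        comparable <- comparable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5  # tied scores count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data: 4 subjects, the third one is censored
time   <- c(1, 2, 3, 4)
status <- c(1, 1, 0, 1)

c_perfect  <- harrell_c(time, status, risk = c(4, 3, 2, 1))  # risk falls as time grows
c_reversed <- harrell_c(time, status, risk = c(1, 2, 3, 4))
c_constant <- harrell_c(time, status, risk = rep(0, 4))
c(c_perfect, c_reversed, c_constant)
```

A score that ranks every comparable pair correctly gives 1, a fully reversed ranking gives 0, and a constant score gives 0.5, which is the value to expect from a model whose coefficients are all shrunk to zero.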

3 - 2 Example 2

library(glmnet)
library(survival)
data(pbc)

pbc2       <- pbc[!is.na(pbc$trt), ]
pbc2$event <- as.integer(pbc$status[!is.na(pbc$trt)] == 2)
pbc2$sex_n <- as.integer(pbc2$sex == "f")
feat_cols <- c("trt","age","sex_n","ascites","hepato","spiders","edema",
               "bili","chol","albumin","copper","alk.phos","ast",
               "trig","platelet","protime","stage")
df <- pbc2[, c("time", "event", feat_cols)]
for (col in feat_cols)
  if (any(is.na(df[[col]])))
    df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)

set.seed(42)
idx_train <- sample(nrow(df), floor(0.75 * nrow(df)))
train <- df[idx_train, ]; test <- df[-idx_train, ]
X_tr  <- as.matrix(train[, feat_cols])
X_te  <- as.matrix(test[,  feat_cols])
y_tr   <- Surv(train$time, train$event)

fit <- rvflnet::rvflnet(
  X_tr, y_tr,
  family = "cox",
  alpha=0.1, lambda=0.1 # not recommended
)
y_te   <- Surv(test$time, test$event)
ci <- glmnet::Cindex(predict(fit, X_te), y_te)
cat("\n=== Test-set C-index ===\n")
print(ci)

=== Test-set C-index ===
[1] 0.8218117

fit <- rvflnet::rvflnet(
  X_tr, y_tr,
  family = "cox",
  alpha=0.1, nlambda=50
)
y_te   <- Surv(test$time, test$event)
(cis <- apply(predict(fit, X_te), 2, function(x) glmnet::Cindex(x, y_te)))
#cat("\n=== Test-set C-index ===\n")
plot(log(fit$fit$lambda), cis, type = 'l')
abline(h=0.8, lty=2, col="red")


s0: 0.5
s1: 0.762812872467223
s2: 0.802145411203814
s3: 0.811084624553039
s4: 0.811680572109654
s5: 0.814064362336114
s6: 0.815852205005959
s7: 0.817640047675805
s8: 0.820023837902265
s9: 0.81942789034565
s10: 0.817640047675805
s11: 0.81823599523242
s12: 0.81823599523242
s13: 0.815852205005959
s14: 0.814660309892729
s15: 0.813468414779499
s16: 0.813468414779499
s17: 0.815852205005959
s18: 0.814660309892729
s19: 0.82061978545888
s20: 0.81942789034565
s21: 0.82181168057211
s22: 0.82061978545888
s23: 0.817044100119189
s24: 0.817640047675805
s25: 0.81823599523242
s26: 0.814660309892729
s27: 0.810488676996424
s28: 0.803933253873659
s29: 0.802145411203814
s30: 0.799761620977354
s31: 0.793206197854589
s32: 0.789034564958284
s33: 0.777711561382598
s34: 0.771156138259833
s35: 0.766984505363528
s36: 0.756853396901073
s37: 0.748510131108462
s38: 0.743146603098927
s39: 0.735399284862932
s40: 0.728843861740167
s41: 0.721692491060787
s42: 0.718116805721096
s43: 0.717520858164482
s44: 0.716924910607867
s45: 0.716924910607867
s46: 0.715733015494636
s47: 0.716328963051251
s48: 0.715137067938021
s49: 0.713945172824791

[Figure: test-set C-index (cis) versus log(lambda), with a dashed red horizontal line at 0.8]
