Machine Learning Powered Naughty List: A Festive Jumping Rivers Story

R-bloggers 2025-12-18

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers].


Introduction

Ho ho ho! The holiday season is here, and at Jumping Rivers, we’re decking the halls with data, not just tinsel. While elves are busy checking their lists twice, we thought: why not bring a little machine learning magic to Christmas? After all, what’s more festive than combining predictive modeling with candy canes, cookies, and a sprinkle of office mischief?

This blog is your all-access pass to a code-powered journey where we find out who’s been naughty, who’s nice, and who’s just mischievously hovering in between.

We’ll walk you through the process step by step: gathering the team data, inventing the most festive features, training our ML model, and revealing the results with a cheeky holiday twist. So grab a mug of cocoa, put on your favorite Christmas socks, and let’s dive into the Jumping Rivers ML-Powered Naughty List adventure!

Note: All data, labels, and results in this post are entirely fictional and randomly generated for festive fun.

Step 1: Data Collection and Team Introduction

Our first step was gathering our dataset. We used the Jumping Rivers team as the participants, assigning playful, holiday-themed features to reflect their potential ‘naughty’ traits. Here’s a concise, festive overview in a side-by-side table format:

Each participant is assigned four playful features that represent holiday mischief:

Ate too many cookies
Forgot to send Christmas cards
Sang off-key during carols
Gift wrapping disasters

Every name on this list is now in the running for the ultimate festive title: Naughty, Nice, or Mildly Mischievous. Rumor has it that Santa’s Intern Elf already claimed the top spot for cookie mischief, while Rudolph keeps dashboards squeaky clean, and Frosty the Snow Analyst is maintaining a perfectly balanced winter score.


Step 2: Feature Engineering

For ML purposes, names were encoded numerically. This is not meaningful in a real-world ML context but serves as a demonstration of preprocessing. The features for modeling include:

Name (encoded)
Ate too many cookies
Forgot to send Christmas cards
Sang off-key
Gift wrapping disasters
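As a quick illustration of what the name-encoding step does, here is a minimal sketch using three names from the team list. `factor()` sorts the unique names to form its levels, and `as.numeric()` then returns each name's position in that sorted list, so the numbers reflect alphabetical order rather than anything about the person. The variable names `names_demo` and `f` are illustrative only.

```r
# Illustrative subset of names from the team list
names_demo = c("Theo Roe", "Esther Gillespie", "Osheen MacOscar")
# factor() sorts the unique values alphabetically to form its levels
f = factor(names_demo)
levels(f)      # "Esther Gillespie" "Osheen MacOscar" "Theo Roe"
# as.numeric() gives each element's position in those levels
as.numeric(f)  # 3 1 2
```

One consequence of this design: adding or removing a team member can shift everyone else's code, which is another reason the encoded name shouldn't be read as a meaningful feature.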

Step 3: Model Training

We chose a Random Forest classifier in R for its simplicity and interpretability. The model was trained on the dataset to predict the ‘naughty’ label based on the four behavioral features and the encoded name. Although the dataset is small and playful, this demonstrates a proper ML workflow: data collection, preprocessing, model training, and prediction.

library(tidyverse)
library(randomForest)
library(ggplot2)

The first thing we need to do is set up a vector containing the team members, along with some Christmas temp workers: Santa’s Intern Elf, Rudolph the Data Reindeer and Frosty the Snow Analyst.

# Team members
team = c(
  "Esther Gillespie", "Colin Gillespie", "Sebastian Mellor",
  "Martin Smith", "Richard Brown", "Shane Halloran",
  "Mitchell Oliver", "Keith Newman", "Russ Hyde",
  "Gigi Kenneth", "Pedro Silva", "Carolyn Wilson",
  "Myles Mitchell", "Theo Roe", "Tim Brock",
  "Osheen MacOscar", "Emily Wales", "Amieroh Abrahams",
  "Deborah Washington", "Susan Smith",
  "Santa's Intern Elf", "Rudolph the Data Reindeer", "Frosty the Snow Analyst"
)

Now that we have the team members, we will randomly generate some values for the model features.

# Randomly generate playful 'naughty traits'
set.seed(51)
df = tibble(
  name = team,
  ate_too_many_cookies = sample(0:1, length(team), replace = TRUE),
  forgot_to_send_cards = sample(0:1, length(team), replace = TRUE),
  sang_off_key = sample(0:1, length(team), replace = TRUE),
  wrapping_disaster = sample(0:1, length(team), replace = TRUE),
  naughty = sample(0:1, length(team), replace = TRUE)
)
# Encode names as numeric
df$name_encoded = as.numeric(factor(df$name))

Next on the list is to set up a vector of features we want to use, and then train the model. We can then use the model to predict our fictitious naughtiness score for each team member! We can see Theo is at the top of the list, closely followed by Osheen.

features = c(
  "name_encoded",
  "ate_too_many_cookies",
  "forgot_to_send_cards",
  "sang_off_key",
  "wrapping_disaster"
)
# Train Random Forest
rf_model = randomForest(x = df[, features], y = as.factor(df$naughty), ntree = 100)
# Predict naughtiness (on the same rows the model was trained on)
df$predicted_naughty = predict(rf_model, df[, features])
# Column 2 holds the proportion of trees voting for class "1" (naughty)
df$naughtiness_score = predict(rf_model, df[, features], type = "prob")[, 2]
# Create the Naughty List
naughty_list = df %>%
  arrange(desc(naughtiness_score)) %>%
  select(name, naughtiness_score, predicted_naughty)
print(naughty_list)
## # A tibble: 23 × 3
##    name               naughtiness_score predicted_naughty
##    <chr>                          <dbl> <fct>
##  1 Theo Roe                        0.76 1
##  2 Osheen MacOscar                 0.74 1
##  3 Myles Mitchell                  0.72 1
##  4 Esther Gillespie                0.68 1
##  5 Deborah Washington              0.66 1
##  6 Tim Brock                       0.59 1
##  7 Amieroh Abrahams                0.55 1
##  8 Santa's Intern Elf              0.48 0
##  9 Carolyn Wilson                  0.38 0
## 10 Susan Smith                     0.2  0
## # ℹ 13 more rows
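One caveat on the scores above: they come from predicting on the same rows the forest was trained on, so they are likely to look more confident than the model deserves. {randomForest} also records out-of-bag (OOB) predictions, where each row is scored only by the trees that did not see it during bootstrapping, and calling `predict()` without `newdata` returns them. The sketch below compares the two on a small hypothetical dataset with the same shape as the post's (23 rows, four random 0/1 traits, a random label); the object names `toy`, `label` and `rf` are illustrative, not from the original code.

```r
library(randomForest)

# Hypothetical toy data shaped like the post's: 23 rows of random 0/1 traits
set.seed(1)
n = 23
toy = data.frame(
  ate_too_many_cookies = sample(0:1, n, replace = TRUE),
  forgot_to_send_cards = sample(0:1, n, replace = TRUE),
  sang_off_key = sample(0:1, n, replace = TRUE),
  wrapping_disaster = sample(0:1, n, replace = TRUE)
)
# A random label, unrelated to the traits (as in the post)
label = as.factor(sample(0:1, n, replace = TRUE))

rf = randomForest(x = toy, y = label, ntree = 100)

# In-sample: score the rows the forest was trained on
in_sample = predict(rf, toy)
# Out-of-bag: omit newdata, so each row is scored only by trees that never saw it
oob = predict(rf)

# Accuracy of each approach (na.rm guards against a row that was never out of bag)
mean(in_sample == label)
mean(oob == label, na.rm = TRUE)
```

Because the label here is drawn independently of the traits, the OOB accuracy should typically sit near chance, while the in-sample figure usually looks better; that gap is the optimism an OOB or hold-out estimate is meant to expose.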

The last thing to do is visualise our results with {ggplot2}:

# Fun bar plot
ggplot(naughty_list, aes(x = reorder(name, naughtiness_score),
                         y = naughtiness_score,
                         fill = as.factor(predicted_naughty))) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("0" = "forestgreen", "1" = "darkred"),
                    labels = c("Nice", "Naughty")) +
  labs(title = "🎅 Jumping Rivers ML-powered Naughty List 🎄",
       x = "Team Member", y = "Naughtiness Score", fill = "Status",
       alt = "Jumping Rivers Naughty List") +
  # base_family = "outfit" assumes an "outfit" font family is available
  # (e.g. registered with {showtext}); remove it to use the default font
  theme_minimal(base_family = "outfit")

ggplot2 column chart showing the Jumping Rivers Naughty List

Step 4: Analysis and Notes

After generating predictions, we can interpret the Naughty List. The highest naughtiness scores indicate which participants are most mischievous according to our playful model.
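To make "naughtiness score" concrete: for a {randomForest} classifier, `predict(..., type = "prob")` returns the fraction of trees voting for each class, so with `ntree = 100` a score of 0.76 means 76 of the 100 trees voted naughty. With two classes and the default cutoff, the predicted label is whichever class gets more votes (ties are broken at random), which is why 0.55 lands on the naughty side and 0.48 on the nice side. A minimal check on hypothetical toy data (the names `toy`, `label`, `votes` and `probs` are illustrative):

```r
library(randomForest)

# Hypothetical toy data: four random 0/1 traits and a random label
set.seed(2)
n = 23
toy = data.frame(matrix(sample(0:1, n * 4, replace = TRUE), ncol = 4))
label = as.factor(sample(0:1, n, replace = TRUE))
rf = randomForest(x = toy, y = label, ntree = 100)

# Raw vote counts per class; every row sums to ntree (100)
votes = predict(rf, toy, type = "vote", norm.votes = FALSE)
# "Probabilities" are those counts divided by the number of trees
probs = predict(rf, toy, type = "prob")

rowSums(votes)
all.equal(unname(probs[, "1"]), unname(votes[, "1"]) / 100)
```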

Observations from this analysis include:

Cookie Enthusiasts: Participants flagged for eating too many cookies scored higher.
Gift Wrapping Chaos: Those whose presents looked like abstract art tended to score higher.
Musical Mishaps: Off-key carolers were highlighted as naughty.
Forgotten Cards: Small lapses in festive correspondence nudged some up the naughty rankings.
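These observations are part of the story; with randomly generated labels, there is no real reason for any trait to matter. If you wanted to check which features the forest actually leaned on, {randomForest} provides `importance()`, which for a classifier reports the mean decrease in Gini impurity per feature (fitting with `importance = TRUE` adds a permutation-based measure as well). A minimal sketch on hypothetical toy data using the post's feature names (`toy`, `label`, `rf` and `imp` are illustrative):

```r
library(randomForest)

# Hypothetical toy data using the post's feature names; the label is random,
# so it has no real relationship with any trait
set.seed(3)
n = 23
toy = data.frame(
  ate_too_many_cookies = sample(0:1, n, replace = TRUE),
  forgot_to_send_cards = sample(0:1, n, replace = TRUE),
  sang_off_key = sample(0:1, n, replace = TRUE),
  wrapping_disaster = sample(0:1, n, replace = TRUE)
)
label = as.factor(sample(0:1, n, replace = TRUE))

rf = randomForest(x = toy, y = label, ntree = 100)

# Mean decrease in Gini impurity per feature (higher = used more for splits)
imp = importance(rf)
imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), , drop = FALSE]
```

On random labels like these, the ranking is likely to shift from seed to seed, which is a useful reminder not to over-read it.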

Special mentions:

Theo unsurprisingly tops the naughty list.
Santa’s Intern Elf performed well, staying mostly nice.
Shane had the lowest naughtiness score, and I’m sure Santa will be very nice to him this year!

This analysis provides both a technical demonstration of an ML workflow and a fun story that engages readers during the festive season.

Step 5: Conclusion

This project demonstrates how machine learning can be used in creative ways outside of traditional business use cases. By combining playful features with a proper ML workflow, we created a light-hearted, festive story suitable for a blog, while also reinforcing good practices in data collection, preprocessing, modeling, and visualization.

Ultimately, the Jumping Rivers ML-Powered Naughty List is a celebration of data science, team culture, and holiday fun. Whether you’re naughty or nice, we hope this inspires creative applications of ML in festive contexts.
