Expand your Bluesky network with R (repost)
R-bloggers 2024-11-20
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.
—
I’m encouraging everyone I know online to join the scientific community on Bluesky.
In that post I link to several starter packs — lists of accounts posting about a topic that you can follow individually or all at once to start filling out your network.
I started by following accounts of people I knew from X and from a few starter packs I came across. One way to expand your network is to take all the accounts you follow and see who they follow that you don’t. You can rank that list in descending order by the number of your follows who follow each account, and use it to fill out your network.
Let’s do this with just a few lines of code in R. The atrrr package (CRAN, GitHub, Docs) is one of several packages that wrap the AT Protocol behind Bluesky, allowing you to interact with Bluesky through a set of R functions. It’s super easy to use and the docs are great.
The code below does this. It first authenticates with an app password, then retrieves all the accounts you follow. Next, it gets everyone those accounts follow and removes the accounts you already follow.[1]
    library(dplyr)
    library(atrrr)

    # Authenticate first (switch out with your username)
    bsky_username <- "youraccount.bsky.social"
    # If you already have an app password:
    bsky_app_pw <- "change-me-change-me-123"
    auth(user=bsky_username, password=bsky_app_pw)
    # Or be guided through the process (run this instead of the call above)
    # auth()

    # Get the people you follow
    f <- get_follows(actor=bsky_username, limit=Inf)

    # Get just their handles
    fh <- f$actor_handle

    # Get who your follows are following
    ff <- fh |>
      lapply(get_follows, limit=Inf) |>
      setNames(fh)

    # Make it a data frame
    ffdf <- bind_rows(ff, .id="follow")

    # Get counts, removing ppl you already follow
    ffcounts <- ffdf |>
      count(actor_handle, sort=TRUE) |>
      anti_join(f, by="actor_handle") |>
      filter(actor_handle!="handle.invalid")

    # Join back to account info, add URL
    ffcounts <- ffdf |>
      distinct(actor_handle, actor_name) |>
      inner_join(x=ffcounts, y=_, by="actor_handle") |>
      mutate(url=paste0("https://bsky.app/profile/", actor_handle))
This returns a data frame of all the accounts followed by the people you follow but that you don’t already follow, sorted in descending order by the number of accounts you follow who follow them (mouthful right there).
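To see the counting and filtering step in isolation, here is a small sketch on made-up data. The handles and the toy `f` and `ffdf` tables are hypothetical stand-ins for what the API would return, and nothing is fetched from Bluesky; it only assumes dplyr is installed.

```r
library(dplyr)

# Toy stand-ins for the API results (all handles are made up for illustration).
# `f`: the accounts you follow.
f <- tibble(actor_handle = c("alice.example", "bob.example", "carol.example"))

# `ffdf`: one row per (your follow, account that follow is following).
ffdf <- tibble(
  follow       = c("alice.example", "alice.example", "alice.example",
                   "bob.example", "bob.example",
                   "carol.example", "carol.example"),
  actor_handle = c("bob.example", "dave.example", "erin.example",
                   "dave.example", "erin.example",
                   "dave.example", "handle.invalid")
)

# Count how many of your follows follow each account, then drop
# accounts you already follow and the placeholder for invalid handles.
ffcounts <- ffdf |>
  count(actor_handle, sort = TRUE) |>
  anti_join(f, by = "actor_handle") |>
  filter(actor_handle != "handle.invalid")

print(ffcounts)
```

In this toy network, dave.example is followed by all three of your follows and erin.example by two, so those are the two rows left, in that order. bob.example is dropped because you already follow that account.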
This is optional, but you can use the gt package to turn the result into a nicer table with clickable links.
    # Optional, clean up and create a nice table
    library(gt)
    library(glue)
    top <- 20L
    ffcounts |>
      head(top) |>
      rename(Handle=actor_handle, N=n, Name=actor_name) |>
      mutate(Handle=glue("[{Handle}]({url})")) |>
      mutate(Handle=lapply(Handle, gt::md)) |>
      select(-url) |>
      gt() |>
      tab_header(
        title=md(glue("**My top {top} follows' follows**")),
        subtitle="Collected November 19, 2024") |>
      tab_style(
        style="font-weight:bold",
        locations=cells_column_labels()
      ) |>
      cols_align(align="left") |>
      opt_row_striping(row_striping = TRUE)
I can’t embed an HTML file here, but here’s what that output looks like. You can click any one of the names and follow the account if you find it useful.
You could do this iteratively: add your top follows’ follows, then rerun the process a few times to possibly discover unknown second-degree connections.
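One way to make the rerun step repeatable is to wrap the counting logic in a function. The sketch below is only an illustration: `suggest_follows()`, `toy_network`, and `toy_fetch()` are hypothetical names, not part of atrrr, and the demo runs against a made-up network rather than the Bluesky API. The optional `me` argument is there because people you follow may follow you back, which would otherwise put your own handle in the results.

```r
library(dplyr)

# Hypothetical helper (not part of atrrr): rank follows-of-follows.
# `my_follows`    character vector of handles you follow
# `fetch_follows` any function that takes a handle and returns a data frame
#                 with an `actor_handle` column, for example
#                 function(h) atrrr::get_follows(actor = h, limit = Inf)
# `me`            optional: your own handle, so it is excluded from the results
suggest_follows <- function(my_follows, fetch_follows, me = NULL) {
  ff <- my_follows |> lapply(fetch_follows) |> setNames(my_follows)
  bind_rows(ff, .id = "follow") |>
    count(actor_handle, sort = TRUE) |>
    filter(!actor_handle %in% c(my_follows, me, "handle.invalid"))
}

# Offline demo with a made-up network standing in for the API.
toy_network <- list(
  "alice.example" = c("bob.example", "dave.example"),
  "bob.example"   = c("dave.example", "erin.example"),
  "dave.example"  = c("erin.example", "frank.example"),
  "erin.example"  = c("frank.example")
)
toy_fetch <- function(h) tibble(actor_handle = toy_network[[h]])

# Round 1: start from two follows.
follows <- c("alice.example", "bob.example")
round1 <- suggest_follows(follows, toy_fetch)

# "Follow" the top suggestion, then rerun to see who turns up next.
follows <- c(follows, round1$actor_handle[1])
round2 <- suggest_follows(follows, toy_fetch)

print(round1)
print(round2)
```

In the toy data, frank.example only shows up in the second round, after dave.example has been added to the follow list, which is the kind of discovery the iterative approach is after.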
The code here essentially replicates what @theo.io’s Bluesky Network Analyzer is doing, but all locally using R. That web app is faster and easier to use, and does some smart caching and throttling to avoid API rate limits. See the footnote for more.
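If you want a rough local version of caching and throttling, one option is to wrap whatever function fetches follows. This is a minimal sketch under stated assumptions, not how the web app does it: `make_polite_fetcher()` and `fake_fetch()` are hypothetical names, the cache lives only for the current R session, and the default delay is an arbitrary placeholder rather than a documented Bluesky limit.

```r
# Hypothetical wrapper (not part of atrrr): cache results per handle and
# pause before each uncached call. The 0.5-second default is an arbitrary
# placeholder, not a documented rate limit.
make_polite_fetcher <- function(fetch, delay = 0.5) {
  cache <- new.env(parent = emptyenv())
  function(handle) {
    if (!exists(handle, envir = cache, inherits = FALSE)) {
      Sys.sleep(delay)
      assign(handle, fetch(handle), envir = cache)
    }
    get(handle, envir = cache, inherits = FALSE)
  }
}

# Offline demo: a fake fetcher that counts how often it is really called.
n_calls <- 0
fake_fetch <- function(handle) {
  n_calls <<- n_calls + 1
  data.frame(actor_handle = paste0("friend-of-", handle))
}

polite <- make_polite_fetcher(fake_fetch, delay = 0)
first  <- polite("alice.example")
second <- polite("alice.example")  # served from the cache, no new call
third  <- polite("bob.example")

print(n_calls)

# With atrrr, the wrapped function could be used in place of get_follows:
# polite_follows <- make_polite_fetcher(
#   function(h) atrrr::get_follows(actor = h, limit = Inf)
# )
# ff <- fh |> lapply(polite_follows) |> setNames(fh)
```

The cache pays off mainly when you rerun the process in the same session, since accounts you already fetched are not requested again.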