align genomic features with phylogenetic tree
R-bloggers 2016-11-16
A question on biostars asking how to generate the following figure:

This can be quite easy to implement in ggtree, I can write a geom layer to layout the alignment. As ggbio already provides many geom for genomic data and I don’t want to re-invent the wheel, I decided to try ggtree+ggbio. This is also the beauty of R that packages complete each others.
Here is a sample data presented in ggbio vignette:
library(ggbio)library(biovizBase) library(Homo.sapiens)library(TxDb.Hsapiens.UCSC.hg19.knownGene)txdb <- TxDb.Hsapiens.UCSC.hg19.knownGenedata(genesymbol, package = "biovizBase")wh <- genesymbol[c("BRCA1", "NBR1")]wh <- range(wh, ignore.strand = TRUE)gr.txdb <- crunch(txdb, which = wh)## change column to model colnames(values(gr.txdb))[4] <- "model"grl <- split(gr.txdb, gr.txdb$tx_id)set.seed(2016-10-25)names(grl) <- sample(LETTERS, size = length(grl), replace = TRUE)> print(grl)GRangesList object of length 32:$J GRanges object with 7 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name <Rle> <IRanges> <Rle> | <factor> <factor> [1] chr17 [41277600, 41277787] + | 61241 uc002idf.3 [2] chr17 [41283225, 41283287] + | 61241 uc002idf.3 [3] chr17 [41284973, 41285154] + | 61241 uc002idf.3 [4] chr17 [41290674, 41292342] + | 61241 uc002idf.3 [5] chr17 [41277788, 41283224] * | 61241 uc002idf.3 [6] chr17 [41283288, 41284972] * | 61241 uc002idf.3 [7] chr17 [41285155, 41290673] * | 61241 uc002idf.3 gene_id model <factor> <factor> [1] 10230 exon [2] 10230 exon [3] 10230 exon [4] 10230 exon [5] 10230 gap [6] 10230 gap [7] 10230 gap$U GRanges object with 3 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name [1] chr17 [41277600, 41277787] + | 61242 uc010czb.2 [2] chr17 [41290674, 41292342] + | 61242 uc010czb.2 [3] chr17 [41277788, 41290673] * | 61242 uc010czb.2 gene_id model [1] 10230 exon [2] 10230 exon [3] 10230 gap$D GRanges object with 9 ranges and 4 metadata columns: seqnames ranges strand | tx_id tx_name [1] chr17 [41277600, 41277787] + | 61243 uc002idg.3 [2] chr17 [41283225, 41283287] + | 61243 uc002idg.3 [3] chr17 [41290674, 41290939] + | 61243 uc002idg.3 [4] chr17 [41291833, 41292300] + | 61243 uc002idg.3 [5] chr17 [41296745, 41297125] + | 61243 uc002idg.3 [6] chr17 [41277788, 41283224] * | 61243 uc002idg.3 [7] chr17 [41283288, 41290673] * | 61243 uc002idg.3 [8] chr17 [41290940, 41291832] * | 61243 uc002idg.3 [9] chr17 [41292301, 41296744] * | 61243 uc002idg.3 gene_id model [1] 10230 exon [2] 10230 exon [3] 10230 exon [4] 10230 exon [5] 10230 exon [6] 10230 gap [7] 10230 gap [8] 10230 gap [9] 10230 gap...<29 more elements>-------seqinfo: 1 sequence from hg19 genomeWe can visualize the alignment simply using:
ggplot() + geom_alignment(grl, alpha=.5)
The input data for geom_alignment is a GRangesList object, while facet_plot defined in ggtree expect the input data as a data.frame. I extend the facet_plot to work with geom_alignment. In doing this, I find a bug of the geom_alignment function and send a patch to Michael. My patch was incorporated in ggbio 1.23.2.
With the updates of both ggtree and ggbio, we can use facet_plot to align alignment with phylogenetic tree.
Suppose we have the following tree:
library(ggtree)n <- names(grl) %>% unique %>% lengthset.seed(2016-10-25)tr = rtree(n)set.seed(2016-10-25)tr$tip.label = sample(unique(names(grl)), n)p <- ggtree(tr) + geom_tiplab()
It is quite easy to use facet_plot to visualize the above alignment with this tree:
facet_plot(p, 'alignment', grl, geom_alignment, inherit.aes=FALSE, mapping=aes())
Beware of mapping=aes() is required as ggbio can’t accept mapping=NULL.
PS: I only test geom_alignment which works with GRanges and GRangesList. Other geoms defined in ggbio may not be supported. If you find facet_plot fail to work with other geoms, please open an issue in github. Feature requests are welcome.
Citation
G Yu, DK Smith, H Zhu, Y Guan, TTY Lam*. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. doi:10.1111/2041-210X.12628.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...