Creating circular boxplot for circadian data

Question:

I am trying to find an R code to make a plot that looks like this (below). I know this figure was created in python because I emailed the person who created this code but after 2 months of back and forth with him promising to get me the code, he ghosted me. I also tried contacting my Data Analytics Core and tried searching for a code I could build off of but didn’t find anything even close to what I am trying to accomplish so I decided to ask the hive mind. I’ve used python in the past so I could manage following a python script but I primarily use R to plot data.

The publication this figure is from- https://pubmed.ncbi.nlm.nih.gov/36599299/

Fig 5a

The data I plan to use to create the plot is circadian data.

It contains:
sample = three datasets (BAT, WAT and in vitro).
process = the process that was found to be circadianly regulated (i.e. the thing being measured).
ct = the timing that subprocesses happen during the circadian day with the day having 24hrs and wrapping around at 24/0h.

Here is a sample of my data for "BAT" –
sample process ct

 sample   process            ct
    BAT CellCycle   1.140128762
    BAT CellCycle   3.106516768
    BAT CellCycle   3.738113578
    BAT CellCycle   23.44912305
    BAT CellCycle   23.49744379
    BAT CellCycle   23.51422441
    BAT CellCycle   23.33041125
    BAT CellCycle   22.19656322
    BAT CellCycle   22.84183477
    BAT ECM 22.04149961
    BAT ECM 23.45526633
    BAT ECM 23.36802574
    BAT ECM 23.40688727
    BAT ECM 1.732071528
    BAT ECM 1.529777828
    BAT ECM 20.29377917
    BAT Energy  2.19647636
    BAT Energy  4.177766894
    BAT Energy  2.624942744
    BAT Energy  1.633080723
    BAT Energy  2.25815373
    BAT Energy  22.52527935
    BAT Energy  21.844178
    BAT Energy  1.44484992
    BAT Energy  1.710318176
    BAT Energy  1.721398481
    BAT Energy  5.0700794
    BAT Energy  3.252122282
    BAT Energy  19.69284419
    BAT Energy  2.521538024
    BAT Energy  2.309621407
    BAT Energy  1.979057875
    BAT Energy  1.747326389
    BAT Energy  1.620003554
    BAT Energy  4.512535826
    BAT Energy  19.48505391
    BAT Energy  3.96613478

There will ultimately be one plot per sample (3 plots) showing the phase distribution of each process with the median and IQR depicted.

Asked By: Jen1984

||

Answers:

It’s difficult to know with only a small sample of your data, but here’s the basic principle of how you could do it in R using ggplot:

library(tidyverse)

df %>%
  mutate(ct = ifelse(ct > 12, ct - 24, ct)) %>%
  ggplot(aes(as.numeric(factor(process)), ct, fill = process)) +
  geom_vline(xintercept = 1:3, linewidth = 7, color = 'gray92') +
  stat_boxplot(geom = 'errorbar', width = 0.6, aes(color = process)) +
  geom_boxplot(width = 0.6, linewidth = 0, outlier.colour = NA) +
  stat_boxplot(geom = 'errorbar', width = 0.6,
               aes(ymax = after_stat(middle), ymin = after_stat(middle))) +
  geom_hline(yintercept = c(-12, -8, -4, 0, 4, 8), linetype = 5,
             alpha = 0.2) +
  
  annotate('rect', xmax = -2, xmin = -Inf, ymin = -Inf, ymax = Inf,
           fill = 'white') +
  annotate('text', x = -4, y = 0, label = 'BAT', fontface = 2, size = 4) +
  coord_polar(theta = 'y', start = pi) + 
  theme_void() +
  scale_x_continuous(limits = c(-4, 3.3)) +
  scale_y_continuous(limits = c(-12, 12), breaks = -12:11,
                     labels = ~sprintf('%02d:00', ifelse(.x < 0, .x + 24,.x))) +
  theme(axis.text.x = element_text(),
        legend.position = 'top') +
  scale_color_manual(values = c('#ca6c28', '#aa2a25', '#538335')) +
  scale_fill_manual(values = c('#ca6c28', '#aa2a25', '#538335'))

enter image description here

You will need to be careful to pick a safe ‘wrap-around’ time for each plot such that none of your samples have a range that crosses the wrap-around time. In your sample data, 12 midday is a safe time because none of your samples’ ranges cross midday.


Data taken from question in reproducible format

df <- structure(list(sample = c("BAT", "BAT", "BAT", "BAT", "BAT", 
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", 
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", 
"BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", "BAT", 
"BAT", "BAT", "BAT", "BAT", "BAT"), process = c("CellCycle", 
"CellCycle", "CellCycle", "CellCycle", "CellCycle", "CellCycle", 
"CellCycle", "CellCycle", "CellCycle", "ECM", "ECM", "ECM", "ECM", 
"ECM", "ECM", "ECM", "Energy", "Energy", "Energy", "Energy", 
"Energy", "Energy", "Energy", "Energy", "Energy", "Energy", "Energy", 
"Energy", "Energy", "Energy", "Energy", "Energy", "Energy", "Energy", 
"Energy", "Energy", "Energy"), ct = c(1.140128762, 3.106516768, 
3.738113578, 23.44912305, 23.49744379, 23.51422441, 23.33041125, 
22.19656322, 22.84183477, 22.04149961, 23.45526633, 23.36802574, 
23.40688727, 1.732071528, 1.529777828, 20.29377917, 2.19647636, 
4.177766894, 2.624942744, 1.633080723, 2.25815373, 22.52527935, 
21.844178, 1.44484992, 1.710318176, 1.721398481, 5.0700794, 3.252122282, 
19.69284419, 2.521538024, 2.309621407, 1.979057875, 1.747326389, 
1.620003554, 4.512535826, 19.48505391, 3.96613478)), row.names = c(NA, 
-37L), class = "data.frame")
Answered By: Allan Cameron
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.