R

Let's Plot 9: The venerable box plot

Contents: Intro; Load packages; Import TSV (tab-separated-value) file; Plotting!; Hmm, the order is not ideal; Overlay points; Wilcox test; ggbeeswarm; Themes; Themes, with some tweaking of color and text; dabest, one comparison; dabest, multiple comparisons; Conclusion; Session Info
Intro: This is the 9th Let’s Plot, and I’ve not yet done a workup of the most useful plot: the box plot. Oops. Well, let’s rectify that. Load packages: Many, many packages.

Let's Plot 8: (Animated) US State Covid-19 Case Count

Contents: Load packages, pull data; 2020 03 30 Update; Plotter function; Cases by state; Cases, with log10 scaling; Deaths by state (log10 scaled); Deaths by state, animated; Shift plot; Transform Data and plot; Add exponential lines
Load packages, pull data. 2020 03 30 Update: CSSE changed their data structure, so I’ve updated the document. I was using their “time series” data, but they dropped the US-specific documents (the ones with state-by-state info).

Let's Plot 7: Clustered Dot Plots in the ggverse

Contents: 2020 03 23 Update; Intro; Example dotplot; How do I make a dotplot?; But let’s do this ourself!; Dotplot!; Zero effort; Remove dots where there is zero (or near zero expression); Better color, better theme, rotate x axis labels; Tweak color scaling; Now what?; Hey look: ggtree; Let’s glue them together with cowplot; How do we do better?; Two more tweak options if you are having trouble; One more adjust; Moonshot; Downside; Exercises for the reader; OLD Solution (kept for posterity)
2020 03 23 Update: Ming Tang pointed out a better way to align plots, so I have rewritten the back end of this post.

One Developer Portal: eyeIntegration Genesis

News! eyeIntegration version 1.0 went live early this year (2019-01-16) and was recently accepted for publication in IOVS. To celebrate, I’m posting a small series about the genesis, development, upgrades, and future of eyeIntegration. You can find our latest manuscript on bioRxiv. The latest update should go live soon. Background: eyeIntegration was developed to serve as a quick, easy-to-use portal for normal gene expression in eye tissues.

One Developer Portal: eyeIntegration Web Optimization

This post is a continuation from here. Really important stuff I learned to make a performant web site in Shiny: After a few months of tinkering I had a working web app on my local computer, a “trashcan” Mac Pro with 32 GB of RAM and a 1 TB SSD. All of the data objects were .Rdata files, which were read in with load() when the site was initialized. This was fine in the beginning, and in fact the Shiny site was deployed with this structure in May of 2017.

Quick Guide to Gene Name Conversion

Background: There are several popular naming systems for (human) genes: RefSeq (NM_000350), Ensembl (ENSG00000198691), HGNC Symbol (ABCA4), and Entrez (24). Given enough time in #bioinformatics, you will have to do every possible combination of conversions. This post will very briefly explain the most expedient way to automatically convert between these formats with R. More exhaustive resources: http://crazyhottommy.blogspot.com/2014/09/converting-gene-ids-using-bioconductor.html and https://davetang.org/muse/2013/11/25/thoughts-converting-gene-identifiers/. Ensembl <-> HGNC <-> Entrez: Stephen Turner has built a small set of data frames (well, tibbles) with core information, including transcript <-> gene info.
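As a rough illustration of what such a conversion amounts to, here is a minimal base R sketch. It is not the approach from the post: the one-row lookup table is built by hand from the four identifiers quoted above, and `gene_ids` and `convert_ids()` are hypothetical names made up for this example.

```r
# Toy lookup table, typed in by hand from the identifiers quoted above.
# A real table would come from an annotation resource, not from typing.
gene_ids <- data.frame(
  symbol  = "ABCA4",
  ensembl = "ENSG00000198691",
  entrez  = 24L,
  refseq  = "NM_000350",
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate IDs from one naming system to another.
# IDs that are not found in the `from` column come back as NA.
convert_ids <- function(ids, from, to, table = gene_ids) {
  table[[to]][match(ids, table[[from]])]
}

# Ensembl -> HGNC symbol
convert_ids("ENSG00000198691", from = "ensembl", to = "symbol")

# HGNC symbol -> Entrez, including an ID that is not in the table
convert_ids(c("ABCA4", "NOT_A_GENE"), from = "symbol", to = "entrez")
```

Using `match()` keeps the output the same length and order as the input, so unmatched identifiers show up as `NA` instead of silently disappearing.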

Let's Plot 6: Simple guide to heatmaps with ComplexHeatmaps

Contents: Introduction; Data processing; Load data; Peek at expression; Peek at metadata; Brief outline on how the RNA-seq data was processed before we see it; Load libraries; Create a Sample - Sample distance heatmap; Easy heatmap with ComplexHeatmap; Complex heatmap; Finished heatmap; Gene Heatmaps; A bit simpler; Session Info
Introduction: Making heatmaps is a core competency for a bioinformatician. They are a compact way to visually demonstrate relationships and changes in values across conditions.

Template for rmarkdown reports

What is this? Since I keep opening random recent R Markdown documents to copy the header into my next document, I figured it would be more efficient to make a post I can reach from anywhere (with an internet connection). Copy / paste:

    ---
    title: THE TITLE
    author: David McGaughey
    date: '`r format(Sys.Date(), "%Y-%m-%d")`'
    output:
      html_notebook:
        theme: flatly
        toc: true
        code_folding: hide
    ---

    ```{r, message=F, warning=F, include=F}
    # Load Libraries without printing any warnings or messages
    library(tidyverse)
    ```

    # Session Info

Let’s Plot 5: ridgeline density plots

Intro: For this installment of Let’s Plot (where anyone can make a figure!), we’ll be making the hottest visualization of 2017: the joy plot, or ridgeline plot. Joy plots are partially overlapping density line plots. They are useful for compactly showing how many distributions change over time, condition, etc. This type of visualization was inspired by the cover art of Joy Division’s album Unknown Pleasures and is implemented in the R package ggridges by Claus Wilke.

Are you in genomics and building models? Stop using ROC - use PR

tl;dr: Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) is a terrible metric for a genomics problem. Do not use it. This metric also goes by AUC or AUROC. Use Precision Recall AUC. Inspiration for this post: I am working on a machine learning problem in genomics; I was getting really confused about why AUROC was so worthless; scienceTwitter featuring Anshul Kundaje; I want to save you (some time). What’s a ROC?
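To make the contrast concrete, here is a small base R sketch that computes both metrics on the same scores. Everything in it is an assumption made for illustration, not taken from the post: the data are simulated with a 1:100 class imbalance, AUROC is computed with the rank-sum identity, and PR AUC is estimated as average precision, which is only one of several ways to summarize a precision-recall curve.

```r
# Simulated, heavily imbalanced data (made-up numbers, for illustration only)
set.seed(42)
n_pos  <- 100
n_neg  <- 10000
labels <- c(rep(1, n_pos), rep(0, n_neg))
scores <- c(rnorm(n_pos, mean = 1.5), rnorm(n_neg, mean = 0))

# AUROC via the rank-sum (Mann-Whitney) identity:
# the probability that a random positive outscores a random negative
auroc <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# PR AUC estimated as average precision:
# the mean of the precision measured at the rank of each true positive
average_precision <- function(labels, scores) {
  ord  <- order(scores, decreasing = TRUE)
  hits <- labels[ord] == 1
  precision_at_k <- cumsum(hits) / seq_along(hits)
  mean(precision_at_k[hits])
}

roc_auc <- auroc(labels, scores)
pr_auc  <- average_precision(labels, scores)
cat("AUROC:", round(roc_auc, 3), " PR AUC (average precision):", round(pr_auc, 3), "\n")
```

With this many negatives, even a small false positive rate produces a lot of false positives. AUROC barely registers that, while precision (and so the PR summary) drops sharply.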