R

Let's Plot 9: The venerable box plot

Contents: Intro; Load packages; Import TSV (tab-separated-value) file; Plotting!; Hmm, the order is not ideal; Overlay points; Wilcox test; ggbeeswarm; Themes; Themes, with some tweaking of color and text; dabest, one comparison; dabest, multiple comparisons; Conclusion; Session Info

Intro: This is the 9th Let’s Plot… and I’ve not yet done a workup of the most useful plot: the boxplot. Oops. Well, let’s rectify that. Load packages: Many, many packages.

Let's Plot 8: (Animated) US State Covid-19 Case Count

Contents: Load packages, pull data; 2020 03 30 Update; Plotter function; Cases by state; Cases, with log10 scaling; Deaths by state (log10 scaled); Deaths by state, animated; Shift plot; Transform Data and plot; Add exponential lines

Load packages, pull data. 2020 03 30 Update: CSSE changed their data structure, so I’ve updated the document. I was using their “time series” data, but they dropped the US-specific documents (with state-by-state info).

Let's Plot 7: Clustered Dot Plots in the ggverse

Contents: 2020 03 23 Update; Intro; Example dotplot; How do I make a dotplot?; But let’s do this ourself!; Dotplot!; Zero effort; Remove dots where there is zero (or near zero expression); Better color, better theme, rotate x axis labels; Tweak color scaling; Now what?; Hey look: ggtree; Let’s glue them together with cowplot; How do we do better?; Two more tweak options if you are having trouble:; One more adjust; Moonshot; Downside; Exercises for the reader; OLD Solution (kept for posterity)

2020 03 23 Update: Ming Tang pointed out a better way to align plots, so I have rewritten the back end of this post.

Seurat FindMarker with Cluster N vs M

What: An easy cluster-by-cluster Seurat FindMarkers implementation. Why: Because Seurat’s FindMarkers (which can be parallelized if you also load library(future) and plan("multiprocess")) runs cluster N against all other clusters. People kept asking me “well, what about cluster 23 vs 17?” and I kept saying “uh, I haven’t run that because…” How: This is being done on a Mac. This may not work on a PC. Multicore stuff is complicated.
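The cluster N vs cluster M idea can be sketched in base R without Seurat. This is a minimal illustration, not the post's implementation: the toy matrix, the cluster labels, and the `compare_clusters` helper are all assumptions made up for the example, and a plain Wilcoxon rank-sum test per gene stands in for the marker test. In Seurat itself, a direct pairwise comparison is requested through the `ident.1` and `ident.2` arguments of `FindMarkers`.

```r
# Toy stand-in for a clustered expression matrix (genes x cells).
# All names and values here are hypothetical.
set.seed(42)
n_cells <- 90
clusters <- factor(rep(c("17", "23", "5"), each = 30))
expr <- matrix(rnorm(4 * n_cells), nrow = 4,
               dimnames = list(paste0("gene", 1:4), NULL))
# Make gene1 higher in cluster 23 so there is something to find.
expr["gene1", clusters == "23"] <- expr["gene1", clusters == "23"] + 3

# Compare cluster n directly against cluster m: one Wilcoxon test per gene.
compare_clusters <- function(expr, clusters, n, m) {
  in_n <- clusters == n
  in_m <- clusters == m
  p <- apply(expr, 1, function(g) wilcox.test(g[in_n], g[in_m])$p.value)
  data.frame(
    gene = rownames(expr),
    cluster_n = n,
    cluster_m = m,
    mean_diff = rowMeans(expr[, in_n, drop = FALSE]) -
      rowMeans(expr[, in_m, drop = FALSE]),
    p_val = p,
    p_adj = p.adjust(p, method = "bonferroni"),
    row.names = NULL
  )
}

# Run the comparison for every unordered pair of clusters.
cluster_pairs <- combn(levels(clusters), 2, simplify = FALSE)
results <- do.call(rbind, lapply(cluster_pairs, function(p) {
  compare_clusters(expr, clusters, p[1], p[2])
}))

# Show the strongest differences first.
print(head(results[order(results$p_val), ], 3))
```

Each row of `results` answers one "what about cluster N vs M?" question for one gene, so a single table covers every pair instead of one cluster against all the rest.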

One Developer Portal: eyeIntegration Genesis

News! eyeIntegration version 1.0 went live early this year (2019-01-16) and was recently accepted for publication in IOVS. In celebration of the news, I’m posting a small series of posts about the genesis, development, upgrades, and future of eyeIntegration. You can find our latest manuscript on bioRxiv. The latest update should go live soon. Background: eyeIntegration was developed to serve as a quick, easy-to-use portal for normal gene expression in eye tissues.

One Developer Portal: eyeIntegration Web Optimization

This post is a continuation from here. Really important stuff I learned to make a performant web site in Shiny: After a few months of tinkering I had a working web app on my local computer, a Mac Pro “trashcan” with 32GB of RAM and a 1TB SSD. All of the data objects were .Rdata files, which were read with load() when the site was initialized. This was fine in the beginning, and in fact the Shiny site was deployed with this structure in May of 2017.
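The load-everything-at-start-up pattern described above looks roughly like the sketch below. The object name, file name, and values are hypothetical placeholders, not the app's real data.

```r
# Hypothetical data object standing in for one of the app's tables.
expression_table <- data.frame(gene = c("ABCA4", "RHO"), tpm = c(12.5, 830.1))
rdata_path <- file.path(tempdir(), "expression_table.Rdata")
save(expression_table, file = rdata_path)

# At start-up, load() restores every object stored in the file into memory
# at once and returns the names of the objects it created.
app_env <- new.env()
loaded_names <- load(rdata_path, envir = app_env)
print(loaded_names)
print(object.size(app_env$expression_table), units = "Kb")
```

Because load() reads the whole file, start-up time and memory use grow with the size of the stored objects, whether or not a given session ever touches them.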

Let's Plot 6: Simple guide to heatmaps with ComplexHeatmaps

Contents: Introduction; Data processing; Load data; Peek at expression; Peek at metadata; Brief outline on how the RNA-seq data was processed before we see it; Load libraries; Create a Sample - Sample distance heatmap; Easy heatmap with ComplexHeatmap; Complex heatmap; Finished heatmap; Gene Heatmaps; A bit simpler; Session Info

Introduction: Heatmaps are a core competency for a bioinformatician. They are a compact way to visually demonstrate relationships and changes in values across conditions.

Let’s Plot 5: ridgeline density plots

Intro: For this installment of Let’s Plot (where anyone can make a figure!), we’ll be making the hottest visualization of 2017: the joy plot, or ridgeline plot. Joy plots are partially overlapping density line plots. They are useful for densely showing changes in many distributions over time, condition, etc. This type of visualization was inspired by the cover art from Joy Division’s album Unknown Pleasures and implemented in the R package ggridges by Claus Wilke.

Are you in genomics and building models? Stop using ROC - use PR

tldr: Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) is a terrible metric for a genomics problem. Do not use it. This metric also goes by AUC or AUROC. Use Precision Recall AUC.

Inspiration for this post:
- I am working on a machine learning problem in genomics
- I was getting really confused why AUROC was so worthless
- scienceTwitter featuring Anshul Kundaje
- I want to save you (some time)

What’s a ROC?
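The tldr's claim can be illustrated with a small simulation in base R. This is a sketch under stated assumptions, not the post's own analysis: the class sizes, the score distributions, and the helper names `auroc` and `average_precision` are invented for the example. AUROC is computed from the rank-sum identity, and PR AUC is approximated by average precision.

```r
# Simulated, heavily imbalanced problem: 50 positives among 10,000 examples.
# Sizes and score distributions are assumptions chosen for illustration.
set.seed(1)
n_pos <- 50
n_neg <- 9950
labels <- c(rep(1, n_pos), rep(0, n_neg))
# Hypothetical classifier scores: positives score higher on average.
scores <- c(rnorm(n_pos, mean = 2), rnorm(n_neg, mean = 0))

# AUROC via the rank-sum (Mann-Whitney) identity.
auroc <- function(scores, labels) {
  r <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average precision: the mean of the precision at each true positive,
# a common step-wise estimate of the area under the PR curve.
average_precision <- function(scores, labels) {
  ord <- order(scores, decreasing = TRUE)
  hits <- labels[ord] == 1
  precision_at_k <- cumsum(hits) / seq_along(hits)
  mean(precision_at_k[hits])
}

roc_auc <- auroc(scores, labels)
pr_auc <- average_precision(scores, labels)
cat(sprintf("AUROC = %.3f, average precision = %.3f\n", roc_auc, pr_auc))
```

With so few positives, even a small false positive rate produces many more false positives than true positives. AUROC barely registers this, while precision (and so PR AUC) drops sharply.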

Something Different: Automated Neighborhood Traffic Monitoring

Author: David McGaughey. Date: 2018-03-03. Categories and tags: R, python, raspberry, pi.

Introduction: This is, obviously, a personal project. Traffic is a concern in my town: cut-through, speeding, etc. The town has paid for a couple of (very expensive!) traffic surveys, but the reports are not very useful, as the company only sets up in town for a few days (if that) and then only reports stuff like ‘number of cars for a one hour period.