Posts

Contents: Intro | Load packages | Import TSV (tab-separated-value) file | Plotting! | Hmm, the order is not ideal | Overlay points | Wilcox test | ggbeeswarm | Themes | Themes, with some tweaking of color and text | dabest, one comparison | dabest, multiple comparisons | Conclusion | Session Info
Intro: This is the 9th Let’s Plot, and I’ve not yet done a workup of the most useful plot: the boxplot. Oops. Well, let’s rectify that.
Load packages: Many, many packages.

Contents: Load packages, pull data | 2020 03 30 Update | Plotter function | Cases by state | Cases, with log10 scaling | Deaths by state (log10 scaled) | Deaths by state, animated | Shift plot | Transform Data and plot | Add exponential lines
Load packages, pull data
2020 03 30 Update: CSSE changed their data structure, so I’ve updated the document. I was using their “time series” data, but they dropped the US-specific (state-by-state) documents.

Contents: 2020 03 23 Update | Intro | Example dotplot | How do I make a dotplot? | But let’s do this ourselves! | Dotplot! | Zero effort | Remove dots where there is zero (or near-zero) expression | Better color, better theme, rotate x axis labels | Tweak color scaling | Now what? | Hey look: ggtree | Let’s glue them together with cowplot | How do we do better? | Two more tweak options if you are having trouble | One more adjustment | Moonshot | Downside | Exercises for the reader | OLD Solution (kept for posterity)
2020 03 23 Update: Ming Tang pointed out a better way to align plots, so I have rewritten the back end of this post.

What: Easy cluster-by-cluster Seurat FindMarkers implementation.
Why: Because Seurat’s FindMarkers (which can be parallelized if you also load library(future) and set plan("multiprocess")) compares cluster N against all other clusters. People kept asking me “well, what about cluster 23 vs 17?” and I kept saying “uh, I haven’t run that because…”
How: This is being done on a Mac. It may not work on a PC; multicore stuff is complicated.

Contents: Day 4 - Session 9 - PERSONAL AND MEDICAL GENOMICS | Characterization of prevalence and health consequences of uniparental disomy in four million individuals from the general population | Sub-continental ancestry inference based on the gnomAD dataset accurately classifies patients at NCH | Patient stratification in the UK’s 100k Genomes Project—Using WGS and machine learning to predict cancer outcomes | Inferring clone- and haplotype-specific chromosomal organization in rearranged cancer genomes with multiple sequencing technologies | Identification and interpretation of common and rare variants in relation to rare disease phenotype and outcome | Somatic mutation status prediction by a splicing-alteration-based machine learning technique | Beyond accessibility—ATAC-seq footprinting analysis reveals dynamics of transcription factor binding during preimplantation development
Day 4 - Session 9 - PERSONAL AND MEDICAL GENOMICS
Characterization of prevalence and health consequences of uniparental disomy in four million individuals from the general population
Priyanka Nakka, Samuel Pattillo Smith, Anne H.

Contents: Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS | Is pathogen evolution predictable? The role of population genomics | Learning the properties of adaptive regions with functional data analysis | Creating pan-human and population-specific consensus representations of the reference genome and assessing their effect on functional genomic data analysis | Gramene subsites—Pangenome browsers for crops | What do we gain when tolerating loss? The information bottleneck, lossy compression, and detecting horizontal gene transfer | A recurrent neural network for inferring sweeps and allele frequency trajectories using gene trees based on the ancestral recombination graph | Assembling the Y chromosomes of Anopheles mosquitoes
Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS
Genome Informatics 2019 at CSHL

Contents: Day 3 - Session 7 - MICROBIAL AND METAGENOMICS | Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps | metaFlye—Scalable long-read metagenome assembly using repeat graphs | Detecting microbial transmission and engraftment after faecal microbiota transplants using long-read metagenomics and reticulatus | Entropy of a bacterial stress response is a generalizable predictor for fitness and antibiotic sensitivity | The use of kmer counts to train random forests to predict country of origin for bacterial pathogen sequencing data | Real-time assembly using Nanopore sequencing data for microbial communities | Exploring the role of ribosomal gene repeats in the context of regeneration | Genomic epidemiology of West Nile virus in California
Day 3 - Session 7 - MICROBIAL AND METAGENOMICS
Genome Informatics 2019 at CSHL

Contents: Day 2 - Session 2 - SEQUENCING ALGORITHMS, VARIANT DISCOVERY AND GENOME ASSEMBLY | Genomic sketching with HyperLogLog | centroFlye—Assembling centromeres with long error-prone reads | Genotyping structural variants in pangenome graphs using the vg toolkit | Rapidly mapping raw nanopore signal with UNCALLED to enable real-time targeted sequencing | The construct and utility of reference pan-genome graphs | PRINCESS — A framework for comprehensive detection and phasing of SNPs and structural variants | Efficient chromosome-scale haplotype-resolved assembly of human individuals | Utilization of an ensemble approach for identification of driver fusions in pediatric cancer
Day 2 - Session 2 - SEQUENCING ALGORITHMS, VARIANT DISCOVERY AND GENOME ASSEMBLY
Genome Informatics 2019 at CSHL

Contents: Day 2 - Session 6 - TRANSCRIPTOMICS | The functional iso-transcriptomics analysis framework to assess the functional impact of alternative isoform usage | Multi-resolution, interactive, atlas-scale integration of single-cell assays and experiments | Efficient and robust transcriptome reconstruction from long-read RNA-seq alignments | Deconvolving the pervasive transcription from jumping genes in RNA-seq and unveiling their role in tumors | Alignment and mapping methodology influence transcript abundance estimation | A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification | Quantifying isoform expression in single-cell RNA-seq data with STARsolo-Quant | Full-length transcript characterization for single cell RNA-seq analysis
Day 2 - Session 6 - TRANSCRIPTOMICS
Genome Informatics 2019 at CSHL

Contents: Day 1 - Session 1 - GENOME STRUCTURE AND FUNCTION | Comparative 3D genome organization in Apicomplexan parasites | Unscrambling the tumor genome via integrated analysis of structural variation and copy number | Tissue-specific enhancer functional networks for associating distal regulatory regions to disease | Targeted Nanopore sequencing with Cas9 for studies of methylation, structural variants, and mutations | Long-read sequencing of structurally variant genomes | Exploring the 3D spatial dependency of gene expression using Markov random fields | Mapping cis-eQTL from RNA-seq data with no genotypes | Exploring short tandem repeat expansions at both known and novel loci in the human genome
Day 1 - Session 1 - GENOME STRUCTURE AND FUNCTION
Genome Informatics 2019 at CSHL
