#GI2019 2019 11 06

Day 1 - Session 1 - GENOME STRUCTURE AND FUNCTION

Genome Informatics 2019 at CSHL

Bold is the speaker

349 attendees. Largest?

189 posters

39 (?) talks

90 countries, 20 (US) states

High(est) % female, URM for gi

If you dislike/disagree with my notes/sentiment and you are the speaker/PI then contact me. I very much could be mis-understanding some important points.

Comparative 3D genome organization in Apicomplexan parasites

Evelien M. Bunnik, Aarthi Venkat, Ferhat Ay, Karine G. Le Roch.

Apicomplexan parasites:

  • toxoplasma gondii
  • babesia
  • most prevalent nad morbidity causing pathogens

Try to understand at genomic level

Malaria (plasmodium) genome published 18 years ago

  • 82% AT rich
  • grab bag of major -omics techniques
  • still poor understanding of gene regulation
  • believe under-representation of TFs identified (lower rate than human, yeast)

Did Hi-C (chrmatin conformation capture)

  • at many stages
  • 100kb resolution (at least for later work?)
  • going through their literature of chromosome organization (Ay, Bunnik Genome Research 2014)

Virulence Genes

  • mediates adherence of infected RGC to endothelium of infected RBC
  • altered chromatin structure at some of these genes

More examples of loops for other pathogenesis processes

  • see power of doing many stages (stage specific patterning of contact maps)

Did more Hi-C across other parasites, unfortunately had to re-assemble several genomes

  • Bunnik PNAS 2019
  • more expression at genes near (in 3D space) telomore
  • which is where many virulence genes are …

Unscrambling the tumor genome via integrated analysis of structural variation and copy number

Daniel L. Cameron, Charles Shale, Jonathan Baber, Anthony T. Papenfuss, Peter Priestly.

Hartwig has a fully automated oncology report (“no touch”)

  • needs to be highly sensitive and specific
  • use 110x tumor, 45x germline
  • SV and CNV calling not ideal

GRIDSS -> PURPLE -> LINX

SV-> CNV -> Interpretation

https://github.com/PapenfussLab/gridss

full pipeline: https://github.com/hartwigmedical/gridss-purple-linx

work on phasing SV (cis vs trans vs unknown)

  • only good up to about 300bp
  • but events are clustered, so works better than you’d expect

Benchmarking (includes GRIDSS)

Tissue-specific enhancer functional networks for associating distal regulatory regions to disease

Xi Chen, Jian Zhou, Chandra L. Theesfeld, Ran Zhang, Aaron K. Wong, Olga G. Troyanskaya.

How to predict function of enhancers?

Different ways enhancers associated with disease:

  • enhancer -> gene (near)
  • enhancer(with SNP inside)
  • enhancer(rare mutation inside)
  • enhance without direct evidence

Hypothesis:

  • enhancer will have a functional relationship with another enhancer if:
    • 3d interaction
    • co-activation with TF
    • regulate related genes
  • if all met, then maybe disease associated

bayesian inference to model the above assumptions

how to see if this sort of works?

  • check brain specific super enhancers in non-brain tissues
  • get some enrichment

https://fenrir.flatironinstitute.org

  • web tool
  • no preprint or paper?

dmcg note:

  • seems like you’d really want some en-masse validation. validating a few sites doesn’t seem like enough.
  • i missed what the input data for these models are (public stuff? Internal stuffs?)

Targeted Nanopore sequencing with Cas9 for studies of methylation, structural variants, and mutations

Timothy E. Gilpatrick, Isac Lee, Fritz J. Sedlazek, Winston Timp.

Want to enrich regions for genome

ROI -> cut out with CRISPR (cas9 on both sides of ROI) -> dA tail -> adapter ligate

  • oh even cutting just once helps with enrichment. cool.
  • can target 10 sites simulateously (400X+ coverage with MinION)

Tried with cancer model

On-target perentage (reads) is about 1-4%

  • most off-target randomly distributed
  • very little off-target cutting

5% error rate!

  • homopolymers sucks
  • guppy v3 does better job at SNV calling

did downsampling test to check SNV performance

  • tools
    • mpileup
    • Nanopolish (Simpson et al. ?)
    • Medaka (ONT)
    • Clair (Luo et al.)
  • can get 95% sesnitivity…doesn’t seem to great
  • Clair gets worse with more coverage (trained on 25X apparently)

HE IS GOING SO FAST OMG

More discussion of benchmarking - still in SNV calling

Now we are talking phasing

  • in TP53 find all true variants, no FP

Suggestion of CNV with phased data, more copies of one allele (60X vs 20X)

Now CpG

  • enriched stuff can still get detectable CpG calls

SV & Repetitive

  • still works

dmcg note:

  • everything still works (isn’t that expected?)
  • on-target percentage VERY low (1-4% means you are “throwing” lots of data away)
  • and yield is getting hammered…so doubly bad
  • not doing anything like getting hundreds of genes…which I would think would be really necessary to be useful clinically
  • size smaller? (length of DNA reads)

Long-read sequencing of structurally variant genomes

Evan E. Eichler.

4% duplication

49% of normal CNV maps to this 4% of genome

Long reads increase resolution of SV

SV detection comaprison between Illumina and PacBio (30k with PB, 11k with Illumina)

  • 70% of SV missed with just Illumina data

Did SV analysis with LR seq method across diverse human genomes

  • found near 100k resolved SV
  • exciting to see a plateau of novel SV! (east africa most unique, as expected)

Improve SV calling with short read data with improved genomes

  • can find new eQTL

Goal: t2t (telomere 2 telomere for chromosome)

  • t2t consortium

Solution:

  • HiFi PacBio 99.9% acccurate 18kbp
  • ONT gives you ultra long (whale!) reads

HiFi & Strandseq(?) to build referene-free phased human genomes

hifi & canu -> saarclust & strandseq -> deepvariant -> whatshap & strandseq -> canu/peregrine

assemboy -> contigs -> snp calls -> haplotype CCS reads -> assembly

23 chr (“clusters”)

95% SNP phase

N50 = 28 Mbp

problematic regions

  • canu and peregrine fail
  • 250 in count
  • 60% are supplemental dups
  • 18% SV
  • 23% acrocentric

solutions

  • build graghs around high dup regions of human genome
  • “SDA” assembler

Long reads getting centrome seq of chr8 (t2t consortium)

Exploring the 3D spatial dependency of gene expression using Markov random fields

Naihui Zhou, Iddo Friedberg, Mark S. Kaiser.

The quantitative expression of gene regulation could manifest as spatial dependency

Hypothesis:

  • TF factories (clustering of genes for active transcription)
  • hub-enhancers (shared enhancers for many genes)

Propose adding spatial location of genes to improve modeeling of gene-gene variation (which is already modelled in most diff expression analyses, e.g. limma)

Poisson Hierarchical Markov Random Field Model (PHiMRF)

  • uses HiC data to get pairwise gene interactions
  • two genes called neighbors if HiC interaction > threshold
  • making a spatial gene network
  • genes are nodes, edges are HiC interaction

New random variable Yik which follows binomial dist (? missed this)

Do genes in TAD have spatial dependencies?

  • but how to deal with interTAD and intraTAD (within and between)?
  • run model with both types of edges
  • Yes…more connected than bootstraping tests

davemcg note: Still waiting to see if/how this improves diff expression?

Enriched GO terms for most conncted gene groups

  • tx by RNA polII
  • GPCR signalling
  • negative cell proliferation

R package! https://github.com/ashleyzhou972/PhiMRF

Mapping cis-eQTL from RNA-seq data with no genotypes

Elena Vigorito, Simon White, Chris Wallace.

Can’t always get genotypes….so would like eQTL without them….

Impute cis-SNP (Gi)

negative binomial weighted by probable genotype

work in 500kb window to compute gene - SNP pairings

benchmark against GEUVADIS reference data

  • only chr22 (why? performance?)
  • better sensitivity and specificity

R package out “soon”

only method for eQTL without genotypes

Exploring short tandem repeat expansions at both known and novel loci in the human genome

Harriet Dashnow, Brent Pedersen, Daniel MacArthur, Alicia Oshlack, Aaron Quinlan.

@hdashnow (twitter handle! good!)

STR is 1-6bp repeats. Microsatellites.

3% of human genome with high mutation rate and high polymorphism

Huntington’s, fragile X, spinocerebellar ataxias

Classically, longer is worse prognosis / higher severity

STR alleles are WAY longer than short reads

Harriet looking for NOVEL (str NOT IN REFERENCE GENOME)

  • not so many methods fornovel (just ExpansionHunter? as of 2 months ago)

STR reads often mis-mapped

  • read STR…where does it go???
  • sends it to just whatever STR or nowhere if really novel

STRling!

Validated acdross 9 (7 worked) disease expansions

Found 5/5 novel expansions in CANVAS disease

Also validation with PacBio

Limitations! (yay! seriously love this)

  • limited to large (>50 bp) STR expansions

Related

comments powered by Disqus