#GI2019 2019 11 08 Afternoon Session

Nov 8, 2019 5 min read bioinformatics, gi2019, conferences, talk

Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS
Is pathogen evolution predictable? The role of population genomics
Learning the properties of adaptive regions with functional data analysis
Creating pan-human and population-specific consensus representations of the reference genome and assessing their effect on functional genomic data analysis
Gramene subsites—Pangenome browsers for crops
What do we gain when tolerating loss? The information bottleneck, lossy compression, and detecting horizontal gene transfer
A recurrent neural network for inferring sweeps and allele frequency trajectories using gene trees based on the ancestral recombination graph
Assembling the Y chromosomes of anopheles mosquitoes

Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS

Genome Informatics 2019 at CSHL

Bold is the speaker

If you dislike/disagree with my notes/sentiment and you are the speaker/PI then contact me. I very much could be mis-understanding some important points.

Is pathogen evolution predictable? The role of population genomics

William Hanage.

Is pathogen evolution predictable?

Probably not
Cool, talk over, I guess.

Oh he is still going

“Accessory genome”

three e. coli strains only share 39% of the genes
- Welsh et al…….2001 (?)
the “mr. potato head” model

Sample 4k+ pneumococcal genomes from 4 dif places

Corander Nat Eco Evol 2017
found 73 “sequence clusters” based on population (tree?) structure
also 1700+ accessory genes (COGS)

Dif strains are correlated when using accessory genes (was effectively R^2 of 0 with full genome)

Negative Frequency Dependent Selection

balancing selection
corrolary are antigens/phage receptors - if too common get ID’ed and wiped out

Pneumococcal Conjugate vaccines (PCVs) as an evolutionary experiment

we have good vaccines against some strains
7
but there are 90+ serotypes

Can we predict what types will persist/succeed after vaccination?

Sampling (seq.) per/post vaccination

35 strains found
vaccination wiped out the 7 that were covered
uneven “replacement” rate by other strains
- not super convinced … is this just noise?

Isolates in the same group tend to have the same set of COGS (getting increasingly concerned I don’t know what COGS stands for…)

Check for prevalence of COGS pre/post vaccination and sum distance from linear expected.

directional

Can predict fitness … and it roughly matches what you’d expect. Not perfect, but the correlation looks real.

doi.org/10.1101/420315 (<- bioRxiv)

What are the genes?! (<- his formatting)

aaaah, he didn’t say.
tease

Learning the properties of adaptive regions with functional data analysis

Mehreen R. Mughal, Hillary Koch, Jinguo Huang, Francesca Chiaromonte, Michael DeGiorgio.

How do organisms adapt to their environment?

pos. selection == (over time) fixation
idea is more beneficial == faster fixation
selective sweep reduces diversity at haplotype
- (chunk of genome)
- so one beneficial variant can “drag” a whole lot of other variants into purity.
  - free-loaders

SURFDAWave

ha
learn spatial dist. from statistics of sweep / neutral
doi.org/10.1101/834010
pick dist to match data
penalized regression
- with wavelet coefficient
- cross-validate params to check fit
predicts neutral/sweep pretty well in TF/TP/FN/FP
each statistic tested gets own model
- initial freq
- selection coef
- time of selection

Now running on SLC45A2

related to skin color in europeans
estimated time of selection 2000 years ago with initial freq of 0.04

Adaptive introgression

selective sweep (?)
beneficial mutation has unique pattern
which SURFDAWave has 52% successful classification
- kind of bleh
- can be improved with more info?
- with 2D info 52 -> 62%

Creating pan-human and population-specific consensus representations of the reference genome and assessing their effect on functional genomic data analysis

Benjamin Kaminow, Sara Ballouz, Jesse Gillis, Alex Dobin.

How much variation do we need to improve ref genome?

We can use all known…but graph is complicated.

Consensus genome

simple
replace minor alleles with major alleles in other pop
have been used on variant calling
here we assess effects of RNA-seq

ConsDB download and parse db files which makes consensus VCF and FASTA

Uses 1000G

better pop balance
uh, but doesn’t 1000G have loads of technical errors due to seq. errors?
- was this ever fixed……?
got RNA-seq data from one person from 4 diff pops
anyways….millions of replacements of SNPS
once replacements made, then # of diff SNPs decreases from one genome <-> consensus genome

check mapping with STARconsensus

STAR + VCF file to transform the reference
also does compare pre/post correction
= VERY large drop in mapping error rate…(3.5 to 0.5%)
- much bigger effect than I would have thought….
wait I’m confused
- y axis says “% of reads overlapping variant”
- that’s not error rate….

Gramene subsites—Pangenome browsers for crops

Marcela K. Tello-Ruiz, Sharon Wei, Joshua Stein, Kapeel Chougule, Andrew Olson, Yinping Jiao, Bo Wang, Ivar Meijs, Doreen Ware.

Janet Kelso.

NOT TWEETABLE

What do we gain when tolerating loss? The information bottleneck, lossy compression, and detecting horizontal gene transfer

Apurva Narechania, Rob Desalle, Barun Mathema, Barry Kreiswirth, Paul Planet.

Can we do horizontal gene transfer without genes?

Make pangenome

count kmers across genomes

clusters kmers

but how do we cluster well? don’t want to lose too much info…but also want to compress…how to get the proper balance?

claude shannon!

transmitter -> noisy channel -> receiver
noisy channel ~ noisy clusters
want middle ground where you can interpret but haven’t lost much info

can see how clusters stabliize with n kmers

I’m reminded of luke zappias/oshlack clustree work

which is ironic bc this talk is all about avoiding trees
done in 2 hours … which I guess is fast?

Plot show inflection between cluster n and norm. mutual information

q: you fixed at 19 kmer, why? a: looking at other kmer n - balance between closeness of variants and kmer n

A recurrent neural network for inferring sweeps and allele frequency trajectories using gene trees based on the ancestral recombination graph

Hussein A. Hejase, Ziyi Mo, Adam Siepel.

Assembling the Y chromosomes of anopheles mosquitoes

Chujia Chen, Austin Compton, Yang Wu, Jiangtao Liang, Dustin Miller, Xiaoguang Chen, Igor Sharakhov, Chunhong Mao, Zhijian J. Tu.

bioinformatics gi2019 conferences talk