- Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS
- Is pathogen evolution predictable? The role of population genomics
- Learning the properties of adaptive regions with functional data analysis
- Creating pan-human and population-specific consensus representations of the reference genome and assessing their effect on functional genomic data analysis
- Gramene subsites—Pangenome browsers for crops
- What do we gain when tolerating loss? The information bottleneck, lossy compression, and detecting horizontal gene transfer
- A recurrent neural network for inferring sweeps and allele frequency trajectories using gene trees based on the ancestral recombination graph
- Assembling the Y chromosomes of anopheles mosquitoes
Day 3 - Session 8 - EVOLUTION AND PHYLOGENETICS
Genome Informatics 2019 at CSHL
Bold is the speaker
If you dislike/disagree with my notes/sentiment and you are the speaker/PI then contact me. I very much could be mis-understanding some important points.
Is pathogen evolution predictable? The role of population genomics
William Hanage.
Is pathogen evolution predictable?
- Probably not
- Cool, talk over, I guess.
Oh he is still going
“Accessory genome”
- three e. coli strains only share 39% of the genes
- Welsh et al…….2001 (?)
- the “mr. potato head” model
Sample 4k+ pneumococcal genomes from 4 dif places
- Corander Nat Eco Evol 2017
- found 73 “sequence clusters” based on population (tree?) structure
- also 1700+ accessory genes (COGS)
Dif strains are correlated when using accessory genes (was effectively R^2 of 0 with full genome)
Negative Frequency Dependent Selection
- balancing selection
- corrolary are antigens/phage receptors - if too common get ID’ed and wiped out
Pneumococcal Conjugate vaccines (PCVs) as an evolutionary experiment
- we have good vaccines against some strains
- 7
- but there are 90+ serotypes
Can we predict what types will persist/succeed after vaccination?
Sampling (seq.) per/post vaccination
- 35 strains found
- vaccination wiped out the 7 that were covered
- uneven “replacement” rate by other strains
- not super convinced … is this just noise?
Isolates in the same group tend to have the same set of COGS (getting increasingly concerned I don’t know what COGS stands for…)
Check for prevalence of COGS pre/post vaccination and sum distance from linear expected.
- directional
Can predict fitness … and it roughly matches what you’d expect. Not perfect, but the correlation looks real.
doi.org/10.1101/420315 (<- bioRxiv)
What are the genes?! (<- his formatting)
- aaaah, he didn’t say.
- tease
Learning the properties of adaptive regions with functional data analysis
Mehreen R. Mughal, Hillary Koch, Jinguo Huang, Francesca Chiaromonte, Michael DeGiorgio.
How do organisms adapt to their environment?
- pos. selection == (over time) fixation
- idea is more beneficial == faster fixation
- selective sweep reduces diversity at haplotype
- (chunk of genome)
- so one beneficial variant can “drag” a whole lot of other variants into purity.
- free-loaders
SURFDAWave
- ha
- learn spatial dist. from statistics of sweep / neutral
- doi.org/10.1101/834010
- pick dist to match data
- penalized regression
- with wavelet coefficient
- cross-validate params to check fit
- predicts neutral/sweep pretty well in TF/TP/FN/FP
- each statistic tested gets own model
- initial freq
- selection coef
- time of selection
Now running on SLC45A2
- related to skin color in europeans
- estimated time of selection 2000 years ago with initial freq of 0.04
Adaptive introgression
- selective sweep (?)
- beneficial mutation has unique pattern
- which SURFDAWave has 52% successful classification
- kind of bleh
- can be improved with more info?
- with 2D info 52 -> 62%
Creating pan-human and population-specific consensus representations of the reference genome and assessing their effect on functional genomic data analysis
Benjamin Kaminow, Sara Ballouz, Jesse Gillis, Alex Dobin.
How much variation do we need to improve ref genome?
We can use all known…but graph is complicated.
Consensus genome
- simple
- replace minor alleles with major alleles in other pop
- have been used on variant calling
- here we assess effects of RNA-seq
ConsDB download and parse db files which makes consensus VCF and FASTA
Uses 1000G
- better pop balance
- uh, but doesn’t 1000G have loads of technical errors due to seq. errors?
- was this ever fixed……?
- got RNA-seq data from one person from 4 diff pops
- anyways….millions of replacements of SNPS
- once replacements made, then # of diff SNPs decreases from one genome <-> consensus genome
check mapping with STARconsensus
- STAR + VCF file to transform the reference
- also does compare pre/post correction
= VERY large drop in mapping error rate…(3.5 to 0.5%)- much bigger effect than I would have thought….
- wait I’m confused
- y axis says “% of reads overlapping variant”
- that’s not error rate….
Gramene subsites—Pangenome browsers for crops
Marcela K. Tello-Ruiz, Sharon Wei, Joshua Stein, Kapeel Chougule, Andrew Olson, Yinping Jiao, Bo Wang, Ivar Meijs, Doreen Ware.
Janet Kelso.
NOT TWEETABLE
What do we gain when tolerating loss? The information bottleneck, lossy compression, and detecting horizontal gene transfer
Apurva Narechania, Rob Desalle, Barun Mathema, Barry Kreiswirth, Paul Planet.
Can we do horizontal gene transfer without genes?
Make pangenome
count kmers across genomes
clusters kmers
but how do we cluster well? don’t want to lose too much info…but also want to compress…how to get the proper balance?
claude shannon!
- transmitter -> noisy channel -> receiver
- noisy channel ~ noisy clusters
- want middle ground where you can interpret but haven’t lost much info
can see how clusters stabliize with n kmers
I’m reminded of luke zappias/oshlack clustree work
- which is ironic bc this talk is all about avoiding trees
- done in 2 hours … which I guess is fast?
Plot show inflection between cluster n and norm. mutual information
q: you fixed at 19 kmer, why? a: looking at other kmer n - balance between closeness of variants and kmer n
A recurrent neural network for inferring sweeps and allele frequency trajectories using gene trees based on the ancestral recombination graph
Hussein A. Hejase, Ziyi Mo, Adam Siepel.
Assembling the Y chromosomes of anopheles mosquitoes
Chujia Chen, Austin Compton, Yang Wu, Jiangtao Liang, Dustin Miller, Xiaoguang Chen, Igor Sharakhov, Chunhong Mao, Zhijian J. Tu.