2019 11 09 GI2019 Personal and Medical Genomics

Nov 9, 2019 5 min read bioinformatics, gi2019, conferences, talk

Day 4 - Session 9 - PERSONAL AND MEDICAL GENOMICS
Characterization of prevalence and health consequences of uniparental disomy in four million individuals from the general population
Sub-continental ancestry inference based on the gnomAD dataset accurately classifies patients at NCH
Patient stratification in the UK’s 100k Genomes Project—Using WGS and machine learning to predict cancer outcomes
Inferring clone- and haplotype-specific chromosomal organization in rearranged cancer genomes with multiple sequencing technologies
Identification and interpretation of common and rare variants in relation to rare disease phenotype and outcome
Somatic mutation status prediction by a splicing-alteration-based machine learning technique
Beyond accessibility—ATAC-seq footprinting analysis reveals dynamics of transcription factor binding during preimplantation development

Day 4 - Session 9 - PERSONAL AND MEDICAL GENOMICS

Characterization of prevalence and health consequences of uniparental disomy in four million individuals from the general population

Priyanka Nakka, Samuel Pattillo Smith, Anne H. O’Donnell-Luria, Kimberly F. McManus, 23andMe Research Team, Joanna L. Mountain, Sohini Ramachandran, Fah Sathirapongsasuti.

@s_ramach

Paper came out two days ago in AJHG

“Population-level summaries of ROH increase with divergence from Africa” ROH == Runs of Homozygosity (LONG runs where both chr are identical)

“Class C ROH”

don’t know what this is…something about consanguinity?
oh, >0.8MB?
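To make the ROH idea concrete, here is a minimal sketch that scans one chromosome for stretches of consecutive homozygous calls. The 0/1/2 genotype coding, the function name `find_roh`, and the toy data are assumptions for illustration; the 800 kb cutoff is only the number I jotted down above, not a confirmed class boundary. Real ROH callers also tolerate the occasional heterozygous call from genotyping error, which this does not.

```python
from typing import List, Tuple


def find_roh(positions: List[int], genotypes: List[int],
             min_length_bp: int) -> List[Tuple[int, int]]:
    """Return (start, end) for each run of consecutive homozygous sites
    spanning at least min_length_bp.

    genotypes holds alt-allele counts: 0 or 2 = homozygous, 1 = heterozygous.
    """
    runs = []
    run_start = None
    run_end = None
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            # A heterozygous site ends the current run.
            if run_start is not None and run_end - run_start >= min_length_bp:
                runs.append((run_start, run_end))
            run_start = None
    # Close a run that reaches the end of the chromosome.
    if run_start is not None and run_end - run_start >= min_length_bp:
        runs.append((run_start, run_end))
    return runs


# Toy chromosome: one marker every 100 kb (hypothetical data).
positions = [100_000 * i for i in range(1, 21)]
genotypes = [1, 0, 0, 2, 0, 0, 2, 2, 0, 0, 0, 1, 0, 2, 1, 0, 0, 0, 1, 0]
roh = find_roh(positions, genotypes, min_length_bp=800_000)
print(roh)
```

Only the first stretch (200 kb to 1.1 Mb) clears the cutoff; the shorter homozygous stretches are dropped.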

Group interested in distribution across human genomes

23andMe sort of partially funded this study (personnel)

Also uses 23andMe data

obv
not even MacArthur has 4 million genomes

Posting IBD karyogram….from reddit (imgur.com/a/hl6Cd)

UPD

both homologs of chr come from one parent
meiotic oopsie

3,300 reported in clinical literature

upd-tl.com/upd.html
more common on chr 6,7,11, and 15 (most common)

What is incidence in population (at least 23andMe pop)?

What are rates of maternal/paternal UPD?

Three subtypes of UPD

HetUPD, IsoUPD, Partial IsoUPD

In 916,712 duos/trios in 23andMe found 199 with UPD

Maternal UPD 3X more common than Paternal UPD
different dist across chr than the upd-tl.com data

Estimate 1/2000 rate (more common than previously estimated)
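One simple way to see why parent-child duos are informative: a child normally carries one allele from each parent at every site, so a parent who is homozygous for one allele cannot have a child homozygous for the other. A chromosome with a pile of such "opposite homozygote" sites suggests the child got no copy from that parent. The sketch below is an assumption-laden illustration (0/1/2 genotype coding, made-up data, hypothetical function name), and not necessarily how the speakers detected UPD.

```python
from collections import defaultdict


def opposite_homozygote_rate(sites):
    """sites: iterable of (chrom, parent_gt, child_gt), where each gt is an
    alt-allele count (0, 1 or 2).

    Returns {chrom: fraction of sites where parent and child are homozygous
    for different alleles}, which should be near 0 under normal inheritance.
    """
    total = defaultdict(int)
    opposite = defaultdict(int)
    for chrom, parent_gt, child_gt in sites:
        total[chrom] += 1
        if {parent_gt, child_gt} == {0, 2}:
            opposite[chrom] += 1
    return {chrom: opposite[chrom] / total[chrom] for chrom in total}


# Hypothetical father-child duo: chr1 looks normal, chr15 does not.
duo_sites = [
    ("chr1", 0, 1), ("chr1", 2, 1), ("chr1", 1, 1), ("chr1", 2, 2),
    ("chr15", 0, 2), ("chr15", 2, 0), ("chr15", 1, 0), ("chr15", 0, 2),
]
rates = opposite_homozygote_rate(duo_sites)
print(rates)
```

In this toy case the excess on chr15 would be consistent with maternal UPD15, since the father appears to have contributed nothing there. Real data would need far more sites and an allowance for genotyping error.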

Trained log. regression classifier for each chr for each pop to tell ROH caused by UPD apart from ROH caused by consanguinity

used ROH length
ratio of ROH across chr
other things I missed
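A rough sketch of what such a classifier could look like, written as plain gradient-descent logistic regression so it has no dependencies. The two features (longest ROH as a fraction of the chromosome, and the share of the genome-wide ROH total that sits on this chromosome) are my guess at the "ROH length" and "ratio of ROH across chr" features; the training data are invented. In the setup described above, one such model would be fit per chromosome per population.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_upd(w, b, x):
    """Probability that the ROH pattern x comes from UPD."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training set: (longest ROH / chr length, share of all ROH on this chr).
X = [(0.60, 0.95), (0.35, 0.90), (1.00, 0.99), (0.45, 0.85),   # UPD-like
     (0.20, 0.10), (0.15, 0.08), (0.30, 0.15), (0.10, 0.05)]   # consanguinity-like
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(X, y)
p_upd_like = predict_upd(w, b, (0.50, 0.90))
p_consang_like = predict_upd(w, b, (0.20, 0.10))
print(round(p_upd_like, 3), round(p_consang_like, 3))
```

The intuition being encoded: consanguinity spreads ROH over many chromosomes, while UPD concentrates it on one.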

Found 304 putative UPD cases in 1.3 million “singletons”

recapitulates chr dist. patterns
found 172 more in UK biobank
found 0 in the 4,000 duos (not a big sample size)

Sub-continental ancestry inference based on the gnomAD dataset accurately classifies patients at NCH

Andrei Rajkovic, Defne Ceyhan, Greg Wheeler, Tara Lichtenberg, Ben Kelly, Peter White.

Trying to better infer ancestry with the gnomad projection

ML algorithm (gradient boosted decision tree)

trained on 4000 people (vcf input)
accuracy defined as true ancestry in top 2 (is that too loose?)

90+ ish accuracy (some ancestries don’t do so well)
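To show how much a top-2 definition can flatter a classifier, here is a small sketch comparing top-1 and top-2 accuracy on the same predictions. The labels "A", "B", "C" and the probabilities are invented placeholders, not ancestry groups or results from the talk.

```python
def top_k_accuracy(prob_rows, true_labels, k=2):
    """Fraction of samples whose true label is among the k highest-scoring
    predicted labels. Each row maps label -> predicted probability."""
    hits = 0
    for probs, truth in zip(prob_rows, true_labels):
        top = sorted(probs, key=probs.get, reverse=True)[:k]
        hits += truth in top
    return hits / len(true_labels)


# Hypothetical classifier output for four samples.
predictions = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.5, "B": 0.4, "C": 0.1},
    {"A": 0.2, "B": 0.3, "C": 0.5},
    {"A": 0.1, "B": 0.7, "C": 0.2},
]
truth = ["A", "B", "A", "C"]

top1 = top_k_accuracy(predictions, truth, k=1)
top2 = top_k_accuracy(predictions, truth, k=2)
print(top1, top2)
```

Here the same predictions score 0.25 under top-1 and 0.75 under top-2, which is why the metric definition matters when reading a headline accuracy.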

I guess they are relying on self-reported ancestries? Aren’t these notorious for being not so accurate….?

I asked … and yes they are …. which isn’t a problem that can be dealt with so easily

Patient stratification in the UK’s 100k Genomes Project—Using WGS and machine learning to predict cancer outcomes

Kate Ridout, Pauline Robbe, CLL pilot consortium, Anna Schuh. Presenter affiliation: University of Oxford, Oxford, United Kingdom.

Kaplan Meier curves

these plots always bum me out
death curve
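For reference, a Kaplan-Meier curve is just a running product: at each time where a death is observed, multiply the survival estimate by (1 - deaths / number still at risk). A minimal sketch with made-up follow-up data (the times, events, and function name are illustrative assumptions, nothing from the talk):

```python
def kaplan_meier(times, events):
    """times: follow-up time per patient; events: 1 = death observed,
    0 = censored (lost to follow-up or still alive at last contact).

    Returns [(time, survival probability)] at each time with a death.
    Patients censored at time t are treated as still at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical cohort of five patients (time in years).
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients never drop the curve themselves; they only shrink the at-risk denominator for later deaths.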

Going through pipeline to call variants / SV / CNV in somatic tissue

LOTS of different tools
even chromHMM … not certain what datasets the chromHMM was built off of
anyways just super comprehensive, kitchen sink kind of stuff
- so many tools I wonder if they are correcting their p values for the bajillion test/tools they are doing….
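For what it's worth, the usual fix for the many-tests worry is a false discovery rate adjustment such as Benjamini-Hochberg. Nothing in the talk said which correction, if any, was applied; this is just a sketch of the standard procedure on invented p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank down keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four separate tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

With only four tests the adjustment is mild; with hundreds of tool and feature combinations, a raw p of 0.03 or 0.04 would not survive.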

showing some case studies….but why not combining all these into one framework….oh here we go I think

LASSO
non-negative matrix factorization (NMF)
- combine all features and see patterns, assess against outcomes
- n isn’t super high (200 ish)
several independent-ish models / data / thingies which get merged somehow?
- don’t quite understand architecture…
found a mut sig associated with better outcome (useful to de-prioritize for damaging chemo)

Inferring clone- and haplotype-specific chromosomal organization in rearranged cancer genomes with multiple sequencing technologies

Sergey Aganezov, Fritz Sedlazeck, Sara Goodwin, Gayatri Arun, Isac Lee, Sam Kovaka, Michael Kitsche, Rachel Scherman, Ilya Zban, Vitaly Aksenov, Nikita Alexeev, Robert Wappel, Melissa Kramer, Karen Kostroff, David L. Spector, Winston Timp, Michael S. Schatz, W. Richard McCombie.

Recover linear genome org from cancer genomes

Referring to short-read NGS as 2nd gen and long reads as 3rd gen

will this catch on? Probably not…especially if I misunderstood what happened

10x Linked Reads + ONT + PacBio

check concordance
overlap not great between all three
A TON were found in both ONT + PacBio
- LOTS of FP for 10x…not a good slide for this tech
- so maybe don’t use 10X LR for SV
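A concordance check like this boils down to asking, for each call, how many of the other technologies have a matching call. The sketch below uses a naive rule (same chromosome, same SV type, breakpoints within 500 bp); the rule, the tolerance, and the calls are all assumptions for illustration, not the comparison the speakers ran.

```python
def sv_match(a, b, tol=500):
    """Two calls (chrom, pos, svtype) match if chromosome and type agree
    and the positions are within tol bp of each other."""
    return a[0] == b[0] and a[2] == b[2] and abs(a[1] - b[1]) <= tol


def support_counts(callsets, tol=500):
    """For each technology, tally its calls by how many OTHER technologies
    contain a matching call."""
    result = {}
    for name, calls in callsets.items():
        counts = {n: 0 for n in range(len(callsets))}
        for call in calls:
            n_support = sum(
                any(sv_match(call, other, tol) for other in other_calls)
                for other_name, other_calls in callsets.items()
                if other_name != name
            )
            counts[n_support] += 1
        result[name] = counts
    return result


# Hypothetical calls from three technologies.
callsets = {
    "ont": [("chr1", 10_000, "DEL"), ("chr2", 50_000, "INS"), ("chr3", 70_000, "DEL")],
    "pacbio": [("chr1", 10_120, "DEL"), ("chr2", 50_030, "INS"), ("chr3", 70_200, "DEL")],
    "10x": [("chr1", 10_300, "DEL"), ("chr5", 5_000, "DUP"), ("chr7", 9_000, "INV")],
}
support = support_counts(callsets)
print(support)
```

In this toy setup every long-read call is backed by at least one other technology, while two of the three linked-read calls have no support at all, which mirrors the pattern on the slide. Unsupported calls are not automatically false positives, though; that needs a truth set.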

Overall finding so many more SV with long reads

Get to 90% precision/recall at 25X

Tried to genotype SVs in 1000G - found a few (germline of course). Artifacts? Interesting? Dunno…

Discussing haplotype resolved chr structure technique

github.com/izban/contig-covering

Identification and interpretation of common and rare variants in relation to rare disease phenotype and outcome

Mana Gajapathy, Brandon Wilk, Arthur Weborg, Angelina Uno-Antonison, Alex Moss, Matt Holt, Donna Brown, Melissa Wilk, Camille Birch, Nadiya Sosonkina, Elizabeth Worthey

History lesson: in 2003, the “success rate” of getting a seq. diagnosis for rare genetic disease was about 5%. Today it is approaching 50%.

But common genetic disease is ~5% today

Sanger -> Exome -> WGS

Software / pipelines are crucial in making seq. useful

Using many orthogonal datasets and phenotype info (HPO?) to improve filtering

This software proprietary / custom for UAB?

varsight:

biorxiv.org/content/10.1101/532440v2
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3026-8

Patient with paternal isodisomic UPD….3!!! different diseases with multiple causal genes! Yikes.

Somatic mutation status prediction by a splicing-alteration-based machine learning technique

Raul N. Mateos, Naoko Iida, Kenichi Chiba, Ai Okada, Yuichi Shiraishi.

NOT HERE

Model selection and permutation testing for association studies within a large direct-to-consumer genetic cohort

Elena P. Sorokin, Danny S. Park, Kristin A. Rand, Julie M. Granka, Eurie L. Hong, Kenneth G. Chahine, Catherine A. Ball.

NOT TWEETABLE

Beyond accessibility—ATAC-seq footprinting analysis reveals dynamics of transcription factor binding during preimplantation development

Mette Bentsen, Philipp Goymann, Anastasiia Petrova, Hendrik Schultheis, Kathrin Klee, Jens Preussner, Carsten Kuenne, Mario Looso.