Finding Diagnostically Useful Patterns in Quantitative Phenotypic Data.
Stuart Aitken, Helen V Firth, Jeremy McRae, Mihail Halachev, Usha Kini, Michael J Parker, Melissa M Lees, Katherine Lachlan, Ajoy Sarkar, Shelagh Joss, Miranda Splitt, Shane McKee, Andrea H Németh, Richard H Scott, Caroline F Wright, Joseph A Marsh, Matthew E Hurles, David R FitzPatrick, DDD Study
Author Information
Stuart Aitken: MRC Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.
Helen V Firth: Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK; Clinical Genetic Department, Addenbrooke's Hospital Cambridge University Hospitals, Cambridge, UK.
Jeremy McRae: Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Mihail Halachev: MRC Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK; South East Scotland Regional Genetics Services, Western General Hospital, Edinburgh, UK.
Usha Kini: Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
Michael J Parker: Sheffield Children's Hospital NHS Foundation Trust, Western Bank, Sheffield, UK.
Melissa M Lees: North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3EH, UK.
Katherine Lachlan: Wessex Clinical Genetics Service, University Hospitals of Southampton NHS Trust, Southampton, UK.
Ajoy Sarkar: Nottingham Regional Genetics Service, City Hospital Campus, Nottingham University Hospitals NHS Trust, The Gables, Hucknall Road, Nottingham NG5 1PB, UK.
Shelagh Joss: West of Scotland Regional Genetics Service, Queen Elizabeth University Hospital, Glasgow G51 4TF, UK.
Miranda Splitt: Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK.
Andrea H Németh: Oxford Centre for Genomic Medicine, Oxford University Hospitals NHS Foundation Trust, Oxford, UK; Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK.
Richard H Scott: North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, London WC1N 3EH, UK.
Caroline F Wright: University of Exeter Medical School, RILD Level 4, Royal Devon & Exeter Hospital, Barrack Road, Exeter, UK.
Joseph A Marsh: MRC Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK.
Matthew E Hurles: Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
David R FitzPatrick: MRC Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK. Electronic address: david.fitzpatrick@ed.ac.uk.
DDD Study: Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Trio-based whole-exome sequence (WES) data have established confident genetic diagnoses in ∼40% of previously undiagnosed individuals recruited to the Deciphering Developmental Disorders (DDD) study. Here we aim to use the breadth of phenotypic information recorded in DDD to augment diagnosis and disease variant discovery in probands. Median Euclidean distances (mEuD) were employed as a simple measure of similarity of quantitative phenotypic data within sets of ≥10 individuals with plausibly causative de novo mutations (DNM) in 28 different developmental disorder genes. Of these genes, 13/28 (46.4%) showed significant similarity for growth or developmental milestone metrics, 10/28 (35.7%) showed similarity in HPO term usage, and 12/28 (42.9%) showed no phenotypic similarity. Pairwise comparisons of individuals with high-impact inherited variants to the 32 individuals with causative DNM in ANKRD11, using only growth z-scores, highlighted five likely causative inherited variants and two unrecognized DNM, resulting in an 18% diagnostic uplift for this gene. Using an independent approach, naive Bayes classification of growth and developmental data produced reasonably discriminative models for the 24 DNM genes with sufficiently complete data. An unsupervised naive Bayes classification of 6,993 probands with WES data and sufficient phenotypic information defined 23 in silico syndromes (ISSs) and was used to test a "phenotype first" approach to the discovery of causative genotypes using WES variants strictly filtered on allele frequency, mutation consequence, and evidence of constraint in humans. This highlighted heterozygous de novo nonsynonymous variants in SPTBN2 as causative in three DDD probands.
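As an illustration of the mEuD measure described above, the following is a minimal sketch and not the study's implementation. It assumes each individual is represented by a row of quantitative values (for example, growth z-scores) with no missing data, and that similarity within a gene set is judged by comparing its median pairwise Euclidean distance against randomly drawn sets of the same size. The random-resampling test, the function names, and the parameters `n_draws` and `seed` are assumptions introduced here; the abstract does not state how significance was assessed.

```python
import numpy as np


def median_euclidean_distance(z):
    """Median of all pairwise Euclidean distances between the rows of z."""
    z = np.asarray(z, dtype=float)
    # Broadcast to an (n, n, d) array of row differences, then reduce to (n, n).
    diffs = z[:, None, :] - z[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    upper = np.triu_indices(len(z), k=1)
    return float(np.median(dists[upper]))


def empirical_p_value(z_all, group_idx, n_draws=1000, seed=0):
    """Hypothetical test: is the group's mEuD smaller than that of random sets?

    z_all     : (n_individuals, n_metrics) array of quantitative phenotypes
    group_idx : row indices of the individuals sharing a candidate gene
    Returns the observed mEuD and a one-sided empirical p-value.
    """
    rng = np.random.default_rng(seed)
    z_all = np.asarray(z_all, dtype=float)
    group_idx = np.asarray(group_idx)
    observed = median_euclidean_distance(z_all[group_idx])
    k = len(group_idx)
    null = np.array([
        median_euclidean_distance(
            z_all[rng.choice(len(z_all), size=k, replace=False)]
        )
        for _ in range(n_draws)
    ])
    # Add-one correction so the estimate is never exactly zero.
    p = (1 + np.sum(null <= observed)) / (1 + n_draws)
    return observed, float(p)
```

A smaller mEuD than expected from random sets would suggest that the individuals in the group resemble one another on the chosen metrics. In practice, missing measurements and differing scales across metrics would need to be handled before computing distances.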