Interspecific comparison of gene expression profiles using machine learning.

Artem S Kasianov, Anna V Klepikova, Alexey V Mayorov, Gleb S Buzanov, Maria D Logacheva, Aleksey A Penin
Author Information
  1. Artem S Kasianov: Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.
  2. Anna V Klepikova: Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.
  3. Alexey V Mayorov: Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.
  4. Gleb S Buzanov: Moscow Institute of Physics and Technology, Moscow, Russia.
  5. Maria D Logacheva: Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.
  6. Aleksey A Penin: Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia.

Abstract

Interspecific gene comparisons are a keystone of many areas of biological research and are especially important for translating knowledge from model organisms to economically important species. Currently, they are hampered by the low resolution of sequence-based methods and by the complex evolutionary history of eukaryotic genes. This is especially critical for plants, whose genomes have been shaped by multiple whole-genome duplications and subsequent gene loss. New methods for comparing gene function across species are therefore needed. Here, we report ISEEML (Interspecific Similarity of Expression Evaluated using Machine Learning), a novel machine learning-based algorithm for interspecific gene classification. In contrast to previous studies, which focused on sequence similarity, our algorithm focuses on functional similarity inferred from the comparison of gene expression profiles. We propose a novel metric of expression pattern similarity, the expression score (ES), that is suitable for species with differing morphologies. As a proof of concept, we compare detailed transcriptome maps of the model species Arabidopsis thaliana, Zea mays (maize), and Fagopyrum esculentum (common buckwheat), which represent distant clades within flowering plants. The classifier achieved an AUC of 0.91; at an ES threshold of 0.5, specificity was 94% and sensitivity was 72%.
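
For readers unfamiliar with the reported metrics, the following minimal Python sketch shows how sensitivity, specificity, and AUC are conventionally computed from per-pair scores and binary labels. It is not the ISEEML implementation. The scores and labels are made-up toy values, the function names are hypothetical, and treating a score at or above the threshold as a positive call is an assumption.

```python
from typing import List, Tuple


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are called positive.

    Assumption: a score at or above the threshold counts as a positive call.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical, not from the study): 1 = functionally similar pair.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(toy_scores, toy_labels):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports the AUC separately from the values at ES = 0.5.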

MeSH Terms

Transcriptome
Arabidopsis
Biological Evolution
Gene Expression Regulation, Plant
Zea mays
