Automatically clustering large-scale miRNA sequences: methods and experiments.

Linxia Wan, Jiandong Ding, Ting Jin, Jihong Guan, Shuigeng Zhou
Author Information
  1. Linxia Wan: School of Computer Science, Fudan University, Shanghai 200433, China.

Abstract

BACKGROUND: Since the initial annotation of microRNAs (miRNAs) in 2001, many studies have sought to identify additional miRNAs experimentally or computationally in various species. MiRNAs act with the Argonaute family of proteins to regulate target messenger RNAs (mRNAs) post-transcriptionally. Current research focuses mainly on the function of individual miRNAs. Because members of the same miRNA family might participate in the same pathway or regulate the same target(s), and thus share similar biological functions, a high-quality miRNA family architecture can yield useful knowledge.
RESULTS: In this article, we developed miRCluster, an unsupervised clustering-based method that automatically groups miRNAs. To evaluate the method, several data sets were constructed from the online database miRBase. Results showed that miRCluster can efficiently arrange miRNAs (e.g., it identifies 354 families in miRBase 16 with an accuracy of 92.08% and recognizes 9 of the 10 families newly added in miRBase 17). So far, ~30% of the mature miRNAs registered in miRBase are unclassified. With miRCluster, over 85% of the unclassified miRNAs can be assigned to families, and ~44% of these miRNAs are distributed among ~300 novel families.
CONCLUSIONS: In short, miRCluster is an automatic and efficient miRNA family identification method that does not require any prior knowledge. It can be helpful in practice, especially when exploring the functions of novel miRNAs. All relevant materials can be freely accessed online (http://admis.fudan.edu.cn/projects/miRCluster).
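
The abstract does not describe how miRCluster works internally, so the following is only a generic illustration of unsupervised sequence grouping, not the authors' algorithm. It is a minimal sketch that links two sequences when their length-normalised edit distance falls under a cutoff and then reads off the connected groups (single linkage). The function names, the 0.2 cutoff, and the toy sequences are all assumptions made for illustration; the sequences are synthetic and are not miRBase entries.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def cluster_sequences(seqs: dict[str, str], max_dist: float = 0.2) -> list[set[str]]:
    """Single-linkage grouping with a union-find structure.

    max_dist is a hypothetical cutoff on the edit distance divided by
    the longer sequence length; it is not taken from the paper.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        d = edit_distance(seqs[a], seqs[b]) / max(len(seqs[a]), len(seqs[b]))
        if d <= max_dist:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Synthetic 22-nt toy sequences (made up for this sketch).
toy = {
    "toy-a1": "AAAACCCCGGGGUUUUAAAACC",
    "toy-a2": "AAAACCCCGGGGUUUUAAAAGG",
    "toy-b1": "UUGGUUGGUUGGUUGGUUGGUU",
    "toy-b2": "UUGGUUGGUUGGUUGGUUGGUA",
}
families = cluster_sequences(toy)
print(sorted(sorted(group) for group in families))
```

With these toy inputs the two "a" sequences end up in one group and the two "b" sequences in another. The all-pairs comparison is quadratic in the number of sequences, so a real large-scale setting would need a cheaper similarity measure or a pre-filtering step.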

MeSH Term

Algorithms
Animals
Automation
Cluster Analysis
Databases, Genetic
Internet
MicroRNAs
Plants
User-Computer Interface
Viruses

Chemicals

MicroRNAs
