MTAP: the motif tool assessment platform.

Daniel Quest, Kathryn Dempsey, Mohammad Shafiullah, Dhundy Bastola, Hesham Ali
Author Information
  1. Daniel Quest: College of Information Science & Technology, University of Nebraska at Omaha, Omaha NE, USA. djquest@unmc.edu

Abstract

BACKGROUND: In recent years, substantial effort has been applied to de novo regulatory motif discovery. At this time, more than 150 software tools exist to detect regulatory binding sites given a set of genomic sequences. As the number of software packages increases, it becomes more important to identify the tools with the best performance characteristics for specific problem domains. Identifying the correct tool is difficult because of the great variability in motif detection software. Consequently, many labs spend considerable effort testing methods to find one that works well for their problem of interest.
RESULTS: In this work, we propose a method (MTAP) that substantially reduces the effort required to assess de novo regulatory motif discovery software. MTAP differs from previous attempts at regulatory motif assessment in that it automates motif discovery tool pipelines (something that traditionally required many manual steps), automatically constructs orthologous upstream sequences, and provides automated benchmarks for many popular tools. As a proof of concept, we have run benchmarks over human, mouse, fly, yeast, E. coli and B. subtilis.
CONCLUSION: MTAP presents a new approach to the challenging problem of assessing regulatory motif discovery methods. The most current version of MTAP can be downloaded from http://biobase.ist.unomaha.edu/

Grants

  1. P20 RR016469/NCRR NIH HHS
  2. P20RR16469/NCRR NIH HHS

MeSH Term

Algorithms
Base Sequence
Molecular Sequence Data
Regulatory Sequences, Nucleic Acid
Sequence Analysis, DNA
Software
