MultiPhyl: a high-throughput phylogenomics webserver using distributed computing.

Thomas M Keane, Thomas J Naughton, James O McInerney
Author Information
  1. Thomas M Keane: Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK. tkeane@cs.nuim.ie

Abstract

With the number of fully sequenced genomes increasing steadily, there is growing interest in performing large-scale phylogenomic analyses of large numbers of individual gene families. Maximum likelihood (ML) has repeatedly been shown to be one of the most accurate methods for phylogenetic tree construction. Recently, there have been several algorithmic improvements to ML-based tree search methods. However, analysing the evolutionary history of many gene families on a single computer can still take a long time. Distributed computing combines the computing power of multiple computers to perform a larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, which can use the idle computational resources of many heterogeneous, non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and to perform computationally intensive tasks, such as model selection, tree searching and bootstrapping, on each alignment using many desktop machines. The program implements 88 amino acid and 56 nucleotide maximum likelihood models, together with a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at http://www.cs.nuim.ie/distributed/multiphyl.php.
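
The abstract does not name the statistical methods MultiPhyl uses to choose between alternative models. As an illustration only, the sketch below assumes three widely used information criteria (AIC, the small-sample AICc, and BIC) and picks the lowest-scoring candidate for a single alignment, given each model's maximized log-likelihood and its number of free parameters. The class and function names (`ModelFit`, `select_model`) are hypothetical, and using the number of alignment sites as the sample size n is an assumption, not something the source specifies.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelFit:
    """Hypothetical record of one substitution model fitted to one alignment."""
    name: str
    log_likelihood: float  # maximized ln L of the model on the alignment
    k: int                 # number of free parameters counted for the model


def aic(fit: ModelFit) -> float:
    # Akaike Information Criterion: 2k - 2 ln L
    return 2 * fit.k - 2 * fit.log_likelihood


def aicc(fit: ModelFit, n: int) -> float:
    # Second-order (small-sample) AIC; only defined when n > k + 1
    if n <= fit.k + 1:
        raise ValueError("AICc needs n > k + 1")
    return aic(fit) + (2 * fit.k * (fit.k + 1)) / (n - fit.k - 1)


def bic(fit: ModelFit, n: int) -> float:
    # Bayesian Information Criterion: k ln n - 2 ln L
    return fit.k * math.log(n) - 2 * fit.log_likelihood


def select_model(fits: List[ModelFit], n: int, criterion: str = "AIC") -> ModelFit:
    """Return the candidate with the lowest score under the chosen criterion.

    n is the sample size; treating it as the number of alignment sites
    is an assumption made for this sketch.
    """
    scorers: Dict[str, Callable[[ModelFit], float]] = {
        "AIC": aic,
        "AICc": lambda f: aicc(f, n),
        "BIC": lambda f: bic(f, n),
    }
    if criterion not in scorers:
        raise ValueError(f"unknown criterion: {criterion}")
    return min(fits, key=scorers[criterion])


if __name__ == "__main__":
    # Made-up log-likelihoods for illustration only; not MultiPhyl output
    fits = [
        ModelFit("model_A", -5200.0, 1),
        ModelFit("model_B", -5190.0, 5),
        ModelFit("model_C", -5188.0, 9),
    ]
    for crit in ("AIC", "AICc", "BIC"):
        print(crit, select_model(fits, n=1000, criterion=crit).name)
```

With these invented numbers and n = 1000, AIC and AICc both favour the five-parameter `model_B`, while BIC's stronger ln n penalty per parameter favours the one-parameter `model_A`, which is why the choice of criterion can change the selected model. Because each alignment is scored independently, this step parallelizes naturally across many machines.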

MeSH Terms

Algorithms
Animals
Computational Biology
Computer Simulation
Computers
Computing Methodologies
Databases, Genetic
Genomics
Humans
Internet
Likelihood Functions
Phylogeny
Sequence Alignment
Software
User-Computer Interface