Clusterflock

Introduction

Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons.

In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold.

Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
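To make the idea of distance-guided flocking concrete, the following is a minimal Python sketch. It is not Clusterflock's code and does not reproduce its rules: the function name `flock`, its parameters (`radius`, `step`, `seed`), the unit-square arena, and the attraction/repulsion weight `1 - 2 * distance` are all assumptions chosen for illustration. The sketch only shows the general pattern described above: agents react to the neighbours they can perceive, move by a simple rule driven by a pairwise distance on the unit interval, and groups emerge without seeds.

```python
import math
import random


def flock(dist, iterations=200, radius=0.3, step=0.05, seed=0):
    """Move one agent per item on the unit square, guided by pairwise distances.

    dist[i][j] is a distance on the unit interval (0 = identical,
    1 = maximally different). Returns the final list of (x, y) positions.
    Hypothetical illustration only; not the Clusterflock implementation.
    """
    rng = random.Random(seed)
    n = len(dist)
    pos = [(rng.random(), rng.random()) for _ in range(n)]

    for _ in range(iterations):
        new_pos = []
        for i, (xi, yi) in enumerate(pos):
            dx = dy = 0.0
            for j, (xj, yj) in enumerate(pos):
                # Agents react only to neighbours inside the perception radius,
                # so distances are looked up only for those pairs.
                if i == j or math.hypot(xj - xi, yj - yi) > radius:
                    continue
                # Assumed rule: a positive weight attracts (similar items),
                # a negative weight repels (dissimilar items).
                weight = 1.0 - 2.0 * dist[i][j]
                dx += weight * (xj - xi)
                dy += weight * (yj - yi)
            # Take a small step and clamp the agent to the arena.
            new_pos.append((min(1.0, max(0.0, xi + step * dx)),
                            min(1.0, max(0.0, yi + step * dy))))
        pos = new_pos  # synchronous update of all agents
    return pos


def mean_gap(pos, pairs):
    """Average Euclidean distance between the listed pairs of agents."""
    return sum(math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
               for a, b in pairs) / len(pairs)


# Toy input: items 0-2 and 3-5 form two groups (0.1 within, 0.9 between).
groups = [0, 0, 0, 1, 1, 1]
toy = [[0.0 if a == b else (0.1 if groups[a] == groups[b] else 0.9)
        for b in range(6)] for a in range(6)]

# radius=2.0 exceeds the arena diagonal, so every agent perceives every other;
# smaller values make the interaction local.
final = flock(toy, radius=2.0)

within = [(a, b) for a in range(6) for b in range(a + 1, 6)
          if groups[a] == groups[b]]
between = [(a, b) for a in range(6) for b in range(a + 1, 6)
           if groups[a] != groups[b]]
print("mean within-group gap: ", mean_gap(final, within))
print("mean between-group gap:", mean_gap(final, between))
```

On this toy matrix the two groups should drift to opposite sides of the arena, so the within-group gap ends up much smaller than the between-group gap. Any distance on the unit interval could be substituted for `toy`; reading flocks off the final positions would need a further step that this sketch leaves out.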

Publications

Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets.
Narechania A, Baker R, DeSalle R, Mathema B, Kolokotronis SO, Kreiswirth B, Planet PJ, 2016-10-01 - GigaScience

Credits

Name	Role	Affiliation
Apurva Narechania	Developer	Sackler Institute for Comparative Genomics, American Museum of Natural History, United States of America
Richard Baker	Developer	Sackler Institute for Comparative Genomics, American Museum of Natural History, United States of America
Rob DeSalle	Developer	Sackler Institute for Comparative Genomics, American Museum of Natural History, United States of America
Barun Mathema	Developer	Department of Epidemiology, Mailman School of Public Health, United States of America
Sergios-Orestis Kolokotronis	Developer	Department of Biological Sciences, Fordham University, United States of America
Barry Kreiswirth	Developer	Public Health Research Institute Center, New Jersey Medical School, United States of America
Paul J Planet	Investigator	Department of Pediatrics, Division of Pediatric Infectious Diseases

Summary

Accession	BT006870
Tool Type	Application
Category
Platforms	Linux/Unix
Technologies	Perl
User Interface	Terminal Command Line
Download Count	0
Submitted By	Paul J Planet
