Introduction

Generating accurate transcription factor (TF) binding site motifs from data generated using the next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and the length of each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amount of data. To overcome this limitation, tools have been developed that compromise accuracy with speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization.We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under receiver-operating characteristic curve (AUC) as a measure of discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%.DiMO is available at http://stormo.wustl.edu/DiMO

Publications

  1. Discriminative motif optimization based on perceptron training.
    Cite this
    Patel RY, Stormo GD, 2014-04-01 - Bioinformatics (Oxford, England)

Credits

  1. Ronak Y Patel
    Developer

    Department of Genetics, Washington University School of Medicine, United States of America

  2. Gary D Stormo
    Investigator

Community Ratings

UsabilityEfficiencyReliabilityRated By
0 user
Sign in to rate
Summary
AccessionBT006756
Tool TypeApplication
Category
PlatformsLinux/Unix
TechnologiesR
User InterfaceTerminal Command Line
Download Count0
Submitted ByGary D Stormo