Introduction

Computational analysis of data produced in deep sequencing (DS) experiments is challenging because of large data volumes and the need for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis, which facilitates the translation of biomedical research questions into a language amenable to computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and the command line, as well as a C++ API for extensions. It supports major genomic file formats and allows custom genomic regions to be stored in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available, together with a user guide, from http://csbi.ltdk.helsinki.fi/grok/.
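
To make the set-algebra idea concrete, the following is a minimal sketch in plain Python of union, intersection, and difference over genomic regions. It does not use GROK or reproduce its API: the function names, the (chromosome, start, end) tuple layout, the half-open coordinate convention, and the example peak lists are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical region model: (chromosome, start, end) with half-open
# coordinates [start, end). This is an illustrative assumption, not GROK's
# internal representation.


def normalize(regions):
    """Sort regions and merge overlapping or adjacent ones per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        if start < end:  # drop empty regions
            by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_end is None:
                cur_start, cur_end = start, end
            elif start > cur_end:
                # Gap found: emit the finished region and start a new one.
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
            else:
                # Overlapping or adjacent: extend the current region.
                cur_end = max(cur_end, end)
        merged.append((chrom, cur_start, cur_end))
    return merged


def union(a, b):
    """Positions covered by a or b."""
    return normalize(list(a) + list(b))


def intersection(a, b):
    """Positions covered by both a and b (two-pointer sweep)."""
    a, b = normalize(a), normalize(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca == cb:
            lo, hi = max(sa, sb), min(ea, eb)
            if lo < hi:
                out.append((ca, lo, hi))
        # Advance whichever region ends first in (chromosome, end) order.
        if (ca, ea) <= (cb, eb):
            i += 1
        else:
            j += 1
    return out


def difference(a, b):
    """Positions covered by a but not by b (simple quadratic scan)."""
    a, b = normalize(a), normalize(b)
    out = []
    for chrom, start, end in a:
        cursor = start
        for cb, sb, eb in b:
            if cb != chrom or eb <= cursor or sb >= end:
                continue  # other chromosome, or no overlap with the remainder
            if sb > cursor:
                out.append((chrom, cursor, sb))
            cursor = max(cursor, eb)
        if cursor < end:
            out.append((chrom, cursor, end))
    return out


# Made-up binding-site regions for two transcription factors.
tf_a = [("chr1", 100, 200), ("chr1", 180, 300), ("chr2", 50, 80)]
tf_b = [("chr1", 150, 250), ("chr2", 10, 40)]

shared = intersection(tf_a, tf_b)    # sites bound by both factors
only_a = difference(tf_a, tf_b)      # sites bound by the first factor only
either = union(tf_a, tf_b)           # sites bound by at least one factor
print(shared, only_a, either, sep="\n")
```

Phrased this way, a question such as "which sites are bound by one factor but not the other" becomes a single set difference. The sorted-list representation is chosen here only for brevity; the abstract describes tree- and database-backed stores for the same purpose.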

Publications

  1. Genomic region operation kit for flexible processing of deep sequencing data.
    Ovaska K, Lyly L, Sahu B, Jänne OA, Hautaniemi S, 2013-01-01 - IEEE/ACM Transactions on Computational Biology and Bioinformatics

Credits

  1. Kristian Ovaska
    Developer

    Biomedicine, Biochemistry and Developmental Biology, Finland

  2. Lauri Lyly
    Developer

  3. Biswajyoti Sahu
    Developer

  4. Olli A Jänne
    Developer

  5. Sampsa Hautaniemi
    Investigator

Summary

  Accession: BT006425
  Tool Type: Application
  Category:
  Platforms: Linux/Unix
  Technologies: C++, R
  User Interface: Terminal Command Line
  Download Count: 0
  Submitted By: Sampsa Hautaniemi