Software for Flexible Integration of Annotation

Introduction

Integrating heterogeneous datasets from several sources is a common bioinformatics task that often requires implementing a complex workflow intermixing database access, data filtering, format conversions, identifier mapping and other diverse operations. Data integration is especially important when annotating next-generation sequencing data, where a multitude of diverse tools and heterogeneous databases can be used to provide a large variety of annotation for genomic locations, such as single nucleotide variants or genes. Each tool and data source is potentially useful for a given project, and often more than one is used in parallel for the same purpose. However, software that always produces all available data is difficult to maintain and quickly leads to an excess of data, creating an information overload rather than the desired goal-oriented and integrated result.

We present SoFIA, a framework for workflow-driven data integration with a focus on genomic annotation. SoFIA conceptualizes workflow templates as comprehensive workflows that cover as many data integration operations as possible in a given domain. However, these templates are not intended to be executed as a whole; instead, when given an integration task consisting of a set of input data and a set of desired output data, SoFIA derives a minimal workflow that completes the task. These workflows are typically fast and create exactly the information a user wants without requiring them to do any implementation work. Using a comprehensive genome annotation template, we highlight the flexibility, extensibility and power of the framework using real-life case studies.

SoFIA is available at https://github.com/childsish/sofia/releases/latest under the GNU General Public License.

Contact: liam.childs@hu-berlin.de

Supplementary data are available at Bioinformatics online.
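The idea of deriving a minimal workflow from a comprehensive template, as described in the introduction, can be illustrated with a small sketch. This is not SoFIA's actual API or algorithm; the `Step` type, the `derive_workflow` function, the toy template and all entity names are hypothetical, and the sketch assumes each entity is produced by at most one step. It works backwards from the requested outputs and keeps only the steps needed to compute them from the provided inputs.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One operation in a (hypothetical) workflow template."""
    name: str
    inputs: frozenset
    output: str


# Toy annotation template; every name here is illustrative only.
TEMPLATE = [
    Step("get_gene_by_position", frozenset({"variant", "gene_model"}), "gene_id"),
    Step("get_gene_name", frozenset({"gene_id", "gene_model"}), "gene_name"),
    Step("get_pathway", frozenset({"gene_id", "pathway_map"}), "pathway"),
    Step("get_amino_acid_change",
         frozenset({"variant", "gene_model", "genome_sequence"}),
         "amino_acid_change"),
]


def derive_workflow(template, provided, requested):
    """Return the steps needed to compute `requested` from `provided`,
    ordered so that each step runs after the steps it depends on."""
    # Assumption: each entity has at most one producing step.
    producers = {step.output: step for step in template}
    ordered = []
    resolved = set()

    def resolve(entity, stack=()):
        if entity in provided or entity in resolved:
            return
        if entity in stack:
            raise ValueError(f"cyclic dependency involving {entity!r}")
        step = producers.get(entity)
        if step is None:
            raise ValueError(f"no input or step provides {entity!r}")
        # Resolve dependencies first so they precede this step.
        for needed in sorted(step.inputs):
            resolve(needed, stack + (entity,))
        resolved.add(entity)
        ordered.append(step)

    for entity in requested:
        resolve(entity)
    return ordered


if __name__ == "__main__":
    workflow = derive_workflow(TEMPLATE, {"variant", "gene_model"}, ["gene_name"])
    for step in workflow:
        print(step.name)
```

In this toy case, requesting only `gene_name` selects two of the four template steps; the pathway and amino acid steps are never scheduled because nothing requested depends on them.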

Publications

SoFIA: a data integration framework for annotating high-throughput datasets.
Childs LH, Mamlouk S, Brandt J, Sers C, Leser U, 2016-09-01 - Bioinformatics (Oxford, England)

Credits

Liam Harold Childs
Developer
Wissenmanagement in der Bioinformatik, Humboldt-Universität zu Berlin, Germany
Soulafa Mamlouk
Developer
DKTK Deutsches Konsortium Für Translationale Krebsforschung, Partner site Charite Berlin, Germany
Jörgen Brandt
Developer
Wissenmanagement in der Bioinformatik, Humboldt-Universität zu Berlin, Germany
Christine Sers
Developer
DKTK Deutsches Konsortium Für Translationale Krebsforschung, Partner site Charite Berlin, Germany
Ulf Leser
Investigator
Wissenmanagement in der Bioinformatik, Humboldt-Universität zu Berlin, Germany

Community Ratings

Usability	Efficiency	Reliability	Rated By
			0 user

Summary

Accession	BT006359
Tool Type	Application
Category
Platforms	Linux/Unix
Technologies
User Interface	Terminal Command Line
Download Count	0
Country/Region	Germany
Submitted By	Ulf Leser
