Introduction

In gene set analysis, researchers aim to identify gene sets that are significantly associated with an outcome, such as disease status or treatment. With the rapid development of high-throughput sequencing technologies, ribonucleic acid sequencing (RNA-seq) has become an important alternative to traditional expression arrays in gene expression studies. Adapting existing algorithms to RNA-seq data is challenging because of intrinsic differences between the technologies and the data they produce. In RNA-seq experiments, the measured expression of a gene is correlated with its length, and this inherent correlation may bias gene set analysis.

We develop SeqGSA, a new method for gene set analysis with length bias adjustment for RNA-seq data. It extends the R package GSA, which was designed for microarrays. Our method compares the gene set maxmean statistic against permutations while also taking into account the statistics of the other gene sets. To adjust for gene length bias, we implement a flexible weighted sampling scheme in the restandardization step of our algorithm. We show that our method improves the power to identify significant gene sets that are affected by length bias. We also show that our method maintains the type I error in comparison with another representative method for gene set enrichment testing.

SeqGSA is a promising tool for testing significant gene pathways with RNA-seq data while adjusting for the inherent gene length effect. It enhances the power to detect gene sets affected by the bias and maintains the type I error under various situations.
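The maxmean statistic and a length-weighted restandardization can be illustrated with a small sketch. This is not the SeqGSA implementation: the function names, the rank-based length bins, the specific weighting rule, and the toy data are all assumptions made for illustration, and only the restandardization of a single observed statistic is shown (the permutation step is omitted).

```python
import math
import random
from statistics import mean, pstdev


def maxmean(scores):
    """Signed maxmean: average the positive and negative parts over the
    whole set, then keep whichever average is larger in magnitude."""
    n = len(scores)
    pos = sum(s for s in scores if s > 0) / n
    neg = sum(-s for s in scores if s < 0) / n
    return pos if pos >= neg else -neg


def length_bin_weights(lengths, set_idx, n_bins=4):
    """Hypothetical weighting rule (an assumption, not SeqGSA's): weight genes
    so random sets roughly match the gene set's length-bin composition."""
    n = len(lengths)
    order = sorted(range(n), key=lambda i: lengths[i])
    bin_of = [0] * n
    for rank, i in enumerate(order):
        bin_of[i] = rank * n_bins // n  # rank-based bins of near-equal size
    bin_size = [0] * n_bins
    for b in bin_of:
        bin_size[b] += 1
    set_count = [0] * n_bins
    for i in set_idx:
        set_count[bin_of[i]] += 1
    # weight = (share of the gene set in this bin) / (genes in this bin)
    return [set_count[b] / (len(set_idx) * bin_size[b]) for b in bin_of]


def weighted_sample(weights, k, rng):
    """Weighted sampling without replacement using Efraimidis-Spirakis keys."""
    keys = [(math.log(1.0 - rng.random()) / w, i)
            for i, w in enumerate(weights) if w > 0]
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


def restandardize(scores, set_idx, weights, n_draws=500, seed=0):
    """Center and scale the observed maxmean by random sets of the same size."""
    rng = random.Random(seed)
    k = len(set_idx)
    null = [maxmean([scores[i] for i in weighted_sample(weights, k, rng)])
            for _ in range(n_draws)]
    observed = maxmean([scores[i] for i in set_idx])
    return (observed - mean(null)) / pstdev(null)


# Toy data (made up): gene scores drift upward with gene length, and the
# gene set holds the 20 longest genes but carries no signal beyond length.
rng = random.Random(1)
lengths = [rng.randint(200, 10000) for _ in range(400)]
scores = [rng.gauss(0, 1) + length / 5000 for length in lengths]
long_set = sorted(range(400), key=lambda i: lengths[i])[-20:]

uniform = [1.0 / 400] * 400
adjusted = length_bin_weights(lengths, long_set)
print("uniform sampling:        ", restandardize(scores, long_set, uniform))
print("length-weighted sampling:", restandardize(scores, long_set, adjusted))
```

In this toy setting the uniform baseline compares a set of long genes against random sets of mostly shorter genes, so one would expect its restandardized score to be inflated relative to the length-weighted version, which draws its reference sets from genes of comparable length.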

Publications

  1. Gene set analysis controlling for length bias in RNA-seq experiments.
    Ren X, Hu Q, Liu S, Wang J, Miecznikowski JC, 2017-01-01 - BioData Mining

Credits

  1. Xing Ren
    Developer

    Department of Biostatistics, SUNY University at Buffalo

  2. Qiang Hu
    Developer

    Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute

  3. Song Liu
    Developer

    Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute

  4. Jianmin Wang
    Developer

    Department of Biostatistics and Bioinformatics, Roswell Park Cancer Institute

  5. Jeffrey C Miecznikowski
    Investigator

    Department of Biostatistics, SUNY University at Buffalo

Summary

Accession: BT001279
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: R
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Jeffrey C Miecznikowski