Evaluation of five methods for genome-wide circadian gene identification.

Gang Wu, Jiang Zhu, Jun Yu, Lan Zhou, Jianhua Z Huang, Zhang Zhang
Author Information
  1. Gang Wu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  2. Jiang Zhu: Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.
  3. Jun Yu: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
  4. Lan Zhou: Department of Statistics, Texas A&M University, College Station, Texas, USA.
  5. Jianhua Z Huang: Department of Statistics, Texas A&M University, College Station, Texas, USA.
  6. Zhang Zhang: CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China. Email: zhangzhang@big.ac.cn

Abstract

Identifying circadian-regulated genes from temporal transcriptome data is important for studying the regulatory mechanisms of the circadian system. However, computational methods that adopt different strategies for identifying cycling transcripts often yield inconsistent results even on the same dataset, making it challenging to choose the optimal method for a specific circadian study. To address this challenge, we evaluate 5 popular methods, namely ARSER (ARS), COSOPT (COS), Fisher's G test (FIS), HAYSTACK (HAY), and JTK_CYCLE (JTK), on both simulated and empirical datasets. Our results show that increasing the total number of samples (by raising the sampling frequency or lengthening the sampling time window) helps computational methods identify circadian transcripts accurately and measure circadian phase. For a given total number of samples, a higher sampling frequency is more important for HAY and JTK, whereas a longer sampling time window is more crucial for ARS and COS, as demonstrated on simulated and empirical datasets from which circadian signals are computationally identified. In addition, the preference for a higher sampling frequency or a longer sampling time window is also evident for JTK, ARS, and COS when estimating the circadian phases of simulated periodic profiles. Our results also indicate that attention should be paid to the significance threshold used with each method when selecting circadian genes, especially when the same empirical dataset is analyzed with 2 or more methods. In summary, for any study involving genome-wide identification of circadian genes from transcriptome data, our evaluation results provide suggestions for selecting an optimal method based on the specific goal and experimental design.
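
To make the sampling trade-off concrete, the following minimal sketch builds two designs with the same total number of samples (24 points every 2 h over 48 h, and 24 points every 1 h over 24 h) and applies a textbook form of Fisher's G test, one of the five evaluated methods, to a noiseless 24 h cosine. It is an illustration under stated assumptions, not the implementation used in the study: the function names (`sampling_times`, `fisher_g`, `fisher_g_pvalue`), the two designs, and the test signal are all hypothetical choices made here.

```python
import math
import numpy as np


def sampling_times(n_samples, interval_h):
    """Evenly spaced time points in hours, starting at 0 (hypothetical helper)."""
    return np.arange(n_samples) * interval_h


def fisher_g(values):
    """Fisher's g: largest periodogram ordinate divided by the sum of ordinates.

    Returns (g, k, m), where k is the index of the peak Fourier frequency
    and m is the number of Fourier frequencies that were considered.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    m = (n - 1) // 2                       # excludes zero and Nyquist frequencies
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    ordinates = spectrum[1:m + 1]
    g = ordinates.max() / ordinates.sum()
    return g, int(ordinates.argmax()) + 1, m


def fisher_g_pvalue(g, m):
    """Exact p-value of Fisher's g under the white-noise null hypothesis."""
    upper = min(m, int(math.floor(1.0 / g)))
    p = sum((-1) ** (j - 1) * math.comb(m, j) * (1.0 - j * g) ** (m - 1)
            for j in range(1, upper + 1))
    return min(1.0, max(0.0, p))           # clamp small floating-point drift


if __name__ == "__main__":
    # Two assumed designs with the same total number of samples (24).
    designs = {"2 h over 48 h": 2.0, "1 h over 24 h": 1.0}
    for label, interval_h in designs.items():
        t = sampling_times(24, interval_h)
        profile = np.cos(2 * np.pi * t / 24.0)   # noiseless 24 h rhythm
        g, k, m = fisher_g(profile)
        period_h = t.size * interval_h / k       # period of the peak frequency
        print(label, "g =", g, "period (h) =", period_h,
              "p =", fisher_g_pvalue(g, m))
```

With a fixed number of samples, the longer window spaces the Fourier frequencies more finely (steps of 1/48 per hour instead of 1/24 per hour), while the denser design raises the highest resolvable frequency. The sketch only shows this for a spectral test; how each of the five methods responds to the two designs is what the evaluation itself reports.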

Keywords

MeSH Term

Circadian Rhythm
Computational Biology
Gene Expression Profiling
Genome
Genome-Wide Association Study
Transcriptome