CNEReg

CNEReg is an evolutionary Conserved Non-coding Element (CNE) interpretation method.



1. Data availability

(1) Raw data

Data have been deposited in the Genome Sequence Archive at the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation (GSA: CRA005494) or National Center for Biotechnology Information (NCBI: PRJNA485657).

(2) Processed data

The CNEReg model requires sample-matched time-series RNA-seq and ATAC-seq data as input, together with RSCNEs and their conservation scores from public data.
The final processed data are the data on which the conclusions of the related manuscript are based:
1. Peak files with quantitative openness values, in bed and txt format, for the ATAC-seq data.
2. The normalized gene expression profiles output by StringTie for the RNA-seq data.
3. A bed file of RSCNEs, giving the chromosome, start and end coordinates of each element together with its conservation score.
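
As a rough illustration of the third input, the sketch below reads RSCNE records from bed-style text. It is written in Python for illustration only (the repository's scripts are R). It assumes the conservation score sits in the fourth column; the `Rscne` type, the `parse_rscne_bed` helper and the example coordinates are hypothetical.

```python
from typing import NamedTuple


class Rscne(NamedTuple):
    chrom: str
    start: int
    end: int
    score: float  # conservation score (assumed to be the 4th column)


def parse_rscne_bed(text: str) -> list[Rscne]:
    """Parse whitespace-separated lines: chrom, start, end, conservation score."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and common bed header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, score = line.split()[:4]
        records.append(Rscne(chrom, int(start), int(end), float(score)))
    return records


# Two made-up records, only to show the expected layout.
example = "chr1\t100\t250\t0.91\nchr2\t5000\t5120\t0.78\n"
rscnes = parse_rscne_bed(example)
```

Each record can then be matched to the ATAC-seq peaks by chromosome and coordinates.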


2. Processing data

(1) The "heatmap&PCA_of_Chromatin_accessibility.R" script was used to draw the correlation heatmap and PCA for chromatin accessibility.

The input file "openness_RUMEN.csv" contains the normalized openness matrix of the active-RSCNEs at each time point.
Rscript "heatmap&PCA_of_Chromatin_accessibility.R" openness_RUMEN.csv heatmap_of_Chromatin_accessibility.pdf PCA_of_Chromatin_accessibility.pdf
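
A correlation heatmap of this kind is built from a sample-by-sample correlation matrix. The following is a minimal Python sketch of that matrix (illustrative only, not the R script itself). It assumes Pearson correlation and a matrix with active-RSCNEs in rows and time points in columns; the toy values and the function names are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_correlation(matrix: list[list[float]]) -> list[list[float]]:
    """Correlate columns (samples) with each other; rows are active-RSCNEs."""
    columns = [list(col) for col in zip(*matrix)]
    return [[pearson(a, b) for b in columns] for a in columns]


# Toy openness matrix: 4 hypothetical RSCNEs (rows) x 3 time points (columns).
openness = [
    [1.0, 2.0, 8.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
]
corr = sample_correlation(openness)
```

In the toy matrix the first two time points rise together and the third falls, so `corr` holds values close to 1 and -1 off the diagonal.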


(2) The "heatmap&PCA of Gene Expression.R" script was used to draw the correlation heatmap and PCA for gene expression.

The first input file "RNA-seq_RUMEN.csv" contains the gene FPKM matrix of rumen and esophagus during development.
The second input file "sif.xlsx" stores the batch-effect information.
The two output files are in PDF format.

(3) The "JMscore.R" script was used to calculate the JSD scores and median expression levels of all genes in 50 tissues and to combine them into a JMscore that measures the specificity of each gene in each tissue. The JMscores of the 18 TTFs are a subset of this output.

The input file "sheep832.csv" includes the gene expression level of each gene in each sample from 830 sheep samples.
The first output file "JSD_sheep.csv" includes the Jensen-Shannon divergence of each gene in each tissue.
The second output file "median_sheep.csv" includes the median expression level of each gene in each tissue.
The third output file "JMscore_sheep.csv" includes the JMscore of each gene in each tissue.
Rscript JMscore.R sheep832.csv JSD_sheep.csv median_sheep.csv JMscore_sheep.csv
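
To make the two ingredients of the JMscore concrete, the Python sketch below computes per-tissue medians for one gene and a Jensen-Shannon divergence between the gene's normalized profile and a profile expressed in a single tissue. This is an illustrative formulation under stated assumptions, not a copy of JMscore.R: the one-tissue reference profile, the base-2 logarithm, the tissue names and the toy values are all assumptions. How the two quantities are combined into the JMscore is defined by the script and is not reproduced here.

```python
import math
from statistics import median


def entropy(p: list[float]) -> float:
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def jsd(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def tissue_jsd(tissue_medians: list[float]) -> list[float]:
    """JSD between a gene's normalized profile and each one-tissue profile."""
    total = sum(tissue_medians)
    if total <= 0:
        raise ValueError("gene must be expressed in at least one tissue")
    p = [v / total for v in tissue_medians]
    scores = []
    for t in range(len(p)):
        one_hot = [1.0 if i == t else 0.0 for i in range(len(p))]
        scores.append(jsd(p, one_hot))
    return scores


# Toy gene: three hypothetical tissues, three replicate samples each.
samples = {
    "rumen": [9.0, 11.0, 10.0],
    "liver": [0.0, 0.0, 0.0],
    "muscle": [0.0, 0.0, 0.0],
}
medians = [median(values) for values in samples.values()]
divergence = tissue_jsd(medians)
```

With this formulation a small divergence for a tissue means the gene's expression is concentrated in that tissue.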


(4) The "18TTF-JMscore-cluster.R" script was used to draw the hierarchical clustering diagram.

The first input file "18TTFs-JSD.csv" includes the Jensen-Shannon divergence score of 18 TTFs in each tissue.
The second input file "18TTFs-median.csv" includes the median expression level of 18 TTFs in each tissue.
The output file "Phylogeny_of_50_tissues.pdf" is the hierarchical clustering tree of the 50 tissues, clustered by the 18 TTFs.


(5) The "upstream(lasso).R" script was used to generate the TFs and RSCNEs under the linear regression model.

The first input file "RNA-seq-TF.csv" includes the expression level of each TF at each time point.
The second input file "openness.csv" includes the openness of each RSCNE at each time point.
The third input file "19TTF-upbinding.csv" includes the TF binding information of each TTF, as predicted by HOMER.
The fourth input file "19TTF-RSCNE(LASSO).csv" contains the RSCNEs selected by LASSO.
The output file "19TTF-up-lm.csv" includes the TFs and RSCNEs for linear regression.
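
For readers unfamiliar with LASSO selection, here is a minimal Python sketch of the general technique, using cyclic coordinate descent. It is not the repository's implementation: the penalty value, the centering assumption, the toy data and the function names are all hypothetical, and the R script may rely on a dedicated package instead.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam (the LASSO proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(x: list[list[float]], y: list[float], lam: float,
          n_iter: int = 100) -> list[float]:
    """Minimize (1/2n)*||y - x b||^2 + lam*||b||_1 by coordinate descent.

    Assumes the columns of x and the vector y are already centered,
    so no intercept term is fitted.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(x[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                continue  # an all-zero feature can never be selected
            # Correlate feature j with the residual that leaves j out.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(x[i][k] * beta[k] for k in range(p) if k != j)
                rho += x[i][j] * (y[i] - fit_without_j)
            beta[j] = soft_threshold(rho / n, lam) / z
    return beta


# Toy data: the target follows the first candidate feature only.
features = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
target = [2.0, -2.0, 2.0, -2.0]
coef = lasso(features, target, lam=0.5)
selected = [j for j, b in enumerate(coef) if b != 0.0]
```

Features whose coefficient is shrunk exactly to zero are dropped; the remaining ones would be the candidates carried into the linear regression.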


(6) The "upstream(lm).R" script was used to generate the RSCNEs of 19TTFs for the upstream network.

The first input file "RNA-seq-TF.csv" includes the expression level of each TF at each time point.
The second input file "openness.csv" includes the openness of each RSCNE at each time point.
The third input file "19TTF-up-lm(cor0.6).csv" includes the TFs selected by cor(TF,TTF)>0.6.
The output file "19TTF-up-RSCNE.csv" includes the RSCNEs retained in the upstream network.
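
The cor(TF,TTF)>0.6 filter behind the third input file can be sketched as follows. This is a Python illustration under assumptions: the text does not say which correlation coefficient is used, so Pearson is assumed here, and the TF names and expression values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_tfs(tf_expr: dict[str, list[float]], ttf_expr: list[float],
               cutoff: float = 0.6) -> list[str]:
    """Keep TFs whose expression correlates with the TTF above the cutoff."""
    return [name for name, values in tf_expr.items()
            if pearson(values, ttf_expr) > cutoff]


# Toy expression over five time points.
ttf = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "TF_A": [2.0, 4.1, 5.9, 8.2, 10.0],  # rises with the TTF
    "TF_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the TTF rises
    "TF_C": [3.0, 1.0, 4.0, 1.0, 3.0],   # no linear trend
}
kept = filter_tfs(candidates, ttf)
```

Because the cutoff is applied to the signed correlation, a strongly anti-correlated TF such as the toy "TF_B" is excluded as well.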


(7) The "downstream.R" script was used to calculate the TTF downstream network.

The first input file "openness.csv" includes the openness of each RSCNE at each time point.
The second input file "RNA-seq.csv" includes the expression level of each gene at each time point.
The third input file "17TTF-downstream.csv" includes the regulatory relationships of the TTFs predicted by HOMER.
The output file "downstream_network.csv" includes the regulatory network of TTFs in the downstream.


(8) The "functional influence_upstream.R" script was used to calculate the functional influence of active-RSCNEs in the TTF upstream network.

The input file "ATAC_average.xlsx" is the accessibility level of each active-RSCNE at each time point.
The input file "RNA_average.xlsx" is the gene expression level of each TF at each time point.
The input file "Binding&Correlation_up.csv" contains the motif binding strengths and the Spearman correlations.
The input file "URSCNE.xlsx" is the active-RSCNEs in the upstream network.
The input file "type1.conservativeS.txt" is the conservation scores of type I active-RSCNEs.
The input file "type2-a.conservativeS.txt" is the conservation scores of type II active-RSCNEs.
The output file "up-FI.csv" is the functional influence of active-RSCNEs in the TTF upstream network.
Rscript functional_influence_upstream.R ATAC_average.xlsx RNA_average.xlsx "Binding&Correlation_up.csv" URSCNE.xlsx type1.conservativeS.txt type2-a.conservativeS.txt up-FI.csv


(9) The "functional influence_downstream.R" script was used to calculate the functional influence of active-RSCNEs in the TTF downstream network.

The input file "ATAC_average.xlsx" is the accessibility level of each active-RSCNE at each time point.
The input file "RNA_average.xlsx" is the gene expression level of each TF at each time point.
The input file "Binding&Correlation_up.csv" contains the motif binding strengths and the Spearman correlations.
The input file "DRSCNE.xlsx" is the active-RSCNEs in the downstream network.
The input file "type1.conservativeS.txt" is the conservation scores of type I active-RSCNEs.
The input file "type2-a.conservativeS.txt" is the conservation scores of type II active-RSCNEs.
The output file "down-FI.csv" is the functional influence of active-RSCNEs in the TTF downstream network.
Rscript functional_influence_downstream.R ATAC_average.xlsx RNA_average.xlsx "Binding&Correlation_up.csv" DRSCNE.xlsx type1.conservativeS.txt type2-a.conservativeS.txt down-FI.csv


(10) The "functional influence_diffNetwork.R" script was used to calculate the functional influence of active-RSCNEs in the differential subnetwork between rumen and esophagus.

The input file "diff-single-system.txt" is the regulatory strength of each TF-RSCNE-TG pair at each time point.
The input file "DiffRSCNE.xlsx" contains the active-RSCNEs in the differential subnetwork between rumen and esophagus.
The input file "type1.conservativeS.txt" is the conservation scores of type-I active-RSCNEs.
The input file "type2-a.conservativeS.txt" is the conservation scores of type-II active-RSCNEs.
The output file "diff-FI.csv" is the functional influence of active-RSCNEs in the differential subnetwork between rumen and esophagus.
Rscript "functional influence_diffNetwork.R" diff-single-system.txt DiffRSCNE.xlsx type1.conservativeS.txt type2-a.conservativeS.txt



Xiangyu Pan, Zhaoxia Ma, Xinqi Sun, Hui Li, Tingting Zhang, Zhao Chen, Nini Wang, Wing Hung Wong, Wen Wang, Yu Jiang, Yong Wang. Interpreting ruminant specific conserved non-coding elements by reconstructing developmental regulatory network. (In submission).
  • We'd love to hear from you. If you have any questions, please don't hesitate to contact the author of this manuscript: