Dog Expression houses gene expression profiles derived entirely from RNA-Seq analysis of Canis tissues. It integrates and visualizes gene expression profiles based on curated, quality-controlled RNA-Seq data covering diverse tissues. It provides the gene expression landscape across tissues, breeds, and cell lines, as well as differential gene expression associated with diseases.
iDog has constructed a pipeline to process the RNA-Seq data. The analysis code for transcriptome quantification and for identifying differentially expressed genes in disease is freely available on GitHub (https://github.com/Br1anChou/idog).
Raw reads were filtered using fastp (v0.23.2) with the parameters ‘-g -q 5 -u 50 -n 5’. The filtered reads were then aligned to the reference genome using STAR (v2.7.3a) with the parameters ‘--outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 100000 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM --sjdbScore 1’. Transcripts were quantified using kallisto (v0.46.0) with the parameters ‘--fusion --plaintext’, and gene-level quantifications were generated using RSEM (v5.32.1) with the parameters ‘--estimate-rspd --seed 12345 --forward-prob 0.5’. TPM (transcripts per million) was used as the normalized measure of transcript and gene abundance, correcting for differences in sequencing depth and gene length. For samples of the same tissue from different projects, TPM values were log2(TPM+1)-transformed and visualized as box plots.
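To make the normalization concrete, the sketch below computes TPM with the standard definition, TPM_i = 10^6 × (c_i / l_i) / Σ_j (c_j / l_j), where c_i is the read count and l_i the effective length of transcript i, and then applies the log2(TPM+1) transform used for the box plots. It is illustrative only: RSEM and kallisto report TPM from their own abundance estimates, and the function names and inputs here are hypothetical.

```python
import math
from typing import List


def tpm(counts: List[float], eff_lengths: List[float]) -> List[float]:
    """Transcripts per million from read counts and effective lengths (bp).

    Hypothetical helper illustrating the standard TPM definition.
    """
    if len(counts) != len(eff_lengths):
        raise ValueError("counts and eff_lengths must have the same length")
    if any(l <= 0 for l in eff_lengths):
        raise ValueError("effective lengths must be positive")
    # Reads per base: removes the effect of transcript length.
    rates = [c / l for c, l in zip(counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero")
    # Rescale so the values sum to one million: removes the effect of depth.
    return [r / total * 1e6 for r in rates]


def log2_tpm1(values: List[float]) -> List[float]:
    """log2(TPM + 1), the transform applied before plotting across projects."""
    return [math.log2(v + 1.0) for v in values]


if __name__ == "__main__":
    values = tpm([100, 200, 300], [1000, 2000, 3000])
    print(values)
    print(log2_tpm1(values))
```

With counts of 100, 200 and 300 reads on transcripts of 1, 2 and 3 kb, every transcript has the same reads-per-base rate, so each receives one third of a million TPM; the pseudocount of 1 keeps zero-TPM genes at 0 after the log transform.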
Differential expression analysis was performed in R with DESeq2, which fits a negative binomial generalized linear model to the expected counts. Genes with fewer than 10 counts were removed, and a variance-stabilizing transformation (VST) was applied to remove the dependence of the variance on the mean. After model fitting, the coefficients (log2 fold changes) for each sample group and their standard errors were estimated. The significance of differential expression was assessed with the Wald test, which divides each estimated log2 fold change by its standard error. P values were adjusted for multiple testing using the Benjamini-Hochberg false discovery rate (FDR) procedure. The variance-stabilized count matrix was also used to calculate sample-to-sample distances.
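The testing step can be sketched as follows, assuming the log2 fold changes (LFC) and standard errors (SE) have already been estimated, for example by DESeq2's model fit. The two-sided Wald p-value is computed from z = LFC / SE against a standard normal distribution, and the p-values are then adjusted with the Benjamini-Hochberg procedure (the same computation as R's p.adjust(p, method = "BH")). The distance helper assumes Euclidean distance between the columns (samples) of the transformed matrix, which the text does not specify. All function names are hypothetical, and DESeq2 by default also applies independent filtering before the BH adjustment, which this sketch omits.

```python
import math
from typing import List


def wald_pvalue(lfc: float, se: float) -> float:
    """Two-sided Wald p-value for H0: log2 fold change = 0."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = lfc / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sample_distances(matrix: List[List[float]]) -> List[List[float]]:
    """Euclidean distances between samples (columns) of a genes x samples matrix."""
    if not matrix:
        return []
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    return [[math.dist(a, b) for b in columns] for a in columns]


if __name__ == "__main__":
    p = [wald_pvalue(lfc, se) for lfc, se in [(2.0, 0.5), (0.3, 0.4), (-1.5, 0.6)]]
    print(p)
    print(benjamini_hochberg(p))
```

The step-down loop is what makes BH adjusted values monotone in the raw p-values: a gene never receives a larger FDR than a gene with a larger raw p-value.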