| Description |
Single-molecule real-time isoform sequencing (Iso-Seq) of transcriptomes by PacBio can generate very long and accurate reads, providing an ideal platform for full-length transcriptome analysis. This technological breakthrough calls for novel computational tools that can fully utilize the benefits of Iso-Seq. Here, we present TAGET, an integrated computational toolkit for analyzing Iso-Seq full-length transcript data, covering transcript alignment, annotation, gene fusion detection, and transcript quantification analyses such as differential gene expression analysis and differential isoform usage (DIU) analysis. We evaluated the performance of TAGET using a public Iso-Seq dataset and four pairs of newly sequenced Iso-Seq datasets from tumor and matched normal tissues. TAGET achieved performance superior or comparable to that of available methods. In particular, TAGET gave significantly more precise novel splice site predictions and thus enabled more accurate discovery of novel transcript isoforms and gene fusions. Experimental validation demonstrated the high precision of TAGET in identifying novel transcripts and gene fusions. In the paired laryngocarcinoma samples, we identified and experimentally validated a DIU gene, ECM1. ECM1 was shown to be an oncogene, but its isoform ECM1b might be a tumor suppressor in laryngocarcinoma. Finally, we evaluated the performance of TAGET on Oxford Nanopore Technologies (ONT) datasets, demonstrating its broad applicability. Our results show that TAGET is a valuable computational toolkit that can be applied to many full-length transcriptome studies. |