| Description |
| --- |
| The 5' UTR is critical for mRNA stability and translation efficiency in mRNA therapeutics. Mean ribosome load (MRL), the average number of ribosomes translating a given mRNA at any given time, is widely used as a quantitative measure of 5' UTR translation efficiency. This project provides 5' UTR sequences together with MRL values derived from polysome profiling. Each record mainly consists of a 5' UTR sequence, the relative read counts in each ribosome bin, the total read count, and the MRL. For both random and endogenous sequences, MRL is computed by multiplying the relative fraction of reads in each bin by the number of ribosomes that bin represents and summing over all bins. We developed UTR-Insight, a model that combines a pretrained language model with a CNN-Transformer architecture. It explains 89.1% of the MRL variation in random 5' UTRs and 82.8% in endogenous 5' UTRs, surpassing existing models. Using UTR-Insight, we performed high-throughput in silico screening of hundreds of thousands of endogenous 5' UTRs from primates, mice, and viruses. The screened sequences increased protein expression by up to 319% relative to the human α-globin 5' UTR, and sequences designed with UTR-Insight achieved even higher expression than high-performing endogenous 5' UTRs. |
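The MRL definition above is a weighted sum: each bin's share of a UTR's reads is weighted by the number of ribosomes that bin represents. The sketch below illustrates that arithmetic for a single UTR. It is a minimal example under stated assumptions, not the project's pipeline: the function name `mean_ribosome_load` is hypothetical, the per-bin ribosome counts (1 through 8) are illustrative rather than the actual bin layout of this dataset, and any upstream normalization of raw reads (for example, per-bin library-size scaling) is assumed to have been applied already.

```python
from typing import Sequence


def mean_ribosome_load(read_counts: Sequence[float],
                       ribosomes_per_bin: Sequence[float]) -> float:
    """Hypothetical helper: MRL = sum_i (reads_i / total_reads) * ribosomes_i.

    read_counts: one UTR's (already normalized) reads in each polysome bin.
    ribosomes_per_bin: number of ribosomes each bin corresponds to
    (an assumption here; use the actual bin definitions of the dataset).
    """
    if len(read_counts) != len(ribosomes_per_bin):
        raise ValueError("read_counts and ribosomes_per_bin must have the same length")
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("total read count must be positive")
    # Relative distribution of this UTR's reads across bins (sums to 1).
    fractions = [c / total for c in read_counts]
    # Weight each bin's read fraction by its ribosome count and sum over bins.
    return sum(f * k for f, k in zip(fractions, ribosomes_per_bin))


if __name__ == "__main__":
    # Illustrative read counts for one UTR across 8 bins holding 1..8 ribosomes.
    counts = [5, 10, 20, 25, 20, 10, 5, 5]
    print(round(mean_ribosome_load(counts, range(1, 9)), 2))  # prints 4.2
```

Because the read fractions sum to 1, MRL is simply the expected ribosome count of a read drawn at random from that UTR's polysome profile, which is why it behaves as a per-mRNA average.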