Zhang Zhang


Email: zhangzhang (AT) big.ac.cn

Tel: +86 (10) 8409-7261

Chinese Name: 章张  [Google Scholar, h-index=51, as of Jan. 2024; Academic Genealogy; ORCID: 0000-0001-6603-5060]


Dr. Zhang Zhang is a Distinguished Professor at the Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS) & China National Center for Bioinformation (CNCB), and serves as Associate Director of the National Genomics Data Center (NGDC), which is part of BIG & CNCB. He obtained his PhD in Computer Science from the Institute of Computing Technology, CAS, in 2007, through a joint bioinformatics program with BIG, under the supervision of Jun Yu. Prior to joining BIG, he worked as a Postdoctoral Associate at Yale University in the United States (advised by Jeffrey P. Townsend) from 2007 to 2009 and as a Research Scientist at King Abdullah University of Science and Technology in Saudi Arabia (advised by Vladimir B. Bajic) from 2009 to 2011.

In 2011, Dr. Zhang was appointed Professor in the CAS 100-Talent Program by BIG. Given the critical importance of multi-omics data, he focuses on biological data integration and curation and on the development of multi-omics data resources and new algorithms and tools. In 2016, Dr. Zhang co-founded the BIG Data Center and served as its Executive Director, responsible for the center's development. In this role, he led the team in constructing a family of database resources and computational methods and in providing a range of data services in support of research activities worldwide. In particular, he set the center's organizational structure and directions, established a scientific advisory board composed of international experts in the field, and built collaborations with institutions worldwide to maximize the scope of data sharing. NGDC and CNCB were officially founded in 2019 on the basis of the BIG Data Center. Over the past several years, Dr. Zhang has developed more than forty database resources and computational tools, organized a series of scientific conferences to promote computational biology and bioinformatics domestically and internationally, and raised general awareness of the value of database resources as a fundamental infrastructure for biomedical research.

Dr. Zhang has published more than 130 papers, and his research achievements have been selected for the "Top 10 Advances in Bioinformatics in China". He was an Executive Committee member of the International Society for Biocuration (2015-2018) and now serves as Asian Regional Editor for Briefings in Bioinformatics (2017-) and Associate Editor-in-Chief for Genomics Proteomics & Bioinformatics (2012-). Working toward a new paradigm from data to theory, he has recently focused on theoretical biology, developing new algorithms, models and laws for deciphering the basic principles of life.


  • Associate Director of National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, China, 2020 - Present

  • Executive Director of BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China, 2016 - 2020

  • Professor, University of Chinese Academy of Sciences (UCAS), China, 2016 - Present

  • Professor in the CAS 100-Talent Program, Beijing Institute of Genomics, Chinese Academy of Sciences, China, 2011 - Present

  • Research Scientist, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia, 2009 - 2011

  • Postdoctoral Associate, Yale University, United States of America, 2007 - 2009


  • PhD in Computer Science (a bioinformatics joint program with BIG), Institute of Computing Technology, Chinese Academy of Sciences, China, 2007

  • MS in Computer Science, Nanjing University of Science and Technology, China, 2004

  • BS in Computer Science, Ningxia University, China, 2002


  • World's Top 2% Scientists 2023

  • 2023 UCAS University-Level Excellent Undergraduate Course - Genomics

  • Top 10 Advances in Bioinformatics in China 2022

  • World's Top 2% Scientists 2022

  • 2022 UCAS College-Level Excellent Undergraduate Course - Genomics

  • Distinguished Professor of Chinese Academy of Sciences, August 2022

  • Top 10 Advances in Bioinformatics in China 2021

  • World's Top 2% Scientists 2021

  • UCAS-BHPB Excellent Supervisor Award, November 2021

  • Top 10 Advances in Bioinformatics in China 2020

  • UCAS Lingyan Golden Award, September 2020

  • Top 10 Databases in Bioinformatics in China 2019

  • National Ten Thousand Talent Program for Young Top-notch Talent, February 2019

  • Top 10 Advances in Bioinformatics in China 2018

  • Chang Jiang Young Scholar, 2018

  • UCAS-BHPB Excellent Supervisor Award, September 2018

  • Distinguished Professor of Beijing Institute of Genomics, September 2017

  • Excellence Award in the Final Evaluation of the CAS 100-Talent Program, August 2017


  • Zhao Li (李昭), 2023 CAS Pollyanna Chu Scholarship

  • Xiaonan Liu (刘晓楠), 2022 National Scholarship for Doctoral Students

  • Xufei Teng (滕徐菲), 2022 CAS President Excellent Award

  • Xufei Teng (滕徐菲), 2021 UCAS-BHPB Award

  • Zhao Li (李昭), 2021 National Scholarship for Doctoral Students

  • Tongtong Zhu (朱彤彤), 2021 National Scholarship for Master Students

  • Lin Liu (刘琳), 2021 Excellent Graduate of Higher Education Universities in Beijing

  • Lin Liu (刘琳), 2020 National Scholarship for Doctoral Students

  • Yuansheng Zhang (张源笙), 2020 National Scholarship for Master Students

  • Pei Wang (王佩), 2019 Excellent Graduate of Higher Education Universities in Beijing

  • Pei Wang (王佩), 2019 National Scholarship for Master Students

  • Jian Sang (桑健), 2019 Excellent Graduate of Higher Education Universities in Beijing

  • Jian Sang (桑健), 2018 National Scholarship for Doctoral Students

  • Jiabao Cao (曹佳宝), 2018 National Scholarship for Master Students

  • Jian Sang (桑健), 2018 UCAS-BHPB Award



  • Big Data Integration and Curation: construction of multi-omics databases and knowledgebases through big data integration and curation, and development of new theory for biological big data commons and ecosystems, with particular focus on public health and species of national strategic importance.

  • Computational Molecular Evolution: establishment of molecular evolutionary models and theories at the nucleotide and codon levels and development of new methods and tools for detecting natural selection pressure.

  • Computational Health Genomics: development of new methods and algorithms that associate omics data with health data and conduct integrative data analysis and deep mining via artificial intelligence and machine learning, with the aim of providing more effective approaches to precision health and medical treatment for brain tumors such as glioma.


  • Introduction to Omics (for graduates; in English)

  • Bioinformatics, Genomics, Big Data (for graduates; in Chinese)

  • Genomics (for undergraduates; in Chinese)




  1. Zhang Z (2023) Laws of genome nucleotide composition. bioRxiv, doi:10.1101/2023.09.09.557014. https://ngdc.cncb.ac.cn/openlb/publication/OLB-BRV-10.1101/2023.09.09.557014v2

  2. Gao X, Chen K, Xiong J, Zou D, Yang F, Ma Y, Jiang C, Gao X, Wang G, Gu S, Zhang P, Luo S, Huang K, Bao Y, Zhang Z, Ma L and Miao W (2024) The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Res, 52, D747-D755.

  3. CNCB-NGDC Members & Partners (2024) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res, 52, D18-D32.

  4. Cao Y, Tian D, Tang Z, Liu X, Hu W, Zhang Z and Song S (2024) OPIA: an open archive of plant images and related phenotypic traits. Nucleic Acids Res, 52, D1530-D1537.

  5. Zhu T, Niu G, Zhang Y, Chen M, Li C Y, Hao L and Zhang Z (2023) Host-mediated RNA editing in viruses. Biol Direct, 18, 12.

  6. Zhang M, Zong W, Zou D, Wang G, Zhao W, Yang F, Wu S, Zhang X, Guo X, Ma Y, Xiong Z, Zhang Z, Bao Y and Li R (2023) MethBank 4.0: an updated database of DNA methylation across a variety of species. Nucleic Acids Res, 51, D208-D216.

  7. Wang Z, Wang Y W, Kasuga T, Lopez-Giraldez F, Zhang Y, Zhang Z, Wang Y, Dong C, Sil A, Trail F, Yarden O and Townsend J P (2023) Lineage-specific genes are clustered with HET-domain genes and respond to environmental and genetic manipulations regulating reproduction in Neurospora. PLoS Genet, 19, e1011019.

  8. Pan S, Kang H, Liu X, Lin S, Yuan N, Zhang Z, Bao Y and Jia P (2023) Brain Catalog: a comprehensive resource for the genetic landscape of brain-related traits. Nucleic Acids Res, 51, D835-D844.

  9. Ma L and Zhang Z (2023) The contribution of databases towards understanding the universe of long non-coding RNAs. Nat Rev Mol Cell Biol, 24, 601-602.

  10. Liu Y, Zhang Y, Liu X, Shen Y, Tian D, Yang X, Liu S, Ni L, Zhang Z, Song S and Tian Z (2023) SoyOmics: A deeply integrated database on soybean multi-omics. Mol Plant, 16, 794-797.

  11. Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R, Pan Y, Wang Y, Zou D, Zhang Z and Song S (2023) GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res, 51, D969-D976.

  12. Li Z, Liu L, Feng C, Qin Y, Xiao J, Zhang Z and Ma L (2023) LncBook 2.0: integrating human long non-coding RNAs with multi-omics annotations. Nucleic Acids Res, 51, D186-D191.

  13. Li L, Xu B, Tian D, Wang A, Zhu J, Li C, Li N, Zhao W, Shi L, Xue Y, Zhang Z, Bao Y, Zhao W and Song S (2023) McAN: a novel computational algorithm and platform for constructing and visualizing haplotype networks. Brief Bioinform, 24.

  14. Jiang S, Qian Q, Zhu T, Zong W, Shang Y, Jin T, Zhang Y, Chen M, Wu Z, Chu Y, Zhang R, Luo S, Jing W, Zou D, Bao Y, Xiao J and Zhang Z (2023) Cell Taxonomy: a curated repository of cell types with multifaceted characterization. Nucleic Acids Res, 51, D853-D860.

  15. Hua Z, Jiang C, Song S, Tian D, Chen Z, Jin Y, Zhao Y, Zhou J, Zhang Z, Huang L and Yuan Y (2023) Accurate identification of taxon-specific molecular markers in plants based on DNA signature sequence. Mol Ecol Resour, 23, 106-117.

  16. Duan G, Wu G, Chen X, Tian D, Li Z, Sun Y, Du Z, Hao L, Song S, Gao Y, Xiao J, Zhang Z, Bao Y, Tang B and Zhao W (2023) HGD: an integrated homologous gene database across multiple species. Nucleic Acids Res, 51, D994-D1002.

  17. CNCB-NGDC Members & Partners (2023) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2023. Nucleic Acids Res, 51, D18-D28.

  18. Zhang Z W, Teng X, Zhao F, Ma C, Zhang J, Xiao L F, Wang Y, Chang M, Tian Y, Li C, Zhang Z, Song S, Tong W M, Liu P and Niu Y (2022) METTL3 regulates m(6)A methylation of PTCH1 and GLI2 in Sonic hedgehog signaling to promote tumor progression in SHH-medulloblastoma. Cell Rep, 41, 111530.

  19. Zhang Z (2022) KaKs_Calculator 3.0: Calculating Selective Pressure on Coding and Non-coding Sequences. Genomics Proteomics Bioinformatics, 20, 536-540.

  20. Zhang Y, Zou D, Zhu T, Xu T, Chen M, Niu G, Zong W, Pan R, Jing W, Sang J, Liu C, Xiong Y, Sun Y, Zhai S, Chen H, Zhao W, Xiao J, Bao Y, Hao L and Zhang Z (2022) Gene Expression Nebulas (GEN): a comprehensive data portal integrating transcriptomic profiles across multiple species at both bulk and single-cell levels. Nucleic Acids Res, 50, D1016-D1024.

  21. Xiong Z, Yang F, Li M, Ma Y, Zhao W, Wang G, Li Z, Zheng X, Zou D, Zong W, Kang H, Jia Y, Li R, Zhang Z and Bao Y (2022) EWAS Open Platform: integrated data, knowledge and toolkit for epigenome-wide association study. Nucleic Acids Res, 50, D1004-D1009.

  22. Ma L, Zou D, Liu L, Shireen H, Abbasi A A, Bateman A, Xiao J, Zhao W, Bao Y and Zhang Z (2022) Database Commons: A Catalog of Worldwide Biological Databases. Genomics Proteomics Bioinformatics, doi:10.1016/j.gpb.2022.12.004.

  23. Liu L, Zhang Y, Niu G, Li Q, Li Z, Zhu T, Feng C, Liu X, Zhang Y, Xu T, Chen R, Teng X, Zhang R, Zou D, Ma L and Zhang Z (2022) BrainBase: a curated knowledgebase for brain diseases. Nucleic Acids Res, 50, D1131-D1138.

  24. Liu L, Li Z, Liu C, Zou D, Li Q, Feng C, Jing W, Luo S, Zhang Z and Ma L (2022) LncRNAWiki 2.0: a knowledgebase of human long non-coding RNAs with enhanced curation model and database system. Nucleic Acids Res, 50, D190-D195.

  25. Li R, Zhang X, Song S, Wang Y, Zou D, Xiao J, Zhao W, Zhang Z and Bao Y (2022) Safety management and application of genomics data. Big Data Research, 8, 37-45.

  26. Jiang S, Du Q, Feng C, Ma L and Zhang Z (2022) CompoDynamics: a comprehensive database for characterizing sequence composition dynamics. Nucleic Acids Res, 50, D962-D969.

  27. Jia L, Li Y, Huang F, Jiang Y, Li H, Wang Z, Chen T, Li J, Zhang Z and Yao W (2022) LIRBase: a comprehensive database of long inverted repeats in eukaryotic genomes. Nucleic Acids Res, 50, D174-D182.

  28. Hua Z, Tian D, Jiang C, Song S, Chen Z, Zhao Y, Jin Y, Huang L, Zhang Z and Yuan Y (2022) Towards comprehensive integration and curation of chloroplast genomes. Plant Biotechnol J, 20, 2239-2241.

  29. CNCB-NGDC Members & Partners (2022) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res, 50, D27-D38.

  30. Cao J, Zhang Y, Tan S, Yang Q, Wang H-L, Xia X, Luo J, Guo H, Zhang Z and Li Z (2022) LSD 4.0: an improved database for comparative studies of leaf senescence. Molecular Horticulture, 2, 24.

  31. Zhang S S, Chen X, Chen T T, Zhu J W, Tang B X, Wang A K, Dong L L, Zhang Z W, Sun Y L, Yu C X, Zhai S, Sun Y B, Chen H X, Du Z L, Xiao J F, Zhang Z, Bao Y M, Wang Y Q and Zhao W M (2021) GSA-Human: Genome Sequence Archive for Human. Yi Chuan, 43, 988-993.

  32. Sun C, Huang J, Wang Y, Zhao X, Su L, Thomas G W C, Zhao M, Zhang X, Jungreis I, Kellis M, Vicario S, Sharakhov I V, Bondarenko S M, Hasselmann M, Kim C N, Paten B, Penso-Dolfin L, Wang L, Chang Y, Gao Q, Ma L, Ma L, Zhang Z, Zhang H, Zhang H, Ruzzante L, Robertson H M, Zhu Y, Liu Y, Yang H, Ding L, Wang Q, Ma D, Xu W, Liang C, Itgen M W, Mee L, Cao G, Zhang Z, Sadd B M, Hahn M W, Schaack S, Barribeau S M, Williams P H, Waterhouse R M and Mueller R L (2021) Genus-Wide Characterization of Bumblebee Genomes Provides Insights into Their Evolution and Variation in Ecological and Behavioral Traits. Mol Biol Evol, 38, 486-501.

  33. RNAcentral Consortium (2021) RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res, 49, D212-D220.

  34. Liu X, Wang P, Teng X, Zhang Z and Song S (2021) Comprehensive Analysis of Expression Regulation for RNA m6A Regulators With Clinical Significance in Human Cancers. Front Oncol, 11, 624395.

  35. Li Z, Liu L, Jiang S, Li Q, Feng C, Du Q, Zou D, Xiao J, Zhang Z and Ma L (2021) LncExpDB: an expression database of human long non-coding RNAs. Nucleic Acids Res, 49, D962-D968.

  36. Li C, Tian D, Tang B, Liu X, Teng X, Zhao W, Zhang Z and Song S (2021) Genome Variation Map: a worldwide collection of genome variations across multiple species. Nucleic Acids Res, 49, D1186-D1191.

  37. CNCB-NGDC Members & Partners (2021) Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021. Nucleic Acids Res, 49, D18-D28.

  38. Chen T, Chen X, Zhang S, Zhu J, Tang B, Wang A, Dong L, Zhang Z, Yu C, Sun Y, Chi L, Chen H, Zhai S, Sun Y, Lan L, Zhang X, Xiao J, Bao Y, Wang Y, Zhang Z and Zhao W (2021) The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics Proteomics Bioinformatics, 19, 578-583.

  39. Chen M, Ma Y, Wu S, Zheng X, Kang H, Sang J, Xu X, Hao L, Li Z, Gong Z, Xiao J, Zhang Z, Zhao W and Bao Y (2021) Genome Warehouse: A Public Repository Housing Genome-scale Data. Genomics Proteomics Bioinformatics, 19, 584-589.

  40. Zhao W M, Song S H, Chen M L, Zou D, Ma L N, Ma Y K, Li R J, Hao L L, Li C P, Tian D M, Tang B X, Wang Y Q, Zhu J W, Chen H X, Zhang Z, Xue Y B and Bao Y M (2020) The 2019 novel coronavirus resource. Yi Chuan, 42, 212-221.

  41. Zhang Z, Song S, Yu J, Zhao W, Xiao J and Bao Y (2020) The Elements of Data Sharing. Genomics Proteomics Bioinformatics, 18, 1-4.

  42. Yan J, Zou D, Li C, Zhang Z, Song S and Wang X (2020) SR4R: An Integrative SNP Resource for Genomic Breeding and Population Research in Rice. Genomics Proteomics Bioinformatics, 18, 173-185.

  43. Xiong Z, Li M, Yang F, Ma Y, Sang J, Li R, Li Z, Zhang Z and Bao Y (2020) EWAS Data Hub: a resource of DNA methylation array data and metadata. Nucleic Acids Res, 48, D890-D895.

  44. Tian D, Wang P, Tang B, Teng X, Li C, Liu X, Zou D, Song S and Zhang Z (2020) GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res, 48, D927-D932.

  45. Teng X, Li Q, Li Z, Zhang Y, Niu G, Xiao J, Yu J, Zhang Z and Song S (2020) Compositional Variability and Mutation Spectra of Monophyletic SARS-CoV-2 Clades. Genomics Proteomics Bioinformatics, 18, 648-663.

  46. Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Chen M, Wang A, Ma Y, Li M, Teng X, Cui Y, Duan G, Zhang M, Jin T, Shi C, Du Z, Zhang Y, Liu C, Li R, Zeng J, Hao L, Jiang S, Chen H, Han D, Xiao J, Zhang Z, Zhao W, Xue Y and Bao Y (2020) The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR. Genomics Proteomics Bioinformatics, 18, 749-759.

  47. Sang J, Zou D, Wang Z, Wang F, Zhang Y, Xia L, Li Z, Ma L, Li M, Xu B, Liu X, Wu S, Liu L, Niu G, Li M, Luo Y, Hu S, Hao L and Zhang Z (2020) IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data. Genomics Proteomics Bioinformatics, 18, 161-172.

  48. National Genomics Data Center Members & Partners (2020) Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Res, 48, D24-D33.

  49. Miao W, Song L, Ba S, Zhang L, Guan G, Zhang Z and Ning K (2020) Protist 10,000 Genomes Project. Innovation (Camb), 1, 100058.

  50. Liu L, Wang G, Wang L, Yu C, Li M, Song S, Hao L, Ma L and Zhang Z (2020) Computational identification and characterization of glioma candidate biomarkers through multi-omics integrative profiling. Biol Direct, 15, 10.

  51. Li Z, Zhang Y, Zou D, Zhao Y, Wang H L, Zhang Y, Xia X, Luo J, Guo H and Zhang Z (2020) LSD 3.0: a comprehensive resource for the leaf senescence research community. Nucleic Acids Res, 48, D1069-D1075.

  52. Li Q, Li Z, Feng C, Jiang S, Zhang Z and Ma L (2020) Multi-omics annotation of human long non-coding RNAs. Biochem Soc Trans, 48, 1545-1556.

  53. Gong Z, Zhu J W, Li C P, Jiang S, Ma L N, Tang B X, Zou D, Chen M L, Sun Y B, Song S H, Zhang Z, Xiao J F, Xue Y B, Bao Y M, Du Z L and Zhao W M (2020) An online coronavirus analysis platform from the National Genomics Data Center. Zool Res, 41, 705-708.

  54. Zhao Y, Wang J, Liang F, Liu Y, Wang Q, Zhang H, Jiang M, Zhang Z, Zhao W, Bao Y, Zhang Z, Wu J, Asmann Y W, Li R and Xiao J (2019) NucMap: a database of genome-wide nucleosome positioning map across species. Nucleic Acids Res, 47, D163-D169.

  55. Zhang Z, Yu J, Eisenhaber F, Gao X and Gojobori T (2019) In Memory of Vladimir B. Bajic (1952–2019). Genomics Proteomics Bioinformatics, 17, 473-474.

  56. Yin H, Li M, Xia L, He C and Zhang Z (2019) Computational determination of gene age and characterization of evolutionary dynamics in human. Brief Bioinform, 20, 2141-2149.

  57. Wang G, Yin H, Li B, Yu C, Wang F, Xu X, Cao J, Bao Y, Wang L, Abbasi A A, Bajic V B, Ma L and Zhang Z (2019) Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics, 35, 2949-2956.

  58. The RNAcentral Consortium (2019) RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res, 47, D221-D229.

  59. Tang B, Zhou Q, Dong L, Li W, Zhang X, Lan L, Zhai S, Xiao J, Zhang Z, Bao Y, Zhang Y P, Wang G D and Zhao W (2019) iDog: an integrated resource for domestic dogs and wild canids. Nucleic Acids Res, 47, D793-D800.

  60. Song S and Zhang Z (2019) Database Resources in BIG Data Center: Submission, Archiving, and Integration of Big Data in Plant Science. Mol Plant, 12, 279-281.

  61. Pervaiz N, Shakeel N, Qasim A, Zehra R, Anwar S, Rana N, Xue Y, Zhang Z, Bao Y and Abbasi A A (2019) Evolutionary history of the human multigene families reveals widespread gene duplications throughout the history of animals. BMC Evol Biol, 19, 128.

  62. Niu G, Zou D, Li M, Zhang Y, Sang J, Xia L, Li M, Liu L, Cao J, Zhang Y, Wang P, Hu S, Hao L and Zhang Z (2019) Editome Disease Knowledgebase (EDK): a curated knowledgebase of editome-disease associations in human. Nucleic Acids Res, 47, D78-D83.

  63. Ma L, Cao J, Liu L, Li Z, Shireen H, Pervaiz N, Batool F, Raza R Z, Zou D, Bao Y, Abbasi A A and Zhang Z (2019) Community Curation and Expert Curation of Human Long Noncoding RNAs with LncRNAWiki and LncBook. Curr Protoc Bioinformatics, 67, e82.

  64. Ma L, Cao J, Liu L, Du Q, Li Z, Zou D, Bajic V B and Zhang Z (2019) LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res, 47, D128-D134.

  65. Li M, Zou D, Li Z, Gao R, Sang J, Zhang Y, Li R, Xia L, Zhang T, Niu G, Bao Y and Zhang Z (2019) EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res, 47, D983-D988.

  66. Li M, Xia L, Zhang Y, Niu G, Li M, Wang P, Zhang Y, Sang J, Zou D, Hu S, Hao L and Zhang Z (2019) Plant editosome database: a curated database of RNA editosome in plants. Nucleic Acids Res, 47, D170-D174.

  67. BIG Data Center Members (2019) Database Resources of the BIG Data Center in 2019. Nucleic Acids Res, 47, D8-D14.

  68. Zhang Z, Xue Y and Zhao F (2018) Bioinformatics Commons: The Cornerstone of Life and Health Sciences. Genomics Proteomics Bioinformatics, 16, 223-225.

  69. Zhang Y S, Xia L, Sang J, Li M, Liu L, Li M W, Niu G Y, Cao J B, Teng X F, Zhou Q and Zhang Z (2018) [The BIG Data Center's database resources]. Yi Chuan, 40, 1039-1043.

  70. Song S, Tian D, Zhang Z, Hu S and Yu J (2018) Rice Genomics: over the Past Two Decades and into the Future. Genomics Proteomics Bioinformatics, 16, 397-404.

  71. Song S, Tian D, Li C, Tang B, Dong L, Xiao J, Bao Y, Zhao W, He H and Zhang Z (2018) Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Res, 46, D944-D949.

  72. Sang J, Wang Z, Li M, Cao J, Niu G, Xia L, Zou D, Wang F, Xu X, Han X, Fan J, Yang Y, Zuo W, Zhang Y, Zhao W, Bao Y, Xiao J, Hu S, Hao L and Zhang Z (2018) ICG: a wiki-driven knowledgebase of internal control genes for RT-qPCR normalization. Nucleic Acids Res, 46, D121-D126.

  73. Li R, Liang F, Li M, Zou D, Sun S, Zhao Y, Zhao W, Bao Y, Xiao J and Zhang Z (2018) MethBank 3.0: a database of DNA methylomes across a variety of species. Nucleic Acids Res, 46, D288-D295.

  74. International Society for Biocuration (2018) Biocuration: Distilling data into knowledge. PLoS Biol, 16, e2002846.

  75. BIG Data Center Members (2018) Database Resources of the BIG Data Center in 2018. Nucleic Acids Res, 46, D14-D20.

  76. Zhao Z M, Campbell M C, Li N, Lee D S W, Zhang Z and Townsend J P (2017) Detection of Regional Variation in Selection Intensity within Protein-Coding Genes Using DNA Sequence Polymorphism and Divergence. Mol Biol Evol, 34, 3006-3022.

  77. Xu X, Ji Z and Zhang Z (2017) CloudPhylo: a fast and scalable tool for phylogeny reconstruction. Bioinformatics, 33, 438-440.

  78. Xia L, Zou D, Sang J, Xu X, Yin H, Li M, Wu S, Hu S, Hao L and Zhang Z (2017) Rice Expression Database (RED): An integrated RNA-Seq-derived gene expression database for rice. J Genet Genomics, 44, 235-241.

  79. Wang Y, Song F, Zhu J, Zhang S, Yang Y, Chen T, Tang B, Dong L, Ding N, Zhang Q, Bai Z, Dong X, Chen H, Sun M, Zhai S, Sun Y, Yu L, Lan L, Xiao J, Fang X, Lei H, Zhang Z and Zhao W (2017) GSA: Genome Sequence Archive. Genomics Proteomics Bioinformatics, 15, 14-18.

  80. The RNAcentral Consortium (2017) RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res, 45, D128-D134.

  81. Salhi A, Essack M, Alam T, Bajic V P, Ma L, Radovanovic A, Marchand B, Schmeier S, Zhang Z and Bajic V B (2017) DES-ncRNA: A knowledgebase for exploring information about human micro and long noncoding RNAs based on literature-mining. RNA Biol, 14, 963-971.

  82. BIG Data Center Members (2017) The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res, 45, D18-D24.

  83. Zhao W, Zhang S, Tang B, Chen T, Hao L, Sang J, Li R, Xiao J and Zhang Z (2016) Constructing the international database management system for omics big data. Big Data Research, 2, 43-52.

  84. Yin H, Wang G, Ma L, Yi S V and Zhang Z (2016) What Signatures Dominantly Associate with Gene Age? Genome Biol Evol, 8, 3083-3089.

  85. Yin H, Ma L, Wang G, Li M and Zhang Z (2016) Old genes experience stronger translational selection than young genes. Gene, 590, 29-34.

  86. Xue Y, Lameijer E W, Ye K, Zhang K, Chang S, Wang X, Wu J, Gao G, Zhao F, Li J, Han C, Xu S, Xiao J, Yang X, Ying X, Zhang X, Chen W H, Liu Y, Zhang Z, Huang K and Yu J (2016) Precision Medicine: What Challenges Are We Facing? Genomics Proteomics Bioinformatics, 14, 253-261.

  87. Wang G, Sun S and Zhang Z (2016) Randomness in Sequence Evolution Increases over Time. PLoS One, 11, e0155935.

  88. Tian X, Zhang Z, Yang T, Chen M, Li J, Chen F, Yang J, Li W, Zhang B, Zhang Z, Wu J, Zhang C, Long L and Xiao J (2016) Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level. Front Microbiol, 7, 998.

  89. Sun S, Xiao J, Zhang H and Zhang Z (2016) Pangenome Evidence for Higher Codon Usage Bias and Stronger Translational Selection in Core Genes of Escherichia coli. Front Microbiol, 7, 1180.

  90. IC4R Project Consortium (2016) Information Commons for Rice (IC4R). Nucleic Acids Res, 44, D1172-1180.

  91. Zou D, Sun S, Li R, Liu J, Zhang J and Zhang Z (2015) MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data. Nucleic Acids Res, 43, D54-58.

  92. Zou D, Ma L, Yu J and Zhang Z (2015) Biological databases for human research. Genomics Proteomics Bioinformatics, 13, 55-63.

  93. Zhang Y, Chen L and Zhang Z (2015) The Curation and Analysis of Rice Stress-Resistance Genes Based on RiceWiki. Hans Journal of Computational Biology, 5, 29-40.

  94. Ma L, Li A, Zou D, Xu X, Xia L, Yu J, Bajic V B and Zhang Z (2015) LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res, 43, D187-192.

  95. Bai B, Zhao W M, Tang B X, Wang Y Q, Wang L, Zhang Z, Yang H C, Liu Y H, Zhu J W, Irwin D M, Wang G D and Zhang Y P (2015) DoGSD: the dog and wolf genome SNP database. Nucleic Acids Res, 43, D777-783.

  96. Zhao Y, Jia X, Yang J, Ling Y, Zhang Z, Yu J, Wu J and Xiao J (2014) PanGP: a tool for quickly analyzing bacterial pan-genome profile. Bioinformatics, 30, 1297-1299.

  97. Zhang Z, Zhu W and Luo J (2014) Bringing biocuration to China. Genomics Proteomics Bioinformatics, 12, 153-155.

  98. Zhang Z, Sang J, Ma L, Wu G, Wu H, Huang D, Zou D, Liu S, Li A, Hao L, Tian M, Xu C, Wang X, Wu J, Xiao J, Dai L, Chen L L, Hu S and Yu J (2014) RiceWiki: a wiki-based database for community curation of rice genes. Nucleic Acids Res, 42, D1222-1228.

  99. Xu P, Zhang X, Wang X, Li J, Liu G, Kuang Y, Xu J, Zheng X, Ren L, Wang G, Zhang Y, Huo L, Zhao Z, Cao D, Lu C, Li C, Zhou Y, Liu Z, Fan Z, Shan G, Li X, Wu S, Song L, Hou G, Jiang Y, Jeney Z, Yu D, Wang L, Shao C, Song L, Sun J, Ji P, Wang J, Li Q, Xu L, Sun F, Feng J, Wang C, Wang S, Wang B, Li Y, Zhu Y, Xue W, Zhao L, Wang J, Gu Y, Lv W, Wu K, Xiao J, Wu J, Zhang Z, Yu J and Sun X (2014) Genome sequence and genetic diversity of the common carp, Cyprinus carpio. Nat Genet, 46, 1212-1219.

  100. Wu J, Xiao J, Zhang Z, Wang X, Hu S and Yu J (2014) Ribogenomics: the science and knowledge of RNA. Genomics Proteomics Bioinformatics, 12, 57-63.

  101. Wu H, Fang Y, Yu J and Zhang Z (2014) The quest for a unified view of bacterial land colonization. ISME J, 8, 1358-1369.

  102. Wu G, Zhu J, Yu J, Zhou L, Huang J Z and Zhang Z (2014) Evaluation of five methods for genome-wide circadian gene identification. J Biol Rhythms, 29, 231-242.

  103. Ma L, Cui P, Zhu J, Zhang Z and Zhang Z (2014) Translational selection in human: more pronounced in housekeeping genes. Biol Direct, 9, 17.

  104. Kang Y, Gu C, Yuan L, Wang Y, Zhu Y, Li X, Luo Q, Xiao J, Jiang D, Qian M, Ahmed Khan A, Chen F, Zhang Z and Yu J (2014) Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks. MBio, 5, e01867.

  105. Zhang Z and Yu J (2013) Does the genetic code have a eukaryotic origin? Genomics Proteomics Bioinformatics, 11, 41-55.

  106. Wu J, Xiao J, Wang L, Zhong J, Yin H, Wu S, Zhang Z and Yu J (2013) Systematic analysis of intron size and abundance parameters in diverse lineages. Sci China Life Sci, 56, 968-974.

  107. Tong X, Yang Y, Wang W, Bai Z, Ma L, Zheng X, Sun H, Zhang Z, Zhao M, Yu J and Ge R L (2013) Expression profiling of abundant genes in pulmonary and cardiac muscle tissues of Tibetan Antelope (Pantholops hodgsonii). Gene, 523, 187-191.

  108. Ma L, Bajic V B and Zhang Z (2013) On the classification of long non-coding RNAs. RNA Biol, 10, 925-933.

  109. Dai L, Xu C, Tian M, Sang J, Zou D, Li A, Liu G, Chen F, Wu J, Xiao J, Wang X, Yu J and Zhang Z (2013) Community intelligence in knowledge curation: an application to managing scientific nomenclature. PLoS One, 8, e56961.

  110. Dai L, Tian M, Wu J, Xiao J, Wang X, Townsend J P and Zhang Z (2013) AuthorReward: increasing community curation in biological knowledge wikis through automated authorship quantification. Bioinformatics, 29, 1837-1839.

  111. Chen M, Xiao J, Zhang Z, Liu J, Wu J and Yu J (2013) Identification of human HK genes and gene expression regulation study in cancer from transcriptomics data analysis. PLoS One, 8, e54082.

  112. Zhang Z and Yu J (2012) The pendulum model for genome compositional dynamics: from the four nucleotides to the twenty amino acids. Genomics Proteomics Bioinformatics, 10, 175-180.

  113. Zhang Z, Xiao J, Wu J, Zhang H, Liu G, Wang X and Dai L (2012) ParaAT: a parallel tool for constructing multiple protein-coding DNA alignments. Biochem Biophys Res Commun, 419, 779-781.

  114. Zhang Z, Li J, Cui P, Ding F, Li A, Townsend J P and Yu J (2012) Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance. BMC Bioinformatics, 13, 43.

  115. Wu H, Zhang Z, Hu S and Yu J (2012) On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct, 7, 2.

  116. Wu H, Qu H, Wan N, Zhang Z, Hu S and Yu J (2012) Strand-biased gene distribution in bacteria is related to both horizontal gene transfer and strand-biased nucleotide composition. Genomics Proteomics Bioinformatics, 10, 186-196.

  117. Dai L, Gao X, Guo Y, Xiao J and Zhang Z (2012) Bioinformatics clouds for big data manipulation. Biol Direct, 7, 43; discussion 43.

  118. Cui P, Liu W, Zhao Y, Lin Q, Zhang D, Ding F, Xin C, Zhang Z, Song S, Sun F, Yu J and Hu S (2012) Comparative analyses of H3K4 and H3K27 trimethylations between the mouse cerebrum and testis. Genomics Proteomics Bioinformatics, 10, 82-93.

  119. Cui P, Ding F, Lin Q, Zhang L, Li A, Zhang Z, Hu S and Yu J (2012) Distinct contributions of replication and transcription to mutation rate variation of human genomes. Genomics Proteomics Bioinformatics, 10, 4-10.

  120. Zhang Z and Yu J (2011) On the organizational dynamics of the genetic code. Genomics Proteomics Bioinformatics, 9, 21-29.

  121. Zhang Z, Bajic V B, Yu J, Cheung K-H and Townsend J P (2011) In Mahdavi, M A (ed.), Bioinformatics - Trends and Methodologies. InTech, Rijeka, Croatia, Vol. 1, pp. 41-56.

  122. Zhang Z and Yu J (2010) Modeling compositional dynamics based on GC and purine contents of protein-coding sequences. Biol Direct, 5, 63.

  123. Zhang Z and Townsend J P (2010) The filamentous fungal gene expression database (FFGED). Fungal Genet Biol, 47, 199-204.

  124. Zhang Z, Lopez-Giraldez F and Townsend J P (2010) LOX: inferring Level Of eXpression from diverse methods of census sequencing. Bioinformatics, 26, 1918-1919.

  125. Wang D, Zhang Y, Zhang Z, Zhu J and Yu J (2010) KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics, 8, 77-80.

  126. Qu H, Wu H, Zhang T, Zhang Z, Hu S and Yu J (2010) Nucleotide compositional asymmetry between the leading and lagging strands of eubacterial genomes. Res Microbiol, 161, 838-846.

  127. Zhang Z and Townsend J P (2009) Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Comput Biol, 5, e1000421.

  128. Zhang Z, Cheung K H and Townsend J P (2009) Bringing Web 2.0 to bioinformatics. Brief Bioinform, 10, 1-10.

  129. Li J, Zhang Z, Vang S, Yu J, Wong G K and Wang J (2009) Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J Mol Evol, 68, 414-423.

  130. Zheng H, Shi J, Fang X, Li Y, Vang S, Fan W, Wang J, Zhang Z, Wang W, Kristiansen K and Wang J (2007) FGF: a web tool for Fishing Gene Family in a whole genome database. Nucleic Acids Res, 35, W121-125.

  131. Zhao X, Zhang Z, Yan J and Yu J (2007) GC content variability of eubacteria is governed by the pol III alpha subunit. Biochem Biophys Res Commun, 356, 20-25.

  132. Hu J, Zhao X, Zhang Z and Yu J (2007) Compositional dynamics of guanine and cytosine content in prokaryotic genomes. Res Microbiol, 158, 363-370.

  133. Zhang Z and Yu J (2006) Evaluation of six methods for estimating synonymous and nonsynonymous substitution rates. Genomics Proteomics Bioinformatics, 4, 173-181.

  134. Zhang Z, Li J, Zhao X Q, Wang J, Wong G K and Yu J (2006) KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genomics Proteomics Bioinformatics, 4, 259-263.

  135. Zhang Z, Li J and Yu J (2006) Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evol Biol, 6, 44.

  136. Li H, Coghlan A, Ruan J, Coin L J, Heriche J K, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong G K, Zheng W, Dehal P, Wang J and Durbin R (2006) TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res, 34, D572-580.


  1. From data to theory: three laws of genome nucleotide composition. The 12th National Conference on Bioinformatics and System Biology of China, Qingdao, China, October 27-30, 2023.

  2. CNCB-NGDC database resources: from deposition to integration to translation. GPB Omics and Bioinformatics Frontiers Symposium 2023, Beijing, China, August 2-4, 2023.

  3. Towards a Universe of Human Long Non-coding RNAs. The 8th National Conference on Computational Biology and Bioinformatics (第八届全国计算生物学与生物信息学学术会议), Guangzhou, China, July 22-25, 2022.

  4. Biological Big Data Deposition, Public Sharing and Database Resources. The 10th National Conference on Bioinformatics and System Biology of China, Chengdu, China, October 25-29, 2021.

  5. Database Resources of the National Genomics Data Center. The 9th National Conference on Bioinformatics and System Biology of China, Shanghai, China, September 26-29, 2020.

  6. Computational genomics of brain tumors: glioma biomarker identification and characterization through multi-omics integrative molecular profiling. KAUST Research Conference Digital Health 2020, Thuwal, Saudi Arabia, January 20-22, 2020.

  7. Building a big data ecosystem for bioinformatics. The 4th Annual European Bioinformatics Core Community Workshop, Basel, Switzerland, 26 July 2019.

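  8. Big data management and resource system construction for life omics. The 1st Youth Life Sciences Forum of the Youth Innovation Promotion Association of the Chinese Academy of Sciences, Urumqi, China, July 12-16, 2019.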

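  9. Building a data resource system for life omics. The 6th National Conference on Computational Biology and Bioinformatics, Chengdu, China, March 30-31, 2019.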

  10. Database Resources of the BIG Data Center in 2018. International Conference on Precision Medicine, Bangkok, Thailand, 19-20 July 2018.

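  11. Genomic big data deposition and sharing, and construction of a multi-omics data resource system. Beijing Bioinformatics Forum Series, Peking University, Beijing, China, January 19, 2018.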

  12. The BIG Data Center: from deposition to integration to translation. The Sino-ASEAN Conference on Precision Medicine, Burapha University, Thailand, 17-18 June 2017.

  13. The BIG Data Center’s database resources: towards precision medicine. The 10th International Biocuration Conference, Stanford University, United States, 26-29 March 2017.

  14. The BIG Data Center: from deposition to integration to translation. The 4th Youth Forum for Computer & Life Sciences Interdisciplinary Research (第四届数学、计算机与生命科学交叉研究青年学者论坛), Beijing, China, May 21-22, 2016.

  15. The BIG Data Center for Life and Health Sciences. The Phoenix City Forum for Genome Informatics (基因组信息学凤凰城论坛), Tangshan, China, May 13-15, 2016.

  16. The BIG Data Center: from deposition to integration to translation. The 9th International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016.

  17. Big Data integration: scalability and sustainability. KAUST Research Conference on Computational and Experimental Interfaces of Big Data and Biotechnology, Thuwal, Saudi Arabia, Jan. 25-27, 2016.

  18. Toward Sustainability and Scalability for Big Data Integration. 2015 Functional Genomics Summit II, Beijing, China, Nov. 11-12, 2015.

  19. Community integration of big data. The 2015 International Conference of Genomics, Xi'an, China, Oct. 22-25, 2015.

  20. Bringing biocuration to China. The 1st International Coastal Biology Congress, Yantai, China, Sept. 26-30, 2014.

  21. Big data integration, curation, and analysis. The 3rd Young Bioinformatics PI Workshop, Guangzhou, China, Sept. 19-21, 2014.

  22. Biocuration in the era of big data. The Workshop on Statistical and Computational Theory and Methodology for Big Data Analysis, Banff, Canada, Feb. 9-14, 2014.

  23. Community curation in biological knowledge wikis. The 6th International Biocuration Conference, Churchill College, Cambridge, United Kingdom, April 7-10, 2013.

  24. Rewarding community-curated contributions in biological knowledge wikis. The High-throughput Sequencing Data Analysis and Approaches Workshop, Beijing, China, Dec. 5-7, 2012.

  25. Next-Generation Bioinformatics: harnessing collective resources for large-scale data manipulation, in workshop “Challenges and future of Bioinformatics: sharing insights from the Dutch and Chinese perspective”, Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China, March 19, 2012.


  • Nothing in biology makes sense except in the light of evolution.  ─Theodosius Dobzhansky

  • I don't think we can get a Nobel prize by what we are doing, but the Nobel prize winners know what we are doing for.  ─Alan Bleasby

  • He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast. ─Leonardo da Vinci