The 9th Big Data Forum for Life and Health Sciences (2024/10/16 - 2024/10/18)

Biological research has entered the era of big data, spanning a wide variety of omics data and a broad range of health data. Such big data is generated at ever-growing rates and distributed throughout the world with heterogeneous standards and varying, often limited, accessibility. Yet the promise of translating these big data into big knowledge can be realized only if the data are publicly shared. Providing open access to omics and health big data is therefore essential for expediting the translation of big data into big knowledge, and it is becoming increasingly vital for advancing scientific research and promoting human healthcare and precision medicine.
It is our great pleasure to announce that the 2024 Big Data Forum for Life and Health Sciences will be held on October 16-18, 2024. Several renowned biomedical data scientists have agreed to give talks. Likewise, you are cordially invited to share your work and participate in this exciting event.

Organizing Committee

Zhang Zhang (Chair, BIG, CAS)
Yiming Bao (BIG, CAS)
Wenming Zhao (BIG, CAS)
Jingfa Xiao (BIG, CAS)
Songnian Hu (Institute of Microbiology, CAS)
Jun Yu (BIG, CAS)
Jingchu Luo (Peking University)

Previous Conferences

Invited Speakers

Professor
Head of Protein Sequence Resources
EMBL-EBI
United Kingdom
Associate Professor
National Center for Protein Sciences (Beijing)
China
Professor
Nanjing University
China
Dr.
Lead Scientist, Computational Biology Branch
NCBI, NLM, National Institutes of Health
USA
Associate Professor
Tsinghua University
China
Professor
Group leader
HUN-REN Research Center for Natural Sciences
Hungary
Professor
China National Center for Bioinformation
China
Professor
Peking University
China
Ph.D. candidate
Peking University
China
Professor
Beijing Institute of Basic Medical Sciences
China
Professor
Tsinghua University
China
Professor
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
China
Professor
National Center for Protein Sciences (Beijing)
China

Agenda

10/16 Wednesday 09:00 ~ 12:00
Keynote: Alex Bateman
Session 1: Data integration and curation, chaired by Cheng Chang and Shuhui Song
- Plant regulome: development and application of a big data platform and AI models
Dijun Chen, Nanjing University
[Speaker introduction]

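Dijun Chen is an Associate Professor and doctoral supervisor at Nanjing University, a Jiangsu Province Distinguished Professor, and a recipient of Nanjing University's Dengfeng B Talent Support Program. He studied at Harbin Medical University from 2003 to 2008, earning a bachelor's degree in bioinformatics, and received his Ph.D. from the University of Halle-Wittenberg in Germany in 2017. He then conducted research at Zhejiang University, the University of Potsdam, and Humboldt University, and joined Nanjing University at the end of 2019 as PI of a bioinformatics group. His main research interests are functional genomics and AI-driven biology. He has published more than 60 papers and contributed to 5 books, including nearly 30 papers as first or corresponding (including co-first or co-corresponding) author in leading journals such as Nat Commun (5 papers) and Nat Neurosci.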

[Abstract]

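Plant genomes contain large numbers of complex, evolutionarily diverse non-coding regulatory sequences that follow grammatical rules yet to be uncovered. Over the past decade, the widespread use of high-throughput sequencing has accumulated abundant regulome data in multiple model plants. Meanwhile, breakthroughs in natural-language AI methods have opened new opportunities for systematically studying the sequence grammar and function of plant genomes. Against this background, we developed ChIP-Hub, a comprehensive database that implements ENCODE standards and has processed more than 16,000 published datasets (from nearly 600 publications) across multiple model plant species. Based on these data, we systematically annotated tissue- and cell-specific regulatory elements (such as promoters and enhancers) and the associated gene regulatory networks in plants. We also developed comparative genomics methods to analyze the functional conservation of tissue-specific regulatory elements and chromatin states. Finally, we present the SeiPlant model, which uses dilated convolutions with residual connections to learn linear and nonlinear sequence features, combined with a spatial basis function layer to improve scalability, allowing it to integrate large-scale plant regulome data effectively. With this pretrained model, regulome data can be predicted accurately across species from genome sequence alone, providing a valuable reference for species whose regulome knowledge is scarce and hard to obtain (such as wheat or newly studied species). Overall, the ChIP-Hub platform, the AI prediction model, and the associated analyses provide a rich data resource for in-depth exploration of plant genome function.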
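SeiPlant itself is not reproduced here. As a rough illustration of the building block the abstract names, the following is a minimal, stdlib-only sketch of a residual block of dilated 1D convolutions over one-hot-encoded DNA. The function names, kernel size, dilation schedule, and random weights are assumptions for illustration only; the spatial basis function layer, training, and prediction heads are omitted.

```python
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as 4-dim one-hot vectors (unknown bases such as N become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def dilated_conv1d(x, weights, dilation):
    """'Same'-padded 1D convolution with a dilation factor.

    x: list of L positions, each a list of C_in floats.
    weights: kernel indexed as [C_out][K][C_in], with K odd.
    Returns L positions, each a list of C_out floats.
    """
    length, k = len(x), len(weights[0])
    half = k // 2
    out = []
    for pos in range(length):
        row = []
        for w_out in weights:
            acc = 0.0
            for j in range(k):
                src = pos + (j - half) * dilation  # kernel taps are spaced `dilation` apart
                if 0 <= src < length:  # zero padding outside the sequence
                    acc += sum(wi * xi for wi, xi in zip(w_out[j], x[src]))
            row.append(acc)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def residual_dilated_block(x, weights, dilation):
    """y = x + ReLU(dilated_conv(x)); the skip connection requires C_out == C_in."""
    conv = relu(dilated_conv1d(x, weights, dilation))
    return [[a + b for a, b in zip(xr, cr)] for xr, cr in zip(x, conv)]

def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output position after stacking the blocks."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

if __name__ == "__main__":
    random.seed(0)
    x = one_hot("ACGTTGCAAC")
    for d in (1, 2, 4):  # hypothetical dilation schedule
        w = [[[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)] for _ in range(4)]
        x = residual_dilated_block(x, w, d)
    print(len(x), len(x[0]), receptive_field(3, [1, 2, 4]))
```

Doubling the dilation per block lets the receptive field grow quickly (15 positions after three kernel-3 blocks here) without pooling away base-level resolution, which is the usual motivation for dilated convolutions on regulatory sequence.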

- An information standard for proteomics dataset exchange
Yunping Zhu, National Center for Protein Sciences (Beijing)
[Speaker introduction]

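Yunping Zhu is a Professor and doctoral supervisor at the Institute of Lifeomics, Academy of Military Medical Sciences, Academy of Military Sciences, and PI of the Bioinformatics Laboratory at the National Center for Protein Sciences (Beijing). He previously served as Vice Chair of the Biomedical Information Technology Branch of the China Medicinal Biotechnology Association and as chief scientist of a major National High-tech R&D Program (863 Program) project. His research focuses on proteomics bioinformatics and biomedical big data mining. He has developed a series of proteome informatics algorithms, and iProX, the platform he established, is internationally recognized as a proteomics data sharing and publishing platform. He has published more than 200 SCI papers and holds more than 30 software copyrights and 11 Chinese invention patents. His awards include the First Prize of the China Electronic Information Science and Technology Award, the First Prize of the Invention and Innovation Award of the China Association of Inventions, the National Science and Technology Progress Award (Innovation Team), the First Prize of the Beijing Science and Technology Award, the First Prize of the Science and Technology Award of the Chinese Preventive Medicine Association, and the First Prize for Military Teaching Achievements.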

[Abstract]

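The rapid development of proteomics research has produced massive amounts of experimental data, and managing and sharing proteomics datasets effectively is one of the major current challenges. The difficulty in exchanging and sharing these data lies in the lack of a standard information specification that enables efficient dataset sharing while providing sufficient experimental and biological context. The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) is the most authoritative international body for developing and publishing data standards and information guidelines in proteomics. Since 2002, PSI has organized work and delivered results in five areas: minimum information guidelines, data formats, controlled vocabularies, data resources, and analysis software. However, no standard yet exists, in China or internationally, for exchanging mass spectrometry-based proteomics datasets between laboratories and data platforms. In addition, reuse of public proteomics datasets is becoming increasingly popular, but it is largely limited to benchmarking studies and applications related to peptide and protein identification. Broad reuse of public quantitative experiments is hindered by missing sample and experimental metadata, which makes it impossible to unambiguously associate the data files in each dataset with their samples. In this work, we developed an exchange standard for mass spectrometry-based proteomics datasets that specifies in detail the dataset metadata, experimental metadata, and experimental data files involved in dataset exchange. As the first information standard for proteomics dataset exchange, it can serve as a foundation for future development in the field, enable high-quality publication, exchange, and sharing of proteomics datasets, and promote their reuse.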
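The abstract identifies the core reuse problem as data files that cannot be unambiguously linked to samples. The following is a minimal sketch of the kind of consistency check an exchange standard makes possible. It does not implement the speaker's standard: the record layout, class names, and required field names (`organism`, `organism_part`, `disease`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical required sample fields; the actual standard defines its own list.
REQUIRED_SAMPLE_FIELDS = ("organism", "organism_part", "disease")

@dataclass
class Sample:
    sample_id: str
    attributes: dict

@dataclass
class DataFile:
    file_name: str
    sample_id: str  # each raw data file points at exactly one sample

@dataclass
class Dataset:
    accession: str
    samples: list = field(default_factory=list)
    files: list = field(default_factory=list)

def validate(dataset):
    """Return a list of human-readable problems; an empty list means the checks pass."""
    problems = []
    by_id = {}
    for s in dataset.samples:
        if s.sample_id in by_id:
            problems.append(f"duplicate sample id: {s.sample_id}")
        by_id[s.sample_id] = s
        for key in REQUIRED_SAMPLE_FIELDS:
            if not s.attributes.get(key):
                problems.append(f"sample {s.sample_id}: missing '{key}'")
    seen_files = set()
    for f in dataset.files:
        if f.file_name in seen_files:
            problems.append(f"duplicate data file: {f.file_name}")
        seen_files.add(f.file_name)
        if f.sample_id not in by_id:
            problems.append(f"file {f.file_name}: unknown sample '{f.sample_id}'")
    # Every sample should be backed by at least one data file.
    linked = {f.sample_id for f in dataset.files}
    for sid in by_id:
        if sid not in linked:
            problems.append(f"sample {sid}: no data file")
    return problems
```

The design choice worth noting is that the file-to-sample link is explicit and checked in both directions, so a quantitative dataset can be reused without guessing which raw file came from which sample.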

- The China Kadoorie Biobank (CKB) and examples of phenome-wide association studies
Dianjianyi Sun, Peking University
[Speaker introduction]

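Dianjianyi Sun is an Associate Research Fellow and doctoral supervisor in the Department of Epidemiology and Biostatistics, School of Public Health, Peking University. He currently serves as Director of the National Project Office and head of field operations of the China Kadoorie Biobank (CKB), and as Vice Chair of the Obesity Prevention and Control Branch of the Chinese Association for Health Promotion and Education. He has led or participated in several national projects, including 3 National Key R&D Program projects, 1 NSFC Young Scientists Fund project, and 2 special projects. He has published more than 210 papers and has an H-index of 33. His main research interests are the epidemiology of cardiovascular, cerebrovascular, and metabolic diseases, multi-omics analysis, and aging and life-course research.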

[Abstract]

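The China Kadoorie Biobank (CKB) has established a population-based cohort of Chinese adults and a biosample-based health database that integrates phenome, exposome, and genome-level multi-omics information. Using systems epidemiology strategies, it studies in depth the patterns and trends, major risk factors, and protective factors of the major chronic diseases that threaten the health of the Chinese population, and explores their pathogenic mechanisms. This provides scientific evidence from the Chinese population for chronic disease prevention and control strategies suited to China's national conditions, for precision prevention guidelines, and for developing new treatments and interventions. The phenome-wide association study (PheWAS) examples are based on baseline and long-term follow-up data from more than 500,000 participants recruited in 10 regions by CKB: (1) we constructed a PheWAS atlas of marriage in Chinese adults and analyzed the associations of marriage with the risk of major diseases in the general population and with all-cause mortality among patients; (2) we constructed a phenome-wide association atlas of childbearing in Chinese adults and analyzed its associations with the risk of major diseases in the general population and with all-cause mortality among patients; (3) based on nine hypothetical scenarios, we explored the impact of marriage on the all-cause mortality burden of the total population from 2021 to 2040.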
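A PheWAS inverts the GWAS layout: one exposure (here, for example, marital status) is tested against many outcomes, and the results are corrected for the number of phenotypes tested. The sketch below shows only that scan-and-correct structure. It swaps in a crude 2x2 odds ratio with a Wald test and a Bonferroni threshold; the CKB analyses would presumably use covariate-adjusted models on follow-up data, and the counts in the test are toy numbers, not CKB data.

```python
import math

def odds_ratio_test(a, b, c, d):
    """Crude odds ratio and two-sided Wald p-value for a 2x2 table.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    """
    if min(a, b, c, d) == 0:  # Haldane-Anscombe correction avoids division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(log_or), p

def phewas(tables, alpha=0.05):
    """Scan one exposure across phenotypes.

    tables: {phenotype: (a, b, c, d)}.
    Returns (phenotype, OR, p, significant) tuples sorted by p, using a Bonferroni threshold.
    """
    threshold = alpha / len(tables)
    results = []
    for pheno, counts in tables.items():
        or_, p = odds_ratio_test(*counts)
        results.append((pheno, or_, p, p < threshold))
    return sorted(results, key=lambda r: r[2])
```

In a real atlas the phenotype list runs to hundreds of disease codes, which is why the multiple-testing correction (Bonferroni here, often FDR in practice) is part of the method rather than an afterthought.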

09:00 - 09:45 TBD
Alex Bateman, EMBL-EBI
10/16 Wednesday 13:30 ~ 17:30
Keynote: Xuegong Zhang
Session 2: Computational methods and AI models, chaired by Xin Li and Zhihua Zhang
- Keynote talk: From big data, large foundation models to digital life
Xuegong Zhang, Tsinghua University
10/17 Thursday 09:00 ~ 12:00
Keynote: Balázs Győrffy
Session 3: Genomics for precision medicine, chaired by Guihui Ding and Peilin Jia
- Keynote talk: Discovery and validation of new diagnostic, predictive, and prognostic biomarkers in cancer research
Balázs Győrffy, HUN-REN Research Center for Natural Sciences
10/17 Thursday 13:30 ~ 17:30
Keynote:
Session 4: Youth Innovation Promotion Association CAS, chaired by Jingyao Zeng and Shuai Jiang
10/18 Friday 09:00 ~ 12:00
Keynote:
Session 5: RNA (Single-cell) omics, chaired by Xiaomin Ying and Lina Ma
- TBD
Junchao Shi, China National Center for Bioinformation
- Intelligent spatial transcriptomics: methods and applications
Shihua Zhang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
[Speaker introduction]

Shihua Zhang received a Ph.D. in applied mathematics and bioinformatics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, in 2008. He joined the same institute as an Assistant Professor in 2008 and is currently a Professor. His research interests are mainly in bioinformatics and computational biology, machine learning, and deep learning. His papers have been published in top journals including Cell, Nature Computational Science, Nature Communications, Advanced Science, Genome Research, National Science Review, Cell Reports, Nucleic Acids Research, JMLR, IEEE TPAMI, IEEE TKDE, and IEEE TNNLS. His papers were selected among the “Top Ten Bioinformatics Advances in China” in 2021 and 2022. His honors include the Youth Science and Technology Award of China; the Ten Thousand Talent Program (Leading Talent in Scientific and Technological Innovation, and Young Top-notch Talent); the NSFC Excellent Young Scientists Fund; the CAS Outstanding Young Investigator Program; the Lu Jia-Xi Young Investigator Award; the Zhongchuang Software Talent Award; and the National Excellent Ph.D. Thesis Award. He currently serves as a Section Editor of PLOS Computational Biology and an Editor of Genomics, Proteomics & Bioinformatics.

[Abstract]

Technological advances in spatial transcriptomics (ST) are critical for better understanding the structures and functions of tissues. Combining intelligent or statistical algorithms with spatial transcriptomics has paved the way for deciphering tissue architecture. We have worked to advance intelligent spatial transcriptomics and developed a suite of STA- tools. First, we created STAGATE, a graph attention auto-encoder that identifies spatial domains by learning low-dimensional latent embeddings that integrate spatial information and gene expression profiles. Second, we introduced STAligner for integrating and aligning ST datasets across different conditions, technologies, and developmental stages, enabling spatially aware data integration, simultaneous spatial domain identification, and downstream comparative analysis. Third, we designed STAMarker for identifying spatial domain-specific variable genes using saliency maps from deep learning models. Fourth, we developed STAGE, a spatial location-supervised auto-encoder generator for producing high-density spatial transcriptomics data. Fifth, we developed STASCAN for deciphering fine-resolution cell-distribution maps in spatial transcriptomics.
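The key idea behind graph attention approaches such as STAGATE is that each spot's representation is smoothed over its spatial neighbours, with data-dependent weights. The sketch below shows only that message-passing step on a radius-based spatial graph. It is not STAGATE: the attention score here is a plain dot product standing in for a learned attention function, there are no trainable weights or auto-encoder, and the function names are hypothetical.

```python
import math

def radius_graph(coords, radius):
    """For each spot, the indices of spots within `radius` (Euclidean), including itself."""
    n = len(coords)
    return [[j for j in range(n) if math.dist(coords[i], coords[j]) <= radius] for i in range(n)]

def dot_score(h_i, h_j):
    """Stand-in attention score; a real model would learn this function."""
    return sum(a * b for a, b in zip(h_i, h_j))

def attention_aggregate(features, neighbours, score=dot_score):
    """One single-head attention step: softmax over neighbour scores, then a weighted average."""
    out = []
    for i, nbrs in enumerate(neighbours):
        logits = [score(features[i], features[j]) for j in nbrs]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        dim = len(features[i])
        out.append([sum(w * features[j][k] for w, j in zip(weights, nbrs)) / total
                    for k in range(dim)])
    return out
```

Because neighbours with similar expression get larger weights, the aggregation smooths within a tissue domain while damping mixing across domain boundaries, which is why attention is preferred here over a uniform neighbour average.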

- Deciphering the tumor cell state dynamics under drug treatment by a time-context graph model
Jin Gu, Tsinghua University
[Speaker introduction]

Jin Gu, Ph.D., is an Associate Professor at the Institute of Information Processing, Department of Automation, Tsinghua University. His lab develops machine learning methods to explore big biomedical data, especially diverse omics data, for precision and personalized medicine. His recent interests include finding biomarkers and therapeutic targets and building predictive models of cancer using spatiotemporal omics. He is an editorial board member of Genomics, Proteomics & Bioinformatics and BMC Bioinformatics, and serves on several professional committees of CAA and CAAI.

[Abstract]

Gemcitabine is commonly used to treat pancreatic ductal adenocarcinoma (PDAC), one of the most lethal cancer types. However, drug resistance is a critical challenge for improving PDAC chemotherapy. Here, we applied single-cell RNA sequencing (scRNA-seq) to PDAC patient-derived xenograft (PDX) models to study the complex cellular responses related to gemcitabine resistance. To reconstruct dynamic tumor cell responses from these static scRNA-seq snapshots, we proposed scConGraph, a scalable bi-layer graph model that efficiently integrates cross-time context information. Using scConGraph, we observed that stemness and endoplasmic reticulum stress contribute to intrinsic resistance. As for acquired resistance, cancer cells may resist or evade gemcitabine treatment by activating the cell cycle, entering quiescence, or inducing epithelial-mesenchymal transition. Notably, GDF15 showed recurrent and significant upregulation across acquired-resistance cell subpopulations. Experimental validation confirmed that inhibiting GDF15 sensitizes tumor cells to gemcitabine, suggesting GDF15 as a potential target for overcoming gemcitabine-induced chemoresistance.

- Pan-cancer single-cell dissection reveals phenotypically distinct B cell subtypes
Yu Yang, Peking University
[Speaker introduction]

Yu Yang is a Ph.D. candidate in Prof. Zemin Zhang's lab at the Biomedical Pioneering Innovation Center (BIOPIC), Peking University, China.

[Abstract]

Characterizing the composition and phenotypes of tumor-infiltrating B cells (TIBs) is important for advancing our understanding of their role in cancer development. Here, we establish a comprehensive resource of human B cells by integrating single-cell RNA sequencing data of B cells from 649 patients across 19 major cancer types. We demonstrate substantial heterogeneity in their total abundance and subtype composition and observe an immunoglobulin G (IgG) skew in the isotypes of antibody-secreting cells. Moreover, we identify stress-response memory B cells and tumor-associated atypical B cells (TAABs), two tumor-enriched subpopulations with prognostic potential that are shared across cancer types. In particular, TAABs, characterized by high clonal expansion and proliferative capacity as well as close interactions with activated CD4 T cells in tumors, are predictive of immunotherapy response. Our integrative resource depicts distinct, clinically relevant TIB subsets, laying a foundation for further exploration of the functional commonality and diversity of B cells in cancer.
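Statements about heterogeneity in "subtype composition" usually rest on per-patient cell-type fractions. The sketch below shows that basic bookkeeping: per-patient fractions of each B cell subtype, then the average fraction of one subtype within each cancer type. Input layout and function names are hypothetical, and the real analysis would start from annotated scRNA-seq data rather than plain tuples.

```python
from collections import Counter, defaultdict

def subtype_composition(cells):
    """cells: iterable of (patient_id, subtype) pairs, one per B cell.

    Returns {patient: {subtype: fraction}}, with fractions summing to 1 for each patient.
    """
    counts = defaultdict(Counter)
    for patient, subtype in cells:
        counts[patient][subtype] += 1
    composition = {}
    for patient, c in counts.items():
        total = sum(c.values())
        composition[patient] = {s: n / total for s, n in c.items()}
    return composition

def mean_fraction_by_cancer(composition, patient_cancer, subtype):
    """Average per-patient fraction of `subtype` within each cancer type.

    Patients with no cells of `subtype` contribute 0, so absence is not silently dropped.
    """
    per_type = defaultdict(list)
    for patient, fracs in composition.items():
        per_type[patient_cancer[patient]].append(fracs.get(subtype, 0.0))
    return {ct: sum(v) / len(v) for ct, v in per_type.items()}
```

Averaging per-patient fractions, rather than pooling all cells, keeps a single patient with many sequenced cells from dominating the cancer-type estimate.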

- TBD
Xiaomin Ying, Beijing Institute of Basic Medical Sciences
10/18 Friday 13:30 ~ 17:30
Keynote: Michael Galperin
Session 6: Biodiversity and Health genomics, chaired by Weifeng Shi and Minxian Wang
- Keynote talk: COG database update: principles and challenges
Michael Galperin, NCBI, NLM, National Institutes of Health