Conference - 2022 Big Data Forum for Life and Health Sciences

The 7th Big Data Forum for Life and Health Sciences (October 26, 2022)

Biological research has entered the era of big data, spanning a wide variety of omics data and a broad range of health data. These data are generated at ever-growing rates and distributed throughout the world, with heterogeneous standards and diverse, often limited, access capabilities. However, the promise of translating big data into big knowledge can be realized only if the data are publicly shared. Providing open access to omics and health big data is therefore essential for expediting this translation, and it is becoming increasingly vital for advancing scientific research and for promoting human healthcare and precision medicine.

Tencent Meeting ID: 638-103-117

Tencent Meeting link: https://meeting.tencent.com/dm/0ZZiM7u1fTao

Live stream (Bilibili): https://live.bilibili.com/22690054

It is our great pleasure to announce that the 2022 Big Data Forum for Life and Health Sciences will be held on October 26, 2022. Several renowned biomedical data scientists have agreed to give talks, and you are cordially invited to share your work and participate in this exciting event.

Organizing Committee

Zhang Zhang (Chair, BIG, CAS)

Yiming Bao (BIG, CAS)

Wenming Zhao (BIG, CAS)

Jingfa Xiao (BIG, CAS)

Songnian Hu (Institute of Microbiology, CAS)

Jun Yu (BIG, CAS)

Jingchu Luo (Peking University)

Biological Data Resources Committee, Chinese Society for Bioinformatics (Preparatory)

Beijing Bioinformatics Society


Invited Speakers

Yiming Bao

Professor
National Genomics Data Center
Beijing Institute of Genomics, CAS / China National Center for Bioinformation
China

Zhiwei Cao

Professor
School of Life Sciences
Fudan University
China

Xuepeng Chen

Principal Investigator
Guangzhou Lab
China

David Lipman

Former Director of NCBI
Senior Science Advisor for Bioinformatics and Genomics, FDA
USA

Ran Su

Associate Professor
Tianjin University
China

Xiaowo Wang

Professor
Department of Automation
Tsinghua University
China

Liguo Wang

Associate Professor
Division of Computational Biology
Mayo Clinic
USA

Juan Xu

Professor
College of Bioinformatics Science and Technology
Harbin Medical University
China

Agenda (Online Meeting)

October 26
09:00 - 09:20	Welcome and Opening Remarks (with group photo), chaired by Yiming Bao
Session 1: Big Data Challenges & Opportunities, chaired by Yiming Bao
09:20 - 10:00	Keynote talk: Most Big Tech projects fail - why do some succeed? [Abstract] Modern technology, especially IT, has impressive capabilities, and this has led to massive investments in big tech projects. Despite this, most of these efforts do not achieve their larger goals of improved productivity, health, and quality of life. I will review some of the evidence for this and then use the example of the application of whole genome sequencing to food safety to consider the attributes of these projects that improve the chances for success. David Lipman, FDA, USA [Bio] Dr. David Lipman joined FDA/CFSAN in 2021 as Senior Science Advisor in Bioinformatics and Genomics. Dr. Lipman worked at the National Institutes of Health for 36 years and served as the founding Director of the National Center for Biotechnology Information (NCBI) at the National Library of Medicine/NIH. Under Dr. Lipman’s leadership, NCBI created and managed a number of the most heavily used biomedical information resources, including PubMed, PubMed Central, GenBank, SRA, and RefSeq. The Pathogen Detection team at NCBI also works closely with FDA on GenomeTrakr. From 2017 through 2019, Dr. Lipman served as Chief Science Officer for Impossible Foods and directed the team that developed and commercialized the current Impossible Burger formulation, the Impossible Whopper, and the sausage and ground pork formulations. Dr. Lipman’s own research has focused on molecular evolution, molecular epidemiology, comparative genomics, and the development of computational tools including BLAST. His research papers have been cited by over 200,000 scientific publications. Dr. Lipman is an elected member of the National Academy of Sciences, the National Academy of Medicine, and the American Academy of Arts and Sciences.
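10:00 - 10:40	Keynote talk: Basic Scientific and Engineering Work of BioMedicine Big Data for Life, Health and Clinic Research [Abstract] Over the past five years, our efforts have focused on establishing a scientific governance framework and an engineered service system for biomedicine big data. Its applications in life, health, and clinical research are demonstrated through cases supporting the Human Phenome Project and GWAS studies that use data and information from health-checkup populations, as well as multi-omics research on NSCLC immune checkpoint inhibitor (ICI) clinical trials. Guoping Zhao, SINH, CAS, Fudan University and Shandong University [Bio] Guoping Zhao is a molecular microbiologist, an Academician of the Chinese Academy of Sciences, a Fellow of The World Academy of Sciences (TWAS), and a Fellow of the American Academy of Microbiology. Born in Shanghai in August 1948, he received a B.S. in microbiology from Fudan University in 1982 and a Ph.D. in biochemistry from Purdue University in 1990. He is Chief Scientist of the Bio-Med Big Data Center at the Shanghai Institute of Nutrition and Health, CAS; Director of the Expert Committee of the CAS Key Laboratory of Synthetic Biology at the CAS Center for Excellence in Molecular Plant Sciences; and Chair of Microbiology and Immunology at the School of Life Sciences, Fudan University. He also serves as Honorary Director of the Synthetic Biology Committee of the Chinese Society of Biotechnology and Honorary President of the Shanghai Society of Biotechnology. He has worked for many years on microbial physiology and biochemistry, metabolic regulation, and enzyme mechanisms. He helped organize China's Human Genome Project, established a genomics research platform, and launched research on biochips, bioinformatics, and proteomics. He has done pioneering work on microbial genomes, the acetylome of metabolic enzymes, and the gut microbiome. He opened up the field of systems synthetic biology, founded the CAS Key Laboratory of Synthetic Biology, and made fundamental contributions to artificial cell factories for natural-product synthesis, the construction of single-chromosome yeast, and the development of CRISPR-Dx systems. In recent years, he helped establish and has led the Bio-Med Big Data Center of the Shanghai Institutes for Biological Sciences, CAS (now the Shanghai Institute of Nutrition and Health), carrying out preparatory research for building a national comprehensive biomedical big data service platform.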
Session 2: Data Analysis & Database, chaired by Jingfa Xiao
10:40 - 11:00	CAZyme bioinformatics enhances gut microbiome, agriculture, and bioenergy research [Abstract] Carbohydrate Active EnZymes (CAZymes) are enzymes that act on specific glycosidic linkages to degrade, synthesize, or modify carbohydrates/glycans. CAZymes are essential for microbial communities to thrive in carbohydrate-rich environments such as animal guts, agricultural soils, forest floors, and ocean sediments. We maintain dbCAN, a popular web server that was published in the Nucleic Acids Research web server issues in 2012 and 2018 and has been cited 2,400 times. dbCAN contains a hidden Markov model (HMM) database for 400+ CAZyme families and allows automated CAZyme annotation at the protein family level for any input genome or metagenome. In this talk, I will introduce our further development of dbCAN to enable glycan substrate prediction, which is extremely important for research in bioenergy, the gut microbiome, human health, nutrition, plant disease, and global carbon cycling. Yanbin Yin, University of Nebraska Lincoln, USA [Bio] Yanbin Yin is an Associate Professor in the Department of Food Science and Technology at the University of Nebraska-Lincoln. Dr. Yin received a Ph.D. from the Center for Bioinformatics, Peking University, under Professors Jingchu Luo and Songgang Li, and completed postdoctoral training in Professor Ying Xu's laboratory at the University of Georgia. Since 2007, Dr. Yin has focused on genomic data mining of carbohydrate-active enzymes (CAZymes); the dbCAN series of tools developed by the lab is widely used in gut health, bioenergy, agricultural microbiology, global carbon cycling, marine ecology, and other fields. Dr. Yin has also published on the evolution of plant cell walls and of bacterial orphan genes, and received an NSF CAREER Award in 2017. Dr. Yin's recent NIH R01-funded research focuses on developing new computational tools for gut microbiome-based dietary fiber recommendation for personalized nutrition.
11:00 - 11:20	ESPRESSO: Robust discovery and quantification of transcript isoforms from error-prone long-read RNA-seq data [Abstract] Long-read RNA sequencing (RNA-seq) holds great potential for characterizing transcriptome variation and full-length transcript isoforms, but the relatively high error rate of current long-read sequencing platforms poses a major challenge. We present ESPRESSO, a computational tool for robust discovery and quantification of transcript isoforms from error-prone long reads. ESPRESSO jointly considers the alignments of all long reads mapped to a gene and uses each read's error profile to improve splice junction identification. On a synthetic spike-in RNA sample with known transcript structures and concentrations, ESPRESSO generates more accurate quantifications of transcript isoforms than multiple contemporary tools. We used ESPRESSO to assess three popular nanopore RNA-seq protocols with respect to transcript isoform analysis. In total, we generated and analyzed ~1.1 billion nanopore RNA-seq reads covering 30 human tissue samples and three human cell lines. ESPRESSO and its companion dataset provide a useful resource for studying the RNA repertoire of eukaryotic transcriptomes. Yuan Gao, Beijing Institute of Genomics, CAS / China National Center for Bioinformation [Bio] Yuan Gao is a Professor and doctoral supervisor at the Beijing Institute of Genomics, CAS. Dr. Gao's research focuses on developing algorithms that identify key features in omics data and quantify them accurately, using innovative bioinformatics approaches to explore complex biological questions. In recent years, working on questions such as transcriptional regulation and microbial growth dynamics, Dr. Gao has published as first or co-first author in leading international journals including Nature Methods, Nature Communications, Trends in Genetics, Genome Biology, and Briefings in Bioinformatics. Since returning to China to establish a laboratory in 2021, Dr. Gao has received funding as principal investigator from a Young Scientists Project of the National Key R&D Program and a General Program grant from the National Natural Science Foundation of China.
11:20 - 11:40	Capture biological associations through genomic collocation analysis [Abstract] Collocated genomic intervals indicate biological associations. The commonly used approach to evaluating the strength of collocation between genomic interval sets first applies an arbitrary threshold to define the overlap of two intervals and then counts the number or proportion of overlapping genomic regions. Such a threshold-and-count approach ignores the magnitude of overlap, which frequently leads to inaccurate, biased, non-reproducible, and incomparable results. Here, we introduce the cobind package, which provides six threshold-free metrics to measure the magnitude of genomic collocation. By applying these new metrics to genomic intervals identified by ChIP-seq, we demonstrated that the collocation coefficient (C) and normalized pointwise mutual information (NPMI) are the best measures for quantifying genomic collocation. These two metrics successfully nominated CTCF co-factors that had been missed by the conventional threshold-and-count approach. Furthermore, our methods effectively identified known and novel master regulators from prostate cancer-specific and oligodendrocyte-specific cis-regulatory regions identified by bulk and single-cell ATAC-seq, respectively. Liguo Wang, Mayo Clinic, USA [Bio] Liguo Wang is an Associate Professor. Dr. Wang received a Ph.D. from the Beijing Institute of Genomics, CAS (advisor: Prof. Jun Yu), did postdoctoral research in Prof. Wei Li's laboratory at Baylor College of Medicine from 2008 to 2012, and joined Mayo Clinic (Rochester) in 2012, where Dr. Wang has worked since. Research areas include the development of bioinformatics algorithms and tools (e.g., RSeQC, CPAT, CrossMap, and CpGtools) and genomic and epigenomic data mining. Diseases studied include cancer (prostate cancer, lung cancer, and central nervous system tumors), migraine, amyotrophic lateral sclerosis, liver disease, and accelerated aging. Dr. Wang has published more than 80 papers (Google Scholar h-index = 42).
11:40 - 12:00	The latest development of Ori-Finder system and DoriC database for prokaryotic replication origins [Abstract] DNA replication is a complex biological process that is essential for life. Bacterial DNA replication is initiated at genomic loci referred to as replication origins (oriCs). In 2008, by integrating the Z-curve method, DnaA box distribution, and comparative genomic analysis, we developed Ori-Finder, a web server for predicting bacterial oriCs, which has helped clarify the characteristics of bacterial oriCs. Here, we describe Ori-Finder 2022, with an updated prediction framework, an interactive visualization module, a new analysis module, and a user-friendly interface. More species-specific indicator genes and functional elements of oriCs are integrated into the updated framework, which has also been redesigned to predict oriCs in draft genomes. The interactive visualization module displays more genomic information related to oriCs and their functional elements. The analysis module includes regulatory protein annotation, repeat sequence discovery, homologous oriC search, and strand-biased analyses. The redesigned interface provides additional customization options for oriC prediction. Ori-Finder 2022 is freely available at https://tubic.org/Ori-Finder/. We also launched DoriC, a manually curated database of oriCs in prokaryotic genomes predicted with the Ori-Finder system, which has since been continually updated to integrate the latest research progress in this field. This database has facilitated large-scale data mining of functional elements in oriCs and strand-biased analysis. Based on the predictions of Ori-Finder 2022, the DoriC database has been updated accordingly. The latest release, DoriC 12.0, includes oriCs in both draft and complete prokaryotic genomes. The number of oriCs in the database has also increased significantly; it currently contains over 200,000 bacterial entries distributed across more than 40 phyla.
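Additionally, new oriC features and improvements have been introduced, especially in the visualization and analysis of oriCs. DoriC is now considered an important database in bioinformatics, microbial genomics, and even synthetic biology, providing a valuable resource and a comprehensive platform for oriC research. DoriC 12.0 can be accessed at https://tubic.org/doric/. Feng Gao, Tianjin University [Bio] Feng Gao is a Professor and doctoral supervisor at Tianjin University and Director of the Tianjin University Center for Bioinformatics, working mainly on microbial genome bioinformatics and synthetic biology. Prof. Gao has published 60 first- or corresponding-author papers in internationally recognized SCI journals (more than 20 in Nucleic Acids Research, Briefings in Bioinformatics, and Bioinformatics), 45 of them as sole first or corresponding author. These papers have been cited and corroborated by papers in Science, Nature, and other journals, and the related results have been reported by national media including China Central Television (CCTV), Science and Technology Daily, and People's Daily. Prof. Gao is currently a Faculty Opinions member and serves as an editorial board member or associate editor of more than ten international journals, including Genomics, Proteomics & Bioinformatics, Briefings in Bioinformatics, and Interdisciplinary Sciences: Computational Life Sciences.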
12:00 - 12:20	Database Resources of CNCB-NGDC. Yiming Bao, Beijing Institute of Genomics, CAS / China National Center for Bioinformation
Session 3: Big Data & Diseases, chaired by Shuhui Song
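13:00 - 13:40	Keynote talk: Metaverse and Smart Health [Abstract] In 2021, "metaverse" became one of the buzzwords of the year among the Chinese public. Because it is a new concept, different people picture the metaverse differently, and some even say it is a hoax or pseudoscience. However, since so many companies and experts are embracing the concept, I believe it must have value, so we should first understand and study it before deciding whether to affirm, reject, or refine it. As the metaverse continues to mature, some basic consensus is emerging. This talk will introduce the basic concept of the metaverse, its three stages, development history, enabling technologies, eight basic characteristics, five key elements, and six properties, and will discuss applications of the metaverse in biomedical scenarios and the challenges of realizing them, including medical training, surgery, electronic prescriptions, game therapy, digital drugs, disease diagnosis, virtual reality therapy, remote consultation, and metaverse-based biological experiments. Yi Pan, Shenzhen Institute of Advanced Technology, CAS [Bio] Prof. Yi Pan entered the Department of Computer Science and Engineering at Tsinghua University in 1977 as the top science scorer in Jiangsu Province's college entrance examination, received B.Eng. and M.Eng. degrees from Tsinghua University in 1982 and 1984, respectively, and received a Ph.D. in computer science from the University of Pittsburgh in 1991. He is currently Dean and Chair Professor of the Faculty of Computer Science and Control Engineering at Shenzhen University of Advanced Technology (in preparation), CAS, and Regents' Professor Emeritus at Georgia State University. At Georgia State University, he previously served as Chair of the Department of Computer Science, Chair of the Department of Biology, Associate Dean of the College of Arts and Sciences, University Distinguished Professor, and Regents' Professor (the highest honor for faculty at public universities). He has also been a visiting chair professor or guest professor at Tsinghua University, Peking University, Zhejiang University, and other universities. Prof. Pan is a Fellow of the American Institute for Medical and Biological Engineering, a Fellow of the Royal Society for Public Health (UK), a Foreign Member of the National Academy of Engineering of Ukraine, a Fellow of the Institution of Engineering and Technology (IET), a Japan Society for the Promotion of Science (JSPS) Fellow, a Changjiang Scholar Chair Professor, and a National Distinguished Expert. In 2021, he was named to the list of the world's top 2% scientists and to the list of the world's top 1,000 computer scientists, on which he was the only bioinformatics expert from China. His main research uses cloud computing, big data analytics, artificial intelligence, and deep learning as tools for research in bioinformatics and health informatics. He has published more than 250 SCI journal papers in this field, over 100 of them in leading IEEE/ACM Transactions/Journals, as well as more than 150 papers in international conference proceedings, and has authored or edited 42 books. His work has been cited 19,600 times, and his current h-index is 89. He has trained more than 20 Ph.D. students and more than 50 master's students. He is an IEEE Distinguished Speaker and has received the IEEE Outstanding Achievement Award, the IEEE Outstanding Service Award, an IEEE Transactions Best Paper Award, multiple best paper awards at IEEE and other international conferences, four IBM Faculty Awards, two JSPS Senior Invitation Fellowships, and the Andrew Mellon Award, among others. He has given keynote speeches at more than 60 international conferences and nearly 100 invited talks at renowned universities in the USA and around the world. Prof. Pan is currently Editor-in-Chief of Big Data Mining and Analytics (co-published by Tsinghua University and IEEE; a top 5% journal in China and top 4% internationally), Associate Editor-in-Chief of the Journal of Computer Science and Technology (JCST), a leading Chinese English-language computer science journal, and Associate Editor-in-Chief of the Chinese Journal of Electronics, a leading Chinese English-language electronics journal. He previously served as Associate Editor-in-Chief of IEEE/ACM Transactions on Computational Biology and Bioinformatics, a leading international bioinformatics journal; as founder and Editor of John Wiley's Book Series on Bioinformatics and Book Series on Wireless Networks and Mobile Computing; and as Associate Editor of seven IEEE Transactions journals. He has served as General Chair or Program Committee Chair of dozens of major international conferences.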
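13:40 - 14:00	RWD-driven smart pediatric CDSS [Abstract] The pediatric Clinical Decision Support System (CDSS) based on real-world data (RWD) uses massive EMR data to support auxiliary diagnosis through AI models. Medical diagnosis emphasizes "seeing, hearing, asking, and knowing": doctors must gather multi-sensory, multi-channel information and make comprehensive judgments based on the patient's brief chief complaint as well as the patient's age, physical condition, and past medical history. CDSSs have gradually evolved from early single-function query systems with isolated components into integrated systems. Our clinical decision support system focuses on two points: first, it is data-driven, making effective use of the health data generated during medical care; second, it is specialized, using high-quality data from specialist departments to build CDSSs tailored to specific clinical scenarios. Lianyi Han, Fudan University & Greater Bay Area Institute of Precision Medicine [Bio] Dr. Lianyi Han is a Professor and doctoral supervisor. Since 2022, Dr. Han has been a Senior Investigator at the Greater Bay Area Institute of Precision Medicine (Guangzhou) and a jointly appointed Investigator at the School of Life Sciences, Fudan University. Dr. Han was previously a Senior Scientist at the U.S. National Institutes of Health, Chief Scientist of Tencent's Medical AI Lab in the USA, and a Tencent Expert Researcher, and served as technical lead of China's first AI-based chronic disease management platform for heart failure patients (a Novartis-Tencent collaboration). Dr. Han has published more than 70 SCI-indexed papers in bioinformatics and computer-aided drug design and holds 16 US patents in AI for healthcare. An expert in medical AI for chronic disease management, Dr. Han has spent 20 years in biocomputing and computer-aided drug design (CADD), applying AI, statistical learning, and deep learning to bioinformatics, cheminformatics, multimodal medical informatics, real-world evidence, and digital biomarkers. Dr. Han's work has been cited more than 11,000 times (h-index 36).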
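14:00 - 14:20	Cancer driver gene discovery and carcinogenic mechanism research [Abstract] Cancers result from a set of genetic and epigenetic alterations. Most known oncogenes were identified by gain-of-function mutations in cancer, yet little is known about their epigenetic features. Through integrative analysis of 11,596 epigenomic profiles and mutations from >8,200 tumor-normal pairs, we discovered broad genic repression domains (BGRDs) on chromatin as an epigenetic signature of oncogenes. A BGRD is a widespread enrichment domain of the repressive histone modification H3K27me3. Shortening of BGRDs is linked to derepression of transcription. BGRDs at oncogenes tend to be conserved across normal cell types. Putative tumor-promoting genes and lncRNAs defined using BGRDs were experimentally verified as required for cancer phenotypes. Therefore, BGRDs play key roles in the epigenetic regulation of cancer and provide a direction for mutation-independent discovery of oncogenes. Some candidate oncogenes identified by the BGRD method, such as BMI1 and CHD6, have been studied in depth to understand their cancer-driving mechanisms. Dongyu Zhao, Peking University [Bio] Dongyu Zhao is a Professor in the Department of Medical Bioinformatics, School of Basic Medical Sciences, Peking University, a PI at the Key Laboratory of Molecular Cardiovascular Science, Ministry of Education, and a recipient of the national Overseas High-Level Young Talents program. Dr. Zhao is a member of the Tumor Sequencing and Big Data Analysis Committee of the China Anti-Cancer Association, a member of the Beijing Bioinformatics Society, and an Associate Editor of Frontiers in Genetics. Dr. Zhao's research develops computational models of the genetic and epigenetic mechanisms that regulate cell identity in disease. Dr. Zhao leads a General Program grant from the National Natural Science Foundation of China and participates in a National Key R&D Program project of the Ministry of Science and Technology and in the Peking University Clinical Medicine + X Youth Program. Major results have been published in Molecular Cell, Nature Communications, Oncogene, and other journals. Dr. Zhao received an academic award from the Moran Foundation (USA).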
14:20 - 14:40	Sex differences in colorectal cancer [Abstract] Men have higher incidence and mortality rates of colorectal cancer (CRC) than women. This study aims to explain the potential causes of this sexual dimorphism in CRC from the perspective of sex-biased gut microbiota and metabolites. Sexual dimorphism in CRC tumorigenesis was examined in multiple CRC mouse models by assessing tumor number, tumor size, and intestinal barrier function. Changes in intestinal barrier function were examined in pseudo-germ-free mice after fecal microbiota transplantation (FMT) of fecal samples from male or female CRC mice or patients. The sex-biased gut microbiome of CRC mice and patients was assessed by shotgun metagenomic sequencing, and sex-biased gut metabolites were identified by liquid chromatography-mass spectrometry. Sexual dimorphism in colorectal tumorigenesis was observed in both ApcMin/+ mice and AOM/DSS-treated mice, with male mice developing significantly more and larger tumors. The differences in colorectal tumor number and size between males and females were abolished after antibiotic treatment. Moreover, pseudo-germ-free mice receiving fecal samples from male mice or patients showed more severe intestinal barrier damage and higher levels of inflammation. A significant change in gut microbiota composition, with an increase in the pathogenic bacterium Akkermansia muciniphila and depletion of the probiotic Parabacteroides goldsteinii, was accompanied by impaired gut barrier function in both male mice and pseudo-germ-free mice receiving fecal samples from male mice. Sex-biased gut metabolites in pseudo-germ-free mice receiving fecal samples from CRC patients or CRC mice contributed to sexual dimorphism in CRC tumorigenesis through the glycerophospholipid metabolism pathway. In conclusion, CRC mouse models show sexual dimorphism in tumorigenesis, and sex-biased gut microbiota and metabolites contribute to this sexual dimorphism in CRC.
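Zhenxia Chen, Huazhong Agricultural University [Bio] Zhenxia Chen is a Professor and doctoral supervisor at the College of Biomedicine and Health, Huazhong Agricultural University, and Deputy Editor of Briefings in Bioinformatics. Dr. Chen received a Ph.D. in bioinformatics from Peking University in 2011 and did postdoctoral research at the U.S. National Institutes of Health from 2011 to 2016. Dr. Chen's current research is in nutrigenomics, using a "fly-mouse-human" comparative genomics strategy and integrative analysis of multi-omics data on animal and human nutrition and health to assess gene-diet-phenotype associations and their molecular mechanisms. To date, Dr. Chen has published 13 SCI papers as corresponding or first author in journals including Advanced Science, Genome Research, Nucleic Acids Research, Molecular Biology and Evolution, and Briefings in Bioinformatics.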
14:40 - 15:00	Computational methods of drug-induced nephrotoxicity [Abstract] The kidney is a major target for xenobiotics, which include drugs, industrial chemicals, environmental toxicants, and other compounds. Accurate methods for screening large numbers of potentially nephrotoxic xenobiotics with diverse chemical structures are currently not available. Here, we describe an approach for nephrotoxicity prediction that combines high-throughput imaging of cultured human renal proximal tubular cells (PTCs), quantitative phenotypic profiling, and machine learning methods. We automatically quantified 129 image-based phenotypic features and identified chromatin and cytoskeletal features that can predict the in vivo human PTC toxicity of 44 reference compounds with test balanced accuracies of ~82% (primary PTCs) and 89% (immortalized PTCs). Surprisingly, our results also revealed that a DNA damage response is commonly induced by different PTC toxicants with diverse chemical structures and injury mechanisms. Together, our results show that human nephrotoxicity can be predicted with high efficiency and accuracy by combining cell-based and computational methods that are suitable for automation. Ran Su, Tianjin University [Bio] Ran Su's main research areas are bioinformatics and medical imaging. Dr. Su has received the Lush Prize (Europe), has been supported by the NSFC Excellent Young Scientists Fund and the Tianjin Overseas High-Level Talent program, among others, and serves as an editorial board member or associate editor of several journals.
Session 4: AI & Spatial Omics, chaired by Zhang Zhang
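15:00 - 15:20	Understanding gene regulatory grammar by visualizing deep artificial neural networks [Abstract] Discovering motifs and their combinatorial patterns in gene regulatory sequences is essential for understanding the mechanisms of gene expression regulation. Although deep artificial neural networks have achieved great success in predicting cis-regulatory elements, using these models to discover and interpret motifs and their combinatorial patterns remains difficult. A single deep-layer neuron is often activated by multiple input patterns, which makes the complex rules it captures hard to interpret. To address this difficulty, we propose NeuronMotif, an algorithm that accounts for the signal mixing characteristic of deep convolutional neural networks (CNNs) to reveal the sequence patterns captured by deep neurons. Given any convolutional neuron (CN) in a network, NeuronMotif first uses importance sampling to generate a large number of sequence samples that activate the CN, and then performs backward clustering on the feature maps of the involved convolutional layers to stratify the sequence patterns, disentangling the embedded patterns layer by layer. By interpreting deep neural networks, NeuronMotif can discover a large number of motifs and their combinatorial patterns, which are supported by the literature and by ATAC-seq footprinting data. We also found that the method can be used to diagnose pathological neurons in deep networks and thereby improve the model's predictive performance. Xiaowo Wang, Tsinghua University [Bio] Xiaowo Wang is a Professor and doctoral supervisor in the Department of Automation, Tsinghua University, whose main research areas are pattern recognition and machine learning and bioinformatics. Prof. Wang has published more than 60 papers in PNAS and other journals, with more than 5,000 SCI citations by others. Prof. Wang serves as Deputy Director of the Bioinformatics and Artificial Life Committee of the Chinese Association for Artificial Intelligence, Director of the Youth Working Committee of the Chinese Society of Biotechnology, and a Standing Committee Member of the Bioinformatics Committee of the China Computer Federation, among other roles. Honors include the National Excellent Doctoral Dissertation Award and the Young Scientist Award of the Chinese Association of Automation, and support from the National Science Fund for Distinguished Young Scholars, the NSFC Original Exploration Program, the Excellent Young Scientists Fund, and the Ministry of Education's New Century Excellent Talents program.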
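15:20 - 15:40	Opportunities and challenges in spatial omics research [Abstract] Spatial transcriptomics technologies measure gene expression while retaining spatial location information and are widely used to study tissue and organ development and disease onset and progression. Current spatial transcriptomics technologies are limited by low gene throughput or low spatial resolution, making it difficult to obtain the complete transcriptome of individual cells. Researchers have therefore developed many computational tools that integrate spatial transcriptomics and single-cell transcriptomics data. However, no systematic independent study has compared the performance of these tools. To help researchers choose appropriate tools, we designed a standardized analysis pipeline and systematically evaluated 16 methods for integrating spatial and single-cell transcriptomics data in predicting the spatial distribution of genes or cell types, across multiple dimensions including accuracy, robustness, and computational resources. The study used 45 pairs of spatial and single-cell transcriptomics datasets derived from the same tissues, as well as 32 pairs of simulated datasets. The results show that Tangram, gimVI, and SpaGE outperform other integration methods in predicting the spatial distribution of genes, whereas Cell2location, SpatialDWLS, and RCTD perform better in predicting the spatial distribution of cell types. Finally, we summarize the properties, performance, and applicability of each method and establish direct quantitative criteria that provide a reference for further improving these methods; we also provide a pipeline on GitHub for integrating spatial and single-cell transcriptomics data, giving biologists clear usage guidelines. Kun Qu, University of Science and Technology of China [Bio] Kun Qu is Director of the Office of International Cooperation and Exchange and of the Hong Kong, Macao and Taiwan Affairs Office at the University of Science and Technology of China (USTC), and a Professor in the Division of Life Sciences and Medicine, USTC. Prof. Qu previously worked at Stanford University School of Medicine as a bioinformatics research scientist, senior research scientist, and center director. Prof. Qu has long worked in bioinformatics, precision medicine, and genomics, developing new genomic sequencing technologies and algorithms and software for biological big data analysis through the integration of life sciences and medicine with chemistry, mathematics, and computer science, and using existing and original tools to systematically study the immunological pathogenesis of major diseases such as COVID-19, cancer, and autoimmune diseases. Prof. Qu has published more than 60 papers in international journals including Nature, Nature Methods, Cell, and Cancer Cell.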
15:40 - 16:00	Spatiotemporal 3D genome reorganization of multiple brain regions in human [Abstract] Elucidating the regulatory mechanisms underlying the regionalization and development of different human brain regions is essential for understanding advanced human cognition and mental disorders. However, the spatiotemporal reorganization of 3D chromatin structure and its regulatory functions in different brain regions remain poorly understood. Here, we generated an atlas of high-resolution 3D chromatin structure in multiple human brain regions. Spatial and temporal reorganization of 3D chromatin structure participates extensively in brain-region-specific regulation. We also found that H3K27ac-marked super-enhancers play important roles in shaping brain-region-specific 3D chromatin structures. Finally, neuropsychiatric GWAS variants can be linked to different target genes in a brain-region-specific manner. Collectively, our findings provide important insights into 3D regulatory mechanisms for brain-region-specific functions and can serve as a valuable resource for understanding neuropsychiatric disorders and cognition. Xuepeng Chen, Guangzhou Lab [Bio] Xuepeng Chen is a Principal Investigator in the Department of Basic Research at Guangzhou Laboratory, a Guangzhou Outstanding Expert, and a committee member of the Single-Cell Multi-Omics Branch of the Biophysical Society of China. Honors include the CAS President's Special Award and the CAS Excellent Doctoral Dissertation Award. Dr. Chen's research focuses on the dynamics and regulatory mechanisms of higher-order chromatin structure. In the past five years, Dr. Chen has led one NSFC General Program grant and participated in several national projects, including NSFC Major Research Plan and Key Program projects, and has published two papers as first or co-first author in Nature and Cell; other collaborative work has appeared in National Science Review, Cell Discovery, Genomics, Proteomics & Bioinformatics, Genes, and other journals.
16:00 - 16:20	Intelligent spatial transcriptomics: paving the way for deciphering tissue architecture [Abstract] Technological advances in spatial transcriptomics are critical for a better understanding of the structure and function of tissues in biological research. Recently, combinations of intelligent/statistical algorithms with spatial transcriptomics have emerged that pave the way for deciphering tissue architecture. In this talk, I will introduce our efforts to advance intelligent spatial transcriptomics. We first developed STAGATE, a graph attention auto-encoder framework that accurately identifies spatial domains by learning low-dimensional latent embeddings that integrate spatial information and gene expression profiles. We validated STAGATE on diverse spatial transcriptomics datasets generated by different platforms with different spatial resolutions. STAGATE substantially improves the identification accuracy of spatial domains and denoises the data while preserving spatial expression patterns. Importantly, STAGATE can be extended to multiple consecutive sections to reduce batch effects between sections and to extract three-dimensional (3D) expression domains from the reconstructed 3D tissue. Building on this, we 1) developed STAMarker for identifying spatial domain-specific variable genes, 2) designed STAligner for integrating spatial transcriptomics of multiple slices from diverse biological scenarios, and 3) demonstrated the effectiveness of the graph attention auto-encoder for spatial clustering of spatial metabolomics data. Shihua Zhang, Academy of Mathematics and Systems Science, CAS [Bio] Shihua Zhang is a Professor at the Academy of Mathematics and Systems Science, CAS, and Deputy Director of the CAS Key Laboratory of Random Complex Structures and Data Science. Prof. Zhang works at the interface of bioinformatics computation, machine intelligence, and optimization, with major results published in Cell, Nature Communications, Advanced Science, and other journals. Honors include the China Youth Science and Technology Award, the CAS Lu Jiaxi Young Talent Award, and the Zhongchuang Software Talent Award, and support from the NSFC Excellent Young Scientists Fund, the CAS Outstanding Young Scientist Program, the National Ten Thousand Talents Program for Top Young Talents, and the CAS Youth Team in Basic Research. Prof. Zhang's work was selected as one of the Top Ten Advances in Bioinformatics in China in 2021 and one of the Top Ten Bioinformatics Algorithms and Tools in China in 2019. Prof. Zhang currently serves on the editorial boards of PLOS Computational Biology, BMC Genomics, and other journals.
Session 5: Immunomics, chaired by Wenming Zhao
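16:20 - 16:40	Antibody-antigen recognition and computation [Abstract] Global antibody drug development has entered a period of explosive growth, and developing intelligent computational technologies to accelerate antibody development has become a new challenge in biocomputing. Focusing on specific antibody-antigen interactions at three levels (recognition, comparison, and application), we have established a series of efficient prediction algorithms and open-source tools for functional antibody computation to advance antibody discovery and translational applications. Zhiwei Cao, Fudan University [Bio] Zhiwei Cao is a Distinguished Professor at Fudan University with a Ph.D. in computational biology. Prof. Cao previously worked at the Shanghai Center for Bioinformation Technology and the School of Life Sciences and Technology, Tongji University, and serves as an expert for guideline drafting and implementation-plan review for the Ministry of Science and Technology's 2021-2025 "BT-IT Convergence" key special program. Prof. Cao's research focuses on computational antibody design and prediction of multi-component synergistic drug combinations, with more than 130 SCI papers in journals including Nature, Nucleic Acids Research, and Molecular Biology & Evolution.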
16:40 - 17:00	Deep-learning based single-cell transcriptome and repertoire analysis [Abstract] T-cell receptors (TCRs) and B-cell receptors (BCRs) are critical for recognizing antigens and activating the adaptive immune response. Stochastic V(D)J recombination generates massive TCR/BCR repertoire diversity. Single-cell immune profiling with transcriptome analysis allows high-throughput study of individual TCR/BCR clonotypes and their functions in both normal and pathological settings. We present the human Antigen Receptor database (huARdb), a large-scale human single-cell immune profiling database that contains 444,794 high-confidence T or B cells (hcT/B cells) with full-length TCR/BCR sequences and transcriptomes from 215 datasets. All datasets were processed in a uniform workflow, including sequence alignment, cell subtype prediction, unsupervised cell clustering, and clonotype definition. We also developed a multi-functional, user-friendly web interface with interactive visualization modules that let biologists analyze transcriptome and TCR/BCR features at the single-cell level. Furthermore, we developed deep learning-based models that integrate million-scale single-cell immune profiling data to uncover potential immunological mechanisms underlying transcriptome and TCR features, and to predict TCR-pMHC interactions. Wanlu Liu, Zhejiang University [Bio] Wanlu Liu is a Principal Investigator, Assistant Professor, and doctoral supervisor in bioinformatics at the Zhejiang University-University of Edinburgh Institute. Dr. Liu's laboratory combines cutting-edge interdisciplinary approaches, using bioinformatics, big data, and artificial intelligence, to address key questions in adaptive immunity. Dr. Liu has published 36 papers in leading international journals including Cell, Nature Communications, Cell Stem Cell, Nucleic Acids Research, and PNAS, 12 of them as (co-)first or (co-)corresponding author. In the past five years, Dr. Liu has led six research projects, including an NSFC General Program grant, a Zhejiang Provincial Natural Science Foundation grant, a Young Scientist Project of the Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, and a Tencent AI Lab Rhino-Bird Focused Research Program grant, and holds two national computer software copyright registrations.
17:00 - 17:20	Functional characterization of lncRNAs in cancer immunity by integration of omics data [Abstract] Recent advances in RNA deep sequencing have revealed large numbers of noncoding RNAs (ncRNAs), which are usually classified into microRNAs and long noncoding RNAs (lncRNAs). Perturbations in both the expression and regulation of lncRNAs have been frequently found across various cancer types. However, functional characterization of lncRNAs in human complex diseases remains challenging. Taking advantage of omics datasets, we have developed a number of computational methods to systematically predict lncRNA functions. First, we developed LncSpA, a resource for exploring tissue-elevated (TE) lncRNAs across normal human tissues and adult and pediatric cancer tissues. Notably, TE lncRNAs were found to be regulated by m6A modification across tissues, particularly brain tissues. At the regulatory level, we revealed that lncRNAs play critical roles in cancer by perturbing the transcriptional regulatory network. Recently, we systematically identified experimentally supported and predicted lncRNA-encoded peptides and predicted tumor neoantigens from these peptides, which may provide novel insights into cancer immunotherapy. Recent studies have also highlighted the function of ncRNAs in immune cell differentiation and immune system function in cancer. We therefore proposed ImmLnc to systematically identify immune-related lncRNAs, and found that ImmLnc helps prioritize cancer-related lncRNAs and identify cancer subtypes with different immunotypes. Taken together, by integrating multi-omic expression and regulation data, we generated biologically meaningful genome-wide functional annotations for lncRNAs. Our computational models illustrate the power of functional prediction for lncRNAs and open up new avenues for studying and functionally characterizing lncRNAs.
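We anticipate that, in the future, integrating computational function prediction with more knockout or overexpression experiments will offer even deeper insight into lncRNA functions. Juan Xu, Harbin Medical University [Bio] Prof. Juan Xu is a Young Changjiang Scholar, a Vice Dean, and a designated future leader of a provincial key discipline. Prof. Xu works in medical bioinformatics, focusing on ncRNA regulatory mechanisms in major diseases, and has led four NSFC grants, a Provincial Distinguished Young Scholars Fund grant, and a university Shaoshuai (Young Leader) project, among others. As (co-)first or (co-)corresponding author, Prof. Xu has published 23 papers in journals with SCI impact factors above 10, including Nature Communications (ESI highly cited), Nucleic Acids Research, Trends in Biochemical Sciences (cover article), Cancer Research, and STTT, with more than 3,100 total citations. Honors include the Fok Ying Tung Young Teacher Award and the Provincial Young Science and Technology Talent award. Prof. Xu has led or participated in one national Emerging Engineering Education project and four provincial teaching-reform projects, received provincial first and second prizes for teaching achievements, contributed to the national planning textbook Bioinformatics (1st, 2nd, and 3rd editions), and has written 10 Chinese and English books, including an English monograph for which Prof. Xu was second editor.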
17:20 - 17:40	Interdisciplinary approaches to study tumor immunity and cancer immunotherapy response [Abstract] One of the most pressing challenges in cancer research is to overcome the suppressive tumor microenvironment (TME) and drive an effective immune response. Our long-term career goal is to combine machine learning models with immuno-oncology knowledge to identify resistance regulators and effective combination therapeutic strategies for overcoming immunotherapy resistance. We have leveraged the fast-growing volume of public omics and clinical data to (1) identify novel regulators of immune resistance and response, (2) identify regulators of tumor-immune interactions in the TME, and (3) develop useful tools and resources for the immuno-oncology community. Zexian Zeng, Peking University [Bio] Zexian Zeng is a Principal Investigator at the Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, and at the Peking-Tsinghua Center for Life Sciences. Dr. Zeng received a Ph.D. from Northwestern University in 2019 and did postdoctoral research in Prof. Xiaole Shirley Liu's laboratory at Harvard University and Dana-Farber Cancer Institute. Dr. Zeng has published more than 30 papers in journals including Cell, Cancer Discovery, Science Advances, Nucleic Acids Research, Cancer Immunology Research, Clinical Cancer Research, Genome Biology, and JITC, 23 of them as first author. Dr. Zeng's doctoral dissertation received the Best Doctoral Dissertation Award from the American Medical Informatics Association (AMIA).
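17:40 - 18:20	Keynote talk: Establishment and prospects of the Chinese gynecologic oncology data center [Abstract] Large multi-center, multi-omics databases, exemplified by TCGA, have yielded a wealth of research findings that have guided advances in clinical practice for many years. However, these data are derived from European and American populations, and there is an urgent need to map the cancer genome of the Chinese population. The Chinese Gynecologic Oncology multi-Omics Data Center (CGODC) is currently the largest and most comprehensive gynecologic oncology genomics data center in China. The project has molecularly characterized more than 700 primary tumors with matched normal samples, as well as metastatic lymph node samples, and incorporates multi-omics data from other tumors, for a total of more than 1,000 cases. Over the years, CGODC has generated genomic, epigenomic, transcriptomic, and other omics data at both the single-cell and bulk-tissue levels, providing a valuable resource and opportunity to complement and compare with cancer genomics studies such as TCGA. This talk will also present the work of Prof. Yuanguang Meng's team in establishing the Chinese gynecologic oncology data center and discuss prospects for integrating omics data with clinical practice. Yuanguang Meng, PLA General Hospital [Bio] Yuanguang Meng is Director of the Department of Obstetrics and Gynecology and a Chief Physician, Professor, and doctoral supervisor at the Chinese PLA General Hospital, and also supervises master's and doctoral students at Nankai University. Prof. Meng serves as Chair of the Gynecologic Intelligent (AI) Medicine Committee of the China Maternal and Child Health Association; Chair of the Gynecologic Oncology Expert Committee of the National Telemedicine and Internet Medicine Center; member of the Obstetrics and Gynecology Branch and of the Gynecologic Oncology Branch of the Chinese Medical Association; Standing Committee Member of the Obstetrics and Gynecology Committee and of the Surgical Robot Committee of the Chinese Medical Doctor Association; and Vice Chair of the Obstetrics and Gynecology Committee of the 10th PLA Medical Science and Technology Committee. Prof. Meng was named a 2014 Scientific Chinese Person of the Year, received the 2nd "National Eminent Physician: Outstanding Demeanor Award" in 2018, and serves as a standing editorial board member or editorial board member of several journals.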
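The 11:20 talk on genomic collocation analysis names normalized pointwise mutual information (NPMI) as one of the best threshold-free collocation measures. As a rough illustration only, the sketch below computes NPMI from the base-pair coverage of two interval sets on a single chromosome. Estimating probabilities as fractions of the genome covered is our assumption, and the helper names (`merge`, `covered_bp`, `intersect_bp`, `npmi`) are hypothetical; this is not the cobind package's code or API.

```python
import math


def merge(intervals):
    """Merge overlapping or touching half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bp(intervals):
    """Number of base pairs covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def intersect_bp(a, b):
    """Number of base pairs covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def npmi(a, b, genome_size):
    """Normalized pointwise mutual information between two interval sets.

    Assumption for illustration: P(A), P(B), and P(A and B) are the fractions
    of the genome covered by A, by B, and by both. Returns -1.0 when the sets
    do not overlap; identical sets give 1 (up to floating-point rounding).
    """
    p_a = covered_bp(a) / genome_size
    p_b = covered_bp(b) / genome_size
    p_ab = intersect_bp(a, b) / genome_size
    if p_ab == 0:
        return -1.0
    if p_ab == 1:
        return 1.0
    return math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)


print(round(npmi([(0, 100)], [(50, 150)], 1000), 3))  # 0.537
```

Because NPMI is bounded in [-1, 1] and needs no overlap threshold, it reflects how much two interval sets overlap rather than merely whether they do. The package presented in the talk may weight or estimate these probabilities differently.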
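The 11:40 talk notes that Ori-Finder integrates the Z-curve method. The sketch below computes the three standard Z-curve components of a DNA sequence and reports where the GC disparity curve x_n - y_n reaches its minimum. That curve equals 2(G_n - C_n), and its minimum is a common skew-based heuristic for a candidate bacterial origin. The function names are hypothetical, and this is not Ori-Finder's pipeline, which also uses DnaA box distribution and comparative genomics.

```python
def z_curve(seq):
    """Return the x, y, z components of the Z-curve of a DNA sequence.

    With A_n, C_n, G_n, T_n the cumulative base counts over the first n bases:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
      z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong hydrogen bonding
    Non-ACGT symbols (e.g. N) leave the counts unchanged.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))
        ys.append((a + c) - (g + t))
        zs.append((a + t) - (g + c))
    return xs, ys, zs


def gc_disparity_min(seq):
    """0-based position where the GC disparity x_n - y_n = 2(G_n - C_n) is lowest."""
    xs, ys, _ = z_curve(seq)
    disparity = [x - y for x, y in zip(xs, ys)]
    # min() returns the first position if several share the minimum value.
    return min(range(len(disparity)), key=disparity.__getitem__)


print(gc_disparity_min("GGCCCCGG"))  # 5
```

On a real genome the sequence is circular and the disparity curve is noisy, so a tool would typically combine this signal with other evidence rather than trust a single minimum.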
