The 5th Big Data Forum for Life and Health Sciences (October 15, 2020)

Biological research has entered the era of big data, which spans a wide variety of omics data and a broad range of health data. Such data are generated at ever-growing rates and distributed throughout the world under heterogeneous standards and with varied, often limited, access. However, the promise of translating big data into big knowledge can be realized only if the data are publicly shared. Thus, providing open access to omics and health big data is essential for expediting this translation and is increasingly vital for advancing scientific research, human healthcare, and precision medicine.

Tencent Meeting ID: 980323108

Tencent Meeting link: https://meeting.tencent.com/s/bFmF92QlDLpu

Tencent Meeting live stream: https://meeting.tencent.com/l/Tcz5EefvQ8lD

It is our great pleasure to announce that the 2020 Big Data Forum for Life and Health Sciences will be held on October 15, 2020. Several renowned biomedical data scientists have agreed to give talks. Likewise, you are cordially invited to share your work and participate in this exciting event.

Organizing Committee

  • Zhang Zhang (Chair, BIG, CAS)
  • Yiming Bao (BIG, CAS)
  • Wenming Zhao (BIG, CAS)
  • Jingfa Xiao (BIG, CAS)
  • Songnian Hu (Institute of Microbiology, CAS)
  • Jun Yu (BIG, CAS)
  • Jingchu Luo (Peking University)

Invited Speakers

Yiming Bao

Professor
National Genomics Data Center
Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
China

Qiaomei Fu

Professor
Laboratory on Molecular Paleontology of IVPP
Institute of Vertebrate Paleontology and Paleoanthropology, CAS
China

Lin Gao

Professor
Department of Computational Bioinformatics
Xidian University
China

Shunmin He

Professor
Center for Big Data Research in Health
Institute of Biophysics, CAS
China

Tingting Li

Professor
School of Basic Medical Sciences
Peking University
China

Mingkun Li

Professor
Key Laboratory of Precision Genomic Medicine
Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
China

Zhaoqi Liu

Professor
Key Laboratory of Precision Genomic Medicine
Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
China

Xin Liu

Data Coordination and Archiving team
The European Bioinformatics Institute
UK

Shuhui Song

Associate Professor
National Genomics Data Center
Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
China

Chaolong Wang

Professor
Department of Epidemiology and Biostatistics
Huazhong University of Science and Technology
China

Jing Wang

Professor
Key Laboratory of Mental Health
Institute of Psychology, CAS
China

Hao Wu

Sahlgrenska Hospital
University of Gothenburg
Sweden

Yi Xing

Professor
Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia
Department of Pathology and Laboratory Medicine, University of Pennsylvania
USA

Ying Xu

Professor
Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics
University of Georgia
USA

Yu Xue

Professor
College of Life Science and Technology
Huazhong University of Science and Technology
China

Li Yang

Professor
Key Laboratory of Computational Biology
CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, CAS
China

Peng Yu

Professor
West China Hospital, Sichuan University
China

Zhihua Zhang

Professor
Key Laboratory of Genome Science and Information
Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
China

Agenda (Online Meeting)

October 15 Thursday
08:50 - 09:00 Welcome and Opening Remarks
09:00 - 10:30 Session 1: Omics Data & Human Diseases, chaired by Zhang Zhang
09:00 - 09:30 Keynote talk 1: Metabolic Reprogramming in Cancer: the bridge that connects intracellular stress and cancer behaviors
Ying Xu, University of Georgia, USA
[Abstract]

Cancer has long been considered a genomic disease, a view that has served as the guiding principle in cancer research and the basis for cancer diagnosis and treatment. However, a growing number of researchers have challenged this viewpoint in the past decade because it fails to answer many cancer-related questions. We have been developing a cancer evolutionary theory in the past few years. The key idea is that persistent inflammation of certain types leads to increased local H₂O₂ and iron concentrations, which together give rise to the Fenton reaction: Fe²⁺ + H₂O₂ → Fe³⁺ + •OH + OH⁻. If the environment is also rich in superoxide (O₂•⁻), which is predominantly released from neutrophils in cancer tissues, O₂•⁻ can reduce Fe³⁺ back to Fe²⁺, hence driving the reaction for as long as O₂•⁻ is available.

We have discovered that (1) all cancer tissues in TCGA have persistent Fenton reactions in their cytosol and mitochondria, and (2) the rates of cytosolic Fenton reactions will quickly saturate the pH buffer, hence driving the cytosolic pH up if not neutralized. Our next key finding is that the affected cells utilize a wide range of metabolic reprogramming (MR) to produce more protons and keep the OH⁻ produced by the Fenton reaction neutralized. We have studied some 50 MRs in 14 cancer types, each of which produces more protons than the original metabolism. Further analyses suggest that the affected cells use cell division as a way to get rid of the persistently produced nucleotides. I will explain how other clinical behaviors of cancer may be driven by other reprogrammed metabolisms, mainly to remove their end products or intermediate products so that the proton-producing MRs can continue and keep the affected cell alive.

09:30 - 09:50 Mental disorder study in the big data era
Jing Wang, Institute of Psychology, Chinese Academy of Sciences
[Abstract]

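The 21st century is the era of big data. With the rapid development of technologies such as high-throughput sequencing, biological and medical big data are accumulating at a dramatic pace, and how to extract useful information and knowledge from these data is a question of shared concern for researchers and clinical experts alike. Taking several common, high-incidence mental disorders as our research subjects, we have established a distinctive strategy for integrating and mining mental disorder data: we integrate disease omics data from different angles and at different levels and, on that basis, mine the data through bioinformatic analysis, with the aim of revealing candidate molecular biomarkers and mechanisms of disease. We have also developed a series of tools, including a pathway analysis tool for genome-wide association study (GWAS) data and a tool for analyzing the regulatory function of genetic data, to study candidate causal loci and their possible mechanisms from different dimensions. Together, these databases and tools have served more than 250,000 users and received more than 50 million page views. Looking ahead, making full use of the opportunities and challenges that big data bring, and adopting an "integrated" strategy for disease research that combines multi-dimensional data integration with data mining, will help us better reveal disease mechanisms and realize the shift from the traditional medical model to precision medicine.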

09:50 - 10:10 Computational biological hypothesis generation using omics data
Peng Yu, West China Hospital of Sichuan University
10:10 - 10:30 Mutations in the RNA Splicing Factor SF3B1 Promote Tumorigenesis
Zhaoqi Liu, Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
10:30 - 12:10 Session 2: Data Integration & Deep Mining, chaired by Jingfa Xiao
10:30 - 11:00 Keynote talk 2: Multi-Omics Integration for Cancer Related Pattern Discovery
Lin Gao, Xidian University
[Abstract]

The mechanism, diagnosis, and prognosis of cancer are among the core research problems in the life sciences and related multidisciplinary domains. The challenge is that cancer progression is a high-dimensional, time-varying, dynamic system. How do we discover cancer-causing patterns and cancer subtypes, and finally associate these patterns with cancer initiation, progression, and therapy? With increasing amounts of multi-omics and single-cell multi-omics data becoming available, we can construct computational models of these data using different kinds of models. Systems biology and complex networks provide new insights into cancer. In this talk, I will investigate network models for different cancer patterns based on multi-omics data integration, along with the key methodological challenges faced in computational disease modeling.

11:00 - 11:20 Methods to characterize chromatin domains using ultra-low resolution Hi-C data
Zhihua Zhang, Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
11:20 - 11:40 Repeat-derived RNAs help maintain heterochromatin
Shunmin He, Institute of Biophysics, Chinese Academy of Sciences
[Abstract]

Retrotransposons populate vertebrate genomes and, when active, are thought to cause genome instability with potential benefit to genome evolution. Retrotransposon-derived RNAs are also known to give rise to small endo-siRNAs to help maintain heterochromatin at their sites of transcription; however, as not all heterochromatic regions are equally active in transcription, it remains unclear how heterochromatin is maintained across the genome. Here, we address these problems by defining the origins of repeat-derived RNAs and their specific chromatin registers in Drosophila S2 cells. We demonstrate that repeat RNAs are predominantly derived from active gypsy elements and processed by Dcr-2 into small RNAs to help maintain pericentromeric heterochromatin. We also show in cultured S2 cells that synthetic repeat-derived endo-siRNA mimics are sufficient to rescue Dcr-2 deficiency-induced defects in heterochromatin formation in interphase and chromosome segregation during mitosis, demonstrating that active retrotransposons are required for stable genetic inheritance.

11:40 - 12:10 Keynote talk 3: Pathway-guided analysis of alternative splicing during cancer progression
Yi Xing, The Children’s Hospital of Philadelphia & University of Pennsylvania, USA
[Abstract]

Aberrant pre-mRNA alternative splicing (AS) is widespread in cancer, but the causes and consequences of AS dysregulation during cancer progression are not well understood. We developed a novel computational framework, PEGASAS, as a pathway-guided approach for examining the effects of oncogenic signaling on exon incorporation. PEGASAS was designed to study the interplay among oncogenic signaling, AS, and affected biological processes. In this study, we applied PEGASAS to define the AS landscape across prostate cancer disease states and the relationship between splicing and known driver alterations. We compiled a meta-dataset of RNA-seq data of 876 tissue samples from publicly available sources, covering a range of disease states, from normal tissues to aggressive metastatic tumors. PEGASAS analysis revealed a correlation between Myc signaling and splicing changes in RNA binding proteins (RBPs), suggestive of a previously undescribed auto-regulatory phenomenon. We experimentally verified this result in a human prostate cell transformation assay. Our findings establish a role for Myc in regulating RNA processing by controlling incorporation of nonsense mediated decay determinant exons in RBP-encoding genes. In conclusion, PEGASAS can mine large-scale transcriptomic data to connect changes in pre-mRNA AS with oncogenic alterations that are common to many cancer types.

12:10 - 13:30 Break
13:30 - 15:20 Session 3: Human Population Genomics & Public Health, chaired by Yiming Bao
13:30 - 14:00 Keynote talk 4: Exploring the prehistoric evolution of East Asian populations through ancient genomes
Qiaomei Fu, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences
[Abstract]

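The emergence of ancient human genome data from different times and places has made it possible to study human genetic evolution within a broad spatiotemporal framework. In particular, the application and development of ancient genome capture techniques have opened the way to obtaining genomes from prehistoric southern East Asian populations, bringing new insights into the study of early southern populations in East Asia. This talk will focus on the evolutionary history of prehistoric East Asian populations as revealed by ancient genomes. Drawing on studies that range from the genome of the Paleolithic early modern human of East Asia, the "Tianyuan Cave individual", to the genomes of northern and southern East Asian populations since the Neolithic, it will describe the genetic characteristics of different prehistoric modern human populations in East Asia and their genetic connections with other Eurasian populations, reveal the genetic differences between northern and southern East Asian populations since the Neolithic and the process of their migration and admixture, and clarify the origin of the Austronesian-speaking populations. These studies reflect the diversity of prehistoric East Asian populations and the complexity of their genetic history, and highlight the important role that population migration and gene flow have played in forming and shaping the structure of present-day East Asian populations.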

14:00 - 14:20 Admixture history of Singapore Peranakan Chinese revealed by whole genome sequencing analysis 
Chaolong Wang, Huazhong University of Science and Technology
[Abstract]

Peranakan Chinese, who are descendants of Chinese immigrants settled in the Malay Archipelago ~300-500 years ago, have developed their unique culture that preserves Chinese traditions with a strong influence from local Malays. Yet, whether genetic admixture co-occurred with the cultural mixture has been an ongoing debate historically. We performed whole genome sequencing (WGS) on 177 Singapore Peranakans and 28 indigenous Malays from Indonesia, and analyzed jointly with WGS data from the SG10K Project and the 1000 Genomes Project. We estimated that Peranakan Chinese inherited ~5.62% (95% confidence interval [CI]: 4.75-6.46%) Malay ancestry, much higher than that in the general Singapore (SG) Chinese (1.08%, 0.69-1.53%), southern Chinese (0.86%, 0.57-1.31%), and northern Chinese (0.25%, 0.18-0.33%). A sex-biased admixture history, in which the Malay ancestry was contributed primarily by females, was supported by analyses of the X chromosome, and mitochondrial and Y haplogroups. Finally, we identified an ancient admixture event shared by Peranakan Chinese and SG Chinese at ~1,612 (95% CI: 1,345-1,923) years ago, coinciding with the settlement history of Han Chinese in southern China, and a recent admixture event unique to Peranakan Chinese at ~190 (159-213) years ago. Our results support the hypothesis that genetic admixture co-occurred with cultural mixture in forming the Peranakan Chinese community and uncovered historical admixture events in southern Chinese.

14:20 - 14:40 Integration of SARS-CoV-2 genome information resources and analysis of sequence variation
Shuhui Song, Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
14:40 - 15:00 The microbiota of respiratory tract: progress, challenge, and perspective
Mingkun Li, Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
[Abstract]

Investigation of the respiratory tract microbiota is a relatively young field; however, there has been remarkable progress in understanding the composition and function of the respiratory tract microbiota in the past few years. Alterations of the respiratory tract microbiota have been observed in many respiratory diseases, including chronic obstructive pulmonary disease (COPD), asthma, and cystic fibrosis, but underlying mechanisms and interactions with host genes are largely unknown. Meanwhile, technologies developed for respiratory tract microbiota have the potential to identify the pathogen that causes an infection in the respiratory tract. For instance, SARS-CoV-2 was first identified in the metatranscriptome data of the bronchoalveolar lavage fluid (BALF).

Our lab has conducted respiratory tract microbiota analysis on over 2000 samples, including oropharyngeal swabs, sputum, and BALF collected from patients with pneumonia, COPD, or COVID-19 and from healthy controls, to disentangle the association between the respiratory tract microbiota and disease progression. Meanwhile, we are also developing and optimizing methods and protocols for handling different types of specimens, as well as new algorithms for analyzing the data.

15:00 - 15:20 Roles of gut microbes in the pathogenesis and development of obesity, NAFLD, and diabetes: a systems perspective 
Hao Wu, University of Gothenburg Sahlgrenska Hospital, Sweden
[Abstract]

The human gut microbiota encompasses a densely populated ecosystem that provides essential functions for host development, immune maturation, and metabolism. Alterations to the gut microbiota have been observed in numerous diseases, including human metabolic diseases such as obesity, non-alcoholic fatty liver disease (NAFLD) and type 2 diabetes (T2D). However, few studies have validated causality in humans and the underlying mechanisms remain largely to be elucidated. We discuss how systems biology approaches combined with new experimental technologies may disentangle some of the mechanistic details in the complex interactions of diet, microbiota, and host metabolism and may provide testable hypotheses for advancing our current understanding of human–microbiota interaction.

15:20 - 17:20 Session 4: Big Data Resources & Clinical Bioinformatics, chaired by Shuhui Song
15:20 - 15:50 Keynote talk 5: Data resources of the China National Center for Bioinformation
Yiming Bao, Beijing Institute of Genomics, CAS / China National Center for Bioinformation (CNCB)
[Abstract]

Genome data are increasing dramatically as a result of new technologies. Often, these data must be deposited into international databases such as DDBJ, EBI and NCBI in order to obtain the accession numbers needed for publication. This can be challenging for researchers in China because of large data sizes, slow data transfer due to limited international internet bandwidth, and language barriers and technical issues in communication. To alleviate these problems, the BIG Data Center (BIGD, https://bigd.big.ac.cn) was launched in 2016 at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (CAS). During the past few years, BIGD has grown and expanded considerably and has become one of the major global centers. In 2019, the National Genomics Data Center (NGDC) was created based on BIGD. Later in the same year, BIG was given the title of China National Center for Bioinformation (CNCB). CNCB will be built on the well-established NGDC multi-omics databases such as Genome Sequence Archive (GSA), Genome Variation Map (GVM), Genome Warehouse (GWH) and 2019 Novel Coronavirus Resource (2019nCoVR), together with specialized resources from many institutions under CAS and other ministries. CNCB is dedicated to providing freely accessible data repositories and a variety of data resources in support of worldwide research activities.

15:50 - 16:10 Proteome-scale analysis of phase-separated proteins in immunofluorescence images
Tingting Li, Peking University School of Basic Medical Sciences
16:10 - 16:30 iCTCF: an integrative resource of chest computed tomography images and clinical features of patients with COVID-19 pneumonia
Yu Xue, Huazhong University of Science and Technology
[Abstract]

The outbreak of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was first reported in Wuhan, China in December 2019. Here, we report a timely and comprehensive resource named integrative computed tomography (CT) images and clinical features (CFs) for COVID-19 (iCTCF) to archive chest CT images, 130 types of CFs, and laboratory-confirmed SARS-CoV-2 clinical status from 1521 patients with or without COVID-19 pneumonia, reaching a data volume of 265.1 GB. To facilitate COVID-19 diagnosis, we integrate the heterogeneous CT and CF datasets, and develop an engineering framework of Hybrid-learning for UnbiaSed predicTion of COVID-19 patients (HUST-19) to predict morbidity and mortality outcomes. From the results, we find that the integration of CT and CF datasets achieves a striking accuracy, with area under the curve (AUC) values of 0.921, 0.931 and 0.856 for predicting mild/regular, severe/critically ill, and deceased cases, respectively, much higher than when using either CT or CF data alone. Together with HUST-19, iCTCF can serve as a fundamental resource for improving the diagnosis and management of COVID-19 patients.

16:30 - 16:50 Introduction to the ENA database
Xin Liu, European Bioinformatics Institute, UK
[Abstract]

ENA data structure, challenges of data storage and access, the overall database framework, and the workflow for retrieving data.

16:50 - 17:20 Keynote talk 6: From promiscuous nucleotide modification to precise genome editing
Li Yang, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences
[Abstract]

A series of deaminase enzymes, including both cytidine (a.k.a. APOBECs/AIDs) and adenosine (a.k.a. ADARs) deaminases, catalyze cytidine-to-uridine (C-to-U) or adenosine-to-inosine (A-to-I) base modification in RNA. Interestingly, cytidine deaminases can also deaminate cytidines in single-stranded regions of genomic DNA, resulting in C-to-U base substitutions and eventually C-to-T mutations in the genome. Thousands of promiscuous C-to-T mutations in cancer genomes have been suggested to be associated with cytidine deaminases. Strikingly, the combination of deaminase enzymes with the CRISPR/Cas9 protein achieves targeted base editing at single-nucleotide resolution in the genome, referred to as the base editor (BE) system. Recently, we and others have developed a series of novel BEs, including but not limited to dCpf1-BEs that conjugate catalytically dead Cpf1 with APOBEC and hA3A-BEs that conjugate human APOBEC3A with nCas9. These newly developed BEs not only expand editing scopes, but also shed new light on their potential applications in biomedical research, biotechnology and therapeutics with high precision. Here, I will summarize the recent progress of genome editing systems from the viewpoint of single-base resolution, highlighting their advances and discussing distinct mechanisms of off-target effects for future improvement.