The 6th Big Data Forum for Life and Health Sciences (Oct 15, 2021)

Biological research has entered the era of big data, spanning a wide variety of omics data and a broad range of health data. Such data are generated at ever-growing rates and distributed throughout the world with heterogeneous standards and diverse and limited access capabilities. However, the promise of translating big data into big knowledge can be realized only if the data are publicly shared. Thus, providing open access to omics and health big data is essential for expediting this translation and is becoming increasingly vital to advancing scientific research and promoting human healthcare and precision medicine.
It is our great pleasure to announce that the 2021 Big Data Forum for Life and Health Sciences will be held on October 15, 2021. Several renowned biomedical data scientists have agreed to give talks. Likewise, you are cordially invited to share your work and participate in this exciting event.

Meeting topic: The 6th Big Data Forum for Life and Health Sciences

Meeting time: 2021/10/15 08:45-18:00 (GMT+08:00)

Tencent Meeting ID: 262 365 725

Meeting link: https://meeting.tencent.com/dm/IiGIO9weaIik

Tencent live stream: https://meeting.tencent.com/live/14884343658116241551

Organizing Committee

Zhang Zhang (Chair, BIG, CAS)
Yiming Bao (BIG, CAS)
Wenming Zhao (BIG, CAS)
Jingfa Xiao (BIG, CAS)
Songnian Hu (Institute of Microbiology, CAS)
Jun Yu (BIG, CAS)
Jingchu Luo (Peking University)

Invited Speakers

Yiming Bao (Professor, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Ying Cheng (Professor, Yunnan University, China)
Ge Gao (Professor, Peking University, China)
Zheng Gong (PhD Candidate, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Leng Han (Associate Professor, Center for Epigenetics & Disease Prevention, Institute of Biosciences and Technology, Texas A&M University, United States)
Hang He (Professor, Peking University, China)
Peilin Jia (Professor, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Lei Kong (Senior Engineer, Peking University, China)
Heng Li (Assistant Professor, Biomedical Informatics, Harvard Medical School, United States)
Wei Li (Assistant Professor, Center for Genetic Medicine Research, Children's National Hospital, Department of Genomics and Precision Medicine, George Washington University, United States)
Hongde Liu (Professor, School of Biological Science & Medical Engineering, Southeast University, China)
Lina Ma (Associate Professor, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Jianbo Pan (Professor, Center for Novel Target and Therapeutic Intervention, Institute of Life Sciences, Chongqing Medical University, China)
Jue Ruan (Professor, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, China)
Xin Sheng (Professor, Zhejiang University Medical Center, China)
Shuo Shi (PhD Candidate, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Cheng Sun (Professor, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, China)
Huiyan Sun (Associate Professor, Jilin University, China)
Zhonghuang Wang (PhD Candidate, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)
Yun Xiao (Professor, College of Bioinformatics Science and Technology, Harbin Medical University, China)
Kai Ye (Professor, Xi'an Jiaotong University, China)
Youqiong Ye (Professor, Shanghai Institute of Immunology, Department of Immunology and Microbiology, Shanghai JiaoTong University School of Medicine, China)
Wenting Zong (PhD Candidate, National Genomics Data Center, Beijing Institute of Genomics, CAS; China National Center for Bioinformation, China)

Agenda (Online Meeting)

October 15, Friday
08:50 - 09:00 Welcome and Opening Remarks
Session 1: Bioinformatics Algorithms & Tools, chaired by Zhang Zhang
09:00 - 09:30 Keynote talk: Haplotype-resolved assembly of accurate long reads
Heng Li, Biomedical Informatics, Harvard Medical School
09:30 - 09:50 Decoding tissue- and cell-type specificity of human complex traits and diseases
Peilin Jia, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
[Abstract]

Assessing the relevant tissues and cell types of human traits and diseases is important for better interpreting trait-associated genetic variants, understanding disease etiology, and improving treatment strategies. In this presentation, we will introduce our recent work to decode tissue- and cell-type specificity of thousands of human complex traits and diseases. We developed deTS, an R package for tissue-specific enrichment analysis (TSEA), and constructed TSEA-DB with trait-tissue associations for thousands of GWAS summary statistics data (~4500 UKBB data sets and ~600 from other resources) for a wide range of human traits and diseases.

09:50 - 10:10 KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis
Lei Kong, Peking University
10:10 - 10:30 BSAlign: a library for DNA sequence alignment
Jue Ruan, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences
Session 2: Big Data & Cancer Omics, chaired by Peilin Jia
10:30 - 10:50 Big Data and Gene Editing Approaches to Understand Human Genome
Wei Li, Center for Genetic Medicine Research, Children's National Hospital, Department of Genomics and Precision Medicine, George Washington University
[Abstract]

Various CRISPR-based gene editing tools have generated large amounts of data that help us understand how gene editing works and how the human genome functions. Here we introduce our recent work that combines big data and gene editing approaches to model the CRISPR-Cas13d RNA editing system and to identify lncRNAs that function in cell proliferation. Using data from CRISPR-Cas13d screens, we designed a deep learning model, named DeepCas13, to predict the on-target activity of a gRNA with high accuracy from its sequence and secondary structure. DeepCas13 outperforms existing methods and accurately predicts the efficiency of guides targeting both protein-coding and non-coding RNAs (e.g., circRNAs and lncRNAs). Next, we systematically studied guides targeting non-essential genes and found that the off-target viability effect, defined as the unintended effect of guides on cell viability, is closely related to their on-target RNA cleavage efficiency. Finally, we applied these models to our screens that included guides targeting 234 lncRNAs and identified lncRNAs that affect cell viability and proliferation in multiple cell lines.

10:50 - 11:10 Identification, characterization and curation of long non-coding RNAs
Lina Ma, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
[Abstract]

The human genome transcribes a large number of long non-coding RNAs (lncRNAs), but only a small fraction of them have been experimentally studied, posing great challenges for the annotation of the human genome. Here, using a standard workflow, stringent criteria, and a robust lncRNA identification algorithm (LGC), we integrated and curated lncRNAs identified by different resources to provide a comprehensive and high-quality reference for human lncRNAs. Based on this reference (LncBook, LncExpDB), we performed multi-omics integrative analysis to predict potential featured lncRNAs and those associated with diseases, which reveals that about 50% of human lncRNAs tend to be functional. To help researchers select appropriate candidates for investigation and guide experimental design, we characterized lncRNAs' expression profiles across diverse biological contexts/conditions, evaluated their expression capacities, and predicted interacting mRNAs. At the same time, we developed tools to predict biological functions for lncRNAs in LncRNAWiki 2.0. The extensively studied lncRNAs tend to be involved in many other important biological processes, and they deserve systematic studies to deepen our understanding of lncRNAs.

11:10 - 11:30 From cellular infiltration assessment to a functional gene set-based prognostic model for breast cancer
Hongde Liu, School of Biological Science & Medical Engineering, Southeast University
[Abstract]

We have developed a highly reliable BC-RGEP that adequately annotates different cell types and estimates cellular infiltration. Importantly, the functional gene set-based prognostic model that we introduce here showed a strong ability to screen patients based on their therapeutic response. More broadly, we offer a perspective for generating similar models in other cancer types to identify shared factors that drive cancer heterogeneity.

11:30 - 12:00 Keynote talk: Maximizing the utility of cancer omics data
Leng Han, Center for Epigenetics & Disease Prevention, Institute of Biosciences and Technology, Texas A&M University
[Abstract]

Despite advancements in treatment options for cancer, a majority of cancer types continue to lack fully characterized and effective targeted therapies to improve disease diagnostics, prognoses, and patient survival outcomes. Therefore, there is an urgent need to gain a more comprehensive understanding of the molecular basis of diseases and develop novel prognostic and therapeutic strategies. Our lab utilizes cutting-edge techniques in systems biology to understand the molecular mechanisms of complex diseases. We have developed a comprehensive understanding of the molecular mechanisms of novel transcriptomic elements in cancer (Trends in Cancer, 2018), including pseudogenes (Nature Communications, 2014), lncRNA (Cancer Research, 2015), RNA editing (Cancer Cell, 2015), eQTL (Nucleic Acids Research, 2018), snoRNA (Cell Reports, 2017), APA (Journal of the National Cancer Institute, 2018), circRNA (Genome Medicine, 2019) and eRNA (Nature Communications, 2019). We pioneered a series of pan-cancer analyses to provide clinical insights into cancer therapy, including chronotherapy (Cell Systems, 2018), hypoxia-targeted therapy (Nature Metabolism, 2019), and immunotherapy (Nature Immunology, 2019; Nature Communications, 2020a; Nature Communications, 2020b; Genome Medicine, 2020a; JNCI, 2021). These studies shed light on future clinical considerations for the development of innovative therapies for cancer types currently lacking effective treatment options.

Session 3: Biodiversity & Database Resources, chaired by Shuhui Song
13:00 - 13:30 Keynote talk: Three chromosome-scale Papaver genomes reveal the evolutionary history of morphinan biosynthetic pathway
Kai Ye, Xi'an Jiaotong University
[Abstract]

Morphinans are benzylisoquinoline alkaloids produced by opium poppy (Papaver somniferum L.) plants, the sole natural source of commercial opioids today. How the morphinan biosynthetic pathway evolved in Papaver spp. remains a mystery. We previously reported the P. somniferum genome and discovered that genes encoding key enzymes for the morphinan pathway are encapsulated within a gene cluster. Here, we produced chromosome-scale assemblies of two additional Papaver genomes, P. setigerum and P. rhoeas, and compared them to a Hi-C improved assembly of the P. somniferum HN1 genome. The whole genome duplication (WGD) events not only permit functional innovation but also log the footsteps, which reveal a series of structural variation events converging key genes in the morphinan biosynthetic pathway to the same locus, enabling concerted production of novel metabolic compounds.

13:30 - 13:50 Genetic basis of heterosis in hybrid rice related to rice divergence
Hang He, Peking University
13:50 - 14:10 Database Resources of CNCB-NGDC
Yiming Bao, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
14:10 - 14:40 Keynote talk: Genetic basis of bumblebee diversification revealed by genus-wide genomic resources
Cheng Sun, Institute of Apicultural Research, Chinese Academy of Agricultural Sciences
[Abstract]

Bumblebees are a diverse group of globally important pollinators in natural ecosystems and for agricultural food production. With both eusocial and solitary life-cycle phases, and some social parasite species, they are especially interesting models to understand social evolution, behavior, and ecology. Bumblebees display considerable interspecific diversity in morphology, color patterning, food preference, pathogen incidence, decline status and exhibit diverse life histories and ecologies. However, little is known about the underlying genetic basis that gives rise to these diverse phenotypes, including their differential responses to changing environments. To broadly sample bumblebee genomic and phenotypic diversity, we de novo sequenced and assembled the genomes of 17 species, representing all 15 subgenera, producing the first genus-wide quantification of genetic and genomic variation potentially underlying key ecological and behavioral traits. Chromosome-level assemblies show a stable 18-chromosome karyotype, with major rearrangements creating 25 chromosomes in social parasites. Differential transposable element activity drives changes in genome sizes, with putative domestications of repetitive sequences influencing gene coding and regulatory potential. Dynamically evolving gene families and signatures of positive selection point to genus-wide variation in processes linked to foraging, diet and metabolism, immunity and detoxification, as well as adaptations for life at high altitudes. Our study reveals how bumblebee genes and genomes have evolved across the Bombus phylogeny and identifies variations potentially linked to key ecological and behavioral traits of these important pollinators.

Session 4: Precision Medicine & Clinical Bioinformatics, chaired by Lina Ma
14:40 - 15:00 Influence of mutation clonality heterogeneity in clinical outcome of glioma
Yun Xiao, College of Bioinformatics Science and Technology, Harbin Medical University
[Abstract]

Genomic studies have revealed that genomic aberrations play important roles in the progression of cancer. In this study, we used an integrated framework to infer the timing and clonal status of mutations in ~600 diffuse gliomas from The Cancer Genome Atlas (TCGA) including glioblastomas (GBMs) and low-grade gliomas (LGGs). Glioma showed widespread genetic intratumoural heterogeneity (ITH), with nearly all driver genes harbouring subclonal mutations, even for known glioma initiating event IDH1 (17.1%). Gliomas with subclonal IDH mutation and without 1p/19q codeletion showed shorter overall and disease specific survival, higher ITH, and exhibited differences in genomic patterns, transcript levels and proliferative potential, when compared with IDH clonal mutation and no 1p/19q codeletion gliomas. We further found a higher subclonal mutation burden in females than males in the majority of glioma subtypes. Moreover, analysis of clinically actionable genes revealed that mutations in genes of the mitogen-activated protein kinase (MAPK) signaling pathway were more likely to be clonal in female patients with GBM, whereas mutations in genes involved in the receptor tyrosine kinase signaling pathway were more likely to be clonal in male patients with LGG. Finally, we defined a refined stratification system based on the current WHO glioma molecular classification, which showed close correlations with patients’ clinical outcomes. In conclusion, we integrated the clonal status of somatic mutations into cancer genomic classification and highlighted the necessity of considering clonal architectures in glioma precision stratification.

15:00 - 15:20 Identification of neuropsychiatric risk gene MIR137 and its role in the neurodevelopment and behavior
Ying Cheng, Yunnan University
[Abstract]

Genetic analyses have linked MIR137 (encoding microRNA miR-137) to neuropsychiatric disorders, including schizophrenia and autism spectrum disorder. miR-137 plays important roles in neurogenesis and neuronal maturation, but the impact of miR-137 loss-of-function in vivo remains unclear. To systematically investigate the role of miR-137 in brain, we have generated the miR-137 germline knockout (gKO) and nervous system knockout (cKO) mice. The complete loss of miR-137 in mice (gKO and cKO) leads to postnatal lethality, while heterozygous gKO and cKO mice remain viable. Partial loss of miR-137 in heterozygous cKO mice results in dysregulated synaptic plasticity, repetitive behavior, and impaired learning and social behavior. Transcriptomic and proteomic analyses revealed that the miR-137 mRNA target, phosphodiesterase 10a (Pde10a), is elevated in heterozygous cKO mice. Moreover, we found that treatment with the Pde10a inhibitor papaverine or knockdown of Pde10a ameliorates the deficits observed in the heterozygous cKO mice. Our results collectively suggest that Mir137 plays essential roles in postnatal neurodevelopment and that dysregulation of miR-137 potentially contributes to neuropsychiatric disorders in humans.

15:20 - 15:40 Warburg Effects in Cancer and Normal Proliferating Cells
Huiyan Sun, Jilin University
[Abstract]

The Warburg effect is a common characteristic of all proliferating cells, including cancer and normal proliferating cells (NPCs). Through comprehensive comparative analyses of the transcriptomic data of over 7000 cancer and control tissues of 14 cancer types in TCGA and data of five NPC types in GEO, we suspect that the reasons for the Warburg effect in cancer cells and NPCs are fundamentally different. Specifically, cancer cells do this mainly to produce net protons for neutralizing OH- that is generated persistently by cytosolic Fenton reactions, whereas NPCs do this to maintain the elevated cytosolic pH needed for the optimal performance of the ribosomal proteins. Moreover, cancer cells secrete lactic acid largely independently of lactate generation, and they probably do this to protect themselves from destruction by immune cells.

15:40 - 16:00 Integrated Proteomic and Glycoproteomic Analyses of Human High-Grade Serous Ovarian Carcinoma
Jianbo Pan, Center for Novel Target and Therapeutic Intervention, Institute of Life Sciences, Chongqing Medical University
[Abstract]

High-grade serous ovarian carcinomas (HGSCs) are the most common and lethal type of ovarian carcinoma. Understanding the molecular mechanisms of HGSC development, progression, and treatment is a critical step toward improving survival rates. In this study, we performed mass spectrometry (MS)-based proteomic and glycoproteomic analyses of HGSC and non-tumor tissues. Integration of the expression data from global proteomics and glycoproteomics reveals tumor-specific glycosylation, uncovers different glycosylation associated with tumor heterogeneity, and identifies glycosylation enzymes that were correlated with the altered glycosylation. Deeper understanding of the glycosylation process and production in different subtypes of HGSC is expected to provide more clues and instructions for cancer diagnosis, precision medicine and tumor-targeted therapy.

Session 5: Student Lightning Talks, chaired by Lili Hao
16:00 - 16:05 Integration and mining of sheep omics data and establishment of database system
Zhonghuang Wang, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
16:05 - 16:10 RefRGim: an intelligent reference panel reconstruction method for genotype imputation with convolutional neural networks
Shuo Shi, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
[Abstract]

Genotype imputation is a statistical method for estimating missing genotypes from a denser haplotype reference panel. Existing methods usually perform well on common variants but may not be ideal for low-frequency and rare variants. Previous studies showed that the population similarity between study and reference panels is one of the key factors influencing imputation accuracy. Here, we developed an imputation reference panel reconstruction method (RefRGim) using convolutional neural networks, which can generate a study-specific reference panel for each input dataset based on the genetic similarity between individuals from the current study and the references. The convolutional neural networks were pretrained with single nucleotide polymorphism data from the 1000 Genomes Project. Our evaluations showed that genotype imputation with RefRGim can achieve higher accuracy than imputation with the original reference panel, especially for low-frequency and rare variants. RefRGim will serve as an efficient reference panel reconstruction method for genotype imputation.

16:10 - 16:15 CNCB-NGDC online analysis platform for coronaviruses
Zheng Gong, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
16:15 - 16:20 scMethBank: a database for single-cell whole genome DNA methylation maps
Wenting Zong, National Genomics Data Center, Beijing Institute of Genomics, CAS
China National Center for Bioinformation
[Abstract]

Single-cell bisulfite sequencing methods are widely used to assay epigenomic heterogeneity in cell states (1). Over the past few years, large amounts of data have been generated and have facilitated a deeper understanding of the epigenetic regulation of many key biological processes, including early embryonic development, cell differentiation and tumor progression (1-6). However, the massive amount of data makes it hard to compare and reuse published datasets at such scale, which emphasizes the urgent need to build a functional resource platform. Here we present scMethBank, the first open-access and comprehensive database dedicated to the collection, integration, analysis and visualization of single-cell methylation data and metadata. The current release of scMethBank includes processed single-cell bisulfite sequencing data and curated metadata of 8328 samples derived from 15 public single-cell datasets, involving two species (human and mouse), 29 cell types and two diseases. In summary, scMethBank aims to assist researchers who are interested in cell heterogeneity in exploring and utilizing whole genome methylation data at the single-cell level by providing browse, search, visualization, and download functions and user-friendly online tools. The database is accessible at: https://ngdc.cncb.ac.cn/methbank/scm/

Session 6: Single-Cell Omics, chaired by Yuan Gao
16:20 - 16:40 Mapping the genetic architecture of human traits to cell types in the kidney identifies mechanisms of disease and potential treatments
Xin Sheng, Zhejiang University Medical Center
[Abstract]

The functional interpretation of genome-wide association studies (GWAS) is challenging due to the cell-type-dependent influences of genetic variants. Here, we generated comprehensive maps of expression quantitative trait loci (eQTLs) for 659 microdissected human kidney samples and identified cell-type-eQTLs by mapping interactions between cell type abundances and genotypes. By partitioning heritability using stratified linkage disequilibrium score regression to integrate GWAS with single-cell RNA sequencing and single-nucleus assay for transposase-accessible chromatin with high-throughput sequencing data, we prioritized proximal tubules for kidney function and endothelial cells and distal tubule segments for blood pressure pathogenesis. Bayesian colocalization analysis nominated more than 200 genes for kidney function and hypertension. Our study clarifies the mechanism of commonly used antihypertensive and renal-protective drugs and identifies drug repurposing opportunities for kidney disease.

16:40 - 17:00 Single cell omics and spatial transcriptomics of tumors reveals microenvironment crosstalk to identify potential therapeutic targets
Youqiong Ye, Shanghai Institute of Immunology, Department of Immunology and Microbiology, Shanghai JiaoTong University School of Medicine
[Abstract]

Immune checkpoint blockade (ICB) therapies exhibit substantial clinical benefit in different cancer lineages. However, they lack a significant curative effect on high-incidence malignant solid tumors such as colorectal cancer, liver cancer, and pancreatic cancer. How to enhance the efficacy of ICB therapy is an urgent issue. Tumor microenvironment-mediated immunosuppression is one of the key factors that determine the effectiveness of tumor immunotherapy. The "desmoplasia structure" is characterized by the enrichment of tumor-associated macrophages and stroma cells at the tumor boundary, which prevents cytotoxic T lymphocytes (CTL) from infiltrating the tumor core, but its specific composition and mechanism of action are unclear. In our study, we generated single-cell and spatial transcriptomics data from solid tumors and combined them with a large number of public omics and clinical data resources to explore the tumor immune infiltrated-excluded (IE) microenvironment related to the "desmoplasia structure", an important feature of ICB therapy-resistant tumors. We dissect the tumor immune suppressive microenvironment and the interaction of subpopulations of stroma cells and macrophages, and screen targets involved in the generation of the "desmoplasia structure" through this interaction. Pre-clinically, blocking the related targets enhanced the therapeutic efficacy of PD-1 blockade in tumor-bearing mouse models, accompanied by increased infiltration of killer T cells. Our results offer new insights into how the interaction of tumor-specific macrophage and fibroblast subpopulations, especially their spatial niches and communication networks, promotes the desmoplastic tumor microenvironment and resistance to immunotherapy, and may provide an attractive new strategy based on disrupting the interaction between cells.

17:00 - 17:30 Keynote talk: Modeling regulatory map in silico
Ge Gao, Peking University
[Abstract]

Individual human cells, as the basic biological units of our bodies, carry out their functions through rigorous regulation of gene expression and exhibit heterogeneity among each other in every human tissue. In addition to identifying individual genes, one is often interested in how multiple genes interact to form regulatory circuits and carry out cellular functions. Combining massive omics data and leading-edge statistical modeling approaches, we have developed a set of novel bioinformatic technologies over the past years to delineate the regulatory map and characterize the functional genome in action globally. Here we will present recent advances as well as their potential applications in clinical and translational studies.