Database Commons

a catalog of worldwide biological databases

Database Profile

Zenodo

General information

URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V
Full name:
Description: Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. A new system for identifying drug side effects in the literature was presented, combining three approaches: machine learning, rule-based and knowledge-based methods. The system was developed to address Task 3.B of the BioCreative V challenge (BC5), which deals with Chemical-induced Disease (CID) relations.
Year founded: 2016
Last update:
Version:
Accessibility: Accessible
Country/Region: Spain

Classification & Tag

Data type:
Data object: NA
Database category:
Major species: NA
Keywords:

Contact information

University/Institution: Pompeu Fabra University
Address: Research Programme on Biomedical Informatics (GRIB), IMIM, UPF, Barcelona, Spain
City:
Province/State:
Country/Region: Spain
Contact name (PI/Team): Laura I. Furlong
Contact email (PI/Helpdesk): lfurlong@imim.es

Publications

cellsig plug-in enhances CIBERSORTx signature selection for multidataset transcriptomes with sparse multilevel modelling. [PMID: 37952182]
Md Abdullah Al Kamran Khan, Jian Wu, Yuhan Sun, Alexander D Barrow, Anthony T Papenfuss, Stefano Mangiola

MOTIVATION: The precise characterization of cell-type transcriptomes is pivotal to understanding cellular lineages, deconvolution of bulk transcriptomes, and clinical applications. Single-cell RNA sequencing resources like the Human Cell Atlas have revolutionised cell-type profiling. However, challenges persist due to data heterogeneity and discrepancies across different studies. One limitation of prevailing tools such as CIBERSORTx is their inability to address hierarchical data structures and handle nonoverlapping gene sets across samples, relying on filtering or imputation.
RESULTS: Here, we present cellsig, a Bayesian sparse multilevel model designed to improve signature estimation by adjusting data for multilevel effects and modelling for gene-set sparsity. Our model is tailored to large-scale, heterogeneous pseudobulk and bulk RNA sequencing data collections with nonoverlapping gene sets. We tested the performance of cellsig on a novel curated Human Bulk Cell-type Catalogue, which harmonizes 1435 samples across 58 datasets. We show that cellsig significantly enhances cell-type marker gene ranking performance. This approach is valuable for cell-type signature selection, with implications for marker gene validation, single-cell annotation, and deconvolution benchmarks.
AVAILABILITY AND IMPLEMENTATION: Codes and the interactive app are available at https://github.com/stemangiola/cellsig; and the database is available at https://doi.org/10.5281/zenodo.7582421.

Bioinformatics. 2023:39(12) | 2 Citations (from Europe PMC, 2025-12-20)
DOSE-L1000: unveiling the intricate landscape of compound-induced transcriptional changes. [PMID: 37952162]
Junmin Wang, Steven Novick

MOTIVATION: The LINCS L1000 project has collected gene expression profiles for thousands of compounds across a wide array of concentrations, cell lines, and time points. However, conventional analysis methods often fall short in capturing the rich information encapsulated within the L1000 transcriptional dose-response data.
RESULTS: We present DOSE-L1000, a database that unravels the potency and efficacy of compound-gene pairs and the intricate landscape of compound-induced transcriptional changes. Our study uses the fitting of over 140 million generalized additive models and robust linear models, spanning the complete spectrum of compounds and landmark genes within the LINCS L1000 database. This systematic approach provides quantitative insights into differential gene expression and the potency and efficacy of compound-gene pairs across diverse cellular contexts. Through examples, we showcase the application of DOSE-L1000 in tasks such as cell line and compound comparisons, along with clustering analyses and predictions of drug-target interactions. DOSE-L1000 fosters applications in drug discovery, accelerating the transition to omics-driven drug development.
AVAILABILITY AND IMPLEMENTATION: DOSE-L1000 is publicly available at https://doi.org/10.5281/zenodo.8286375.

Bioinformatics. 2023:39(11) | 7 Citations (from Europe PMC, 2025-12-20)
Combining machine learning, crowdsourcing and expert knowledge to detect chemical-induced diseases in text. [PMID: 27307137]
Àlex Bravo, Tong Shu Li, Andrew I Su, Benjamin M Good, Laura I Furlong

Drug toxicity is a major concern for both regulatory agencies and the pharmaceutical industry. In this context, text-mining methods for the identification of drug side effects from free text are key for the development of up-to-date knowledge sources on adverse drug reactions. We present a new system for identification of drug side effects from the literature that combines three approaches: machine learning, rule-based and knowledge-based approaches. This system has been developed to address Task 3.B of the BioCreative V challenge (BC5), which deals with Chemical-induced Disease (CID) relations. The first two approaches focus on identifying relations at the sentence level, while the knowledge-based approach is applied at both the sentence and abstract levels. The machine learning method is based on the BeFree system, using two corpora as training data: the annotated data provided by the CID task organizers and a new CID corpus developed by crowdsourcing. Different combinations of results from the three strategies were selected for each run of the challenge. In the final evaluation setting, the system achieved the highest recall of the challenge (63%). By performing an error analysis, we identified the main causes of misclassification and areas for improvement of our system, and highlighted the need for consistent gold standard data sets for advancing the state of the art in text mining of drug side effects.
Database URL: https://zenodo.org/record/29887?ln=en#.VsL3yDLWR_V

Database (Oxford). 2016:2016 | 8 Citations (from Europe PMC, 2025-12-20)
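The BC5 CID task scores chemical-disease pairs per abstract, so output from sentence-level and abstract-level strategies can be pooled as sets of (document, chemical, disease) tuples and compared with a gold standard. The sketch below is illustrative only: it assumes a simple union of the three strategies, and the identifiers, toy predictions and the helper names `combine` and `precision_recall_f1` are hypothetical, since the paper selected different combinations for each run.

```python
# Illustrative sketch: identifiers, predictions and the union rule are
# hypothetical and do not reproduce the published system's configuration.
from typing import Set, Tuple

# An abstract-level CID relation: (document id, chemical id, disease id)
Relation = Tuple[str, str, str]

def combine(*runs: Set[Relation]) -> Set[Relation]:
    """Pool predictions from several strategies by set union.

    Because relations are stored as tuples in a set, the same pair found
    in several sentences or by several strategies is counted only once.
    """
    combined: Set[Relation] = set()
    for run in runs:
        combined |= run
    return combined

def precision_recall_f1(predicted: Set[Relation],
                        gold: Set[Relation]) -> Tuple[float, float, float]:
    """Score predicted relations against gold-standard relations."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical output of the three strategies on two abstracts
ml = {("doc1", "chemA", "disX")}
rules = {("doc1", "chemA", "disX"), ("doc2", "chemB", "disY")}
knowledge = {("doc2", "chemC", "disZ")}
gold = {("doc1", "chemA", "disX"), ("doc2", "chemB", "disY"),
        ("doc2", "chemD", "disW")}

predicted = combine(ml, rules, knowledge)
p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Replacing the union with an intersection would typically trade recall for precision, since an intersection can only keep or drop pairs that the union already contains.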

Ranking

All databases: 4339/6895 (37.085%)
Interaction: 805/1194 (32.663%)
Health and medicine: 1104/1738 (36.536%)
Total Rank: 4339
Citations: 16
z-index: 1.778
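The ranking figures appear to follow simple arithmetic. The sketch below assumes that each percentage is (total - rank + 1) / total × 100 and that the z-index is citations divided by years since the database was founded (2025 - 2016 = 9). These formulas are inferred from the numbers on this page rather than taken from Database Commons documentation, and the function names are hypothetical, but they reproduce every ranking value listed above.

```python
# Hypothetical reconstruction of the ranking figures on this page.
# Assumption: percentage = (total - rank + 1) / total * 100
# Assumption: z-index = citations / (current year - year founded)

def rank_percentile(rank: int, total: int) -> float:
    """Share of databases ranked at or below this one, in percent (assumed)."""
    return (total - rank + 1) / total * 100

def z_index(citations: int, year_founded: int, current_year: int) -> float:
    """Average citations per year since founding (assumed)."""
    return citations / (current_year - year_founded)

rankings = {
    "All databases": (4339, 6895),
    "Interaction": (805, 1194),
    "Health and medicine": (1104, 1738),
}

for category, (rank, total) in rankings.items():
    print(f"{category}: {rank}/{total} ({rank_percentile(rank, total):.3f}%)")

# 16 citations, founded 2016, citation snapshot taken in 2025
print(f"z-index: {z_index(16, 2016, 2025):.3f}")
```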

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2018-01-27
Curated by:
Shaosen Zhang [2024-08-22]
Zheng Luo [2024-07-16]
Farah Nazir [2018-04-06]