BACKGROUND: Interactions between microbes and diseases are of great importance for biomedical research. However, a large number of microbe-disease interactions remain hidden in the biomedical literature, and structured databases of such interactions are scarce. In this paper, we aim to construct a large-scale database of microbe-disease interactions automatically. We achieved this goal by applying text mining methods based on a deep learning model, at a moderate curation cost. We also built a user-friendly web interface that allows researchers to navigate and query the information they need.
RESULTS: First, we manually constructed a gold-standard corpus and a silver-standard corpus (SSC) of microbe-disease interactions for curation. We then proposed a text mining framework for microbe-disease interaction extraction based on BERE, a pretrained model originally built to extract drug-target or drug-drug interactions. We applied named entity recognition tools to detect microbe and disease mentions in free biomedical text, and then fine-tuned BERE to recognize relations between the target entities. Introducing the SSC for model fine-tuning greatly improved detection performance for microbe-disease interactions, with an average reduction in error of approximately 10%. The MDIDB website offers data browsing, custom searching for specific diseases or microbes, and batch downloading.
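The two-stage framework described in the Results (named entity recognition, then relation classification over entity pairs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Mention` type, pairing every microbe mention with every disease mention in a sentence, the label names, and the keyword rule in `toy_classifier` (a hypothetical stand-in for the fine-tuned BERE model) are all illustrative choices.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Mention:
    """An entity mention as an NER tool might report it (hypothetical shape)."""
    text: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    kind: str   # "microbe" or "disease"


def make_mention(sentence: str, text: str, kind: str) -> Mention:
    """Locate the first occurrence of `text`; stands in for real NER output."""
    start = sentence.index(text)
    return Mention(text, start, start + len(text), kind)


def candidate_pairs(mentions: List[Mention]) -> List[Tuple[Mention, Mention]]:
    """Pair every microbe mention with every disease mention in one sentence."""
    microbes = [m for m in mentions if m.kind == "microbe"]
    diseases = [m for m in mentions if m.kind == "disease"]
    return list(product(microbes, diseases))


# A relation classifier maps (sentence, microbe, disease) to a label.
Classifier = Callable[[str, Mention, Mention], str]


def toy_classifier(sentence: str, microbe: Mention, disease: Mention) -> str:
    """Hypothetical keyword rule standing in for the fine-tuned BERE model."""
    left, right = sorted([microbe, disease], key=lambda m: m.start)
    between = sentence[left.end:right.start].lower()
    return "positive" if "cause" in between else "none"


def extract_interactions(sentence: str, mentions: List[Mention],
                         classify: Classifier) -> List[Tuple[str, str, str]]:
    """Classify each candidate pair and keep those not labelled 'none'."""
    results = []
    for microbe, disease in candidate_pairs(mentions):
        label = classify(sentence, microbe, disease)
        if label != "none":
            results.append((microbe.text, disease.text, label))
    return results


if __name__ == "__main__":
    s = "Helicobacter pylori infection is a major cause of gastric cancer."
    ms = [make_mention(s, "Helicobacter pylori", "microbe"),
          make_mention(s, "gastric cancer", "disease")]
    print(extract_interactions(s, ms, toy_classifier))
    # -> [('Helicobacter pylori', 'gastric cancer', 'positive')]
```

Generating candidates at the sentence level keeps each pair local to its evidence; the keyword rule exists only to make the sketch runnable and would misfire on negated or hedged sentences, which is why a learned relation classifier is used in its place.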
CONCLUSIONS: Evaluation results demonstrate that our method outperforms the baseline model (rule-based PKDE4J), achieving an average F1-score of 73.81%. For further validation, we randomly sampled nearly 1,000 interactions predicted by our model and manually checked the correctness of each, which yielded an accuracy of 73%. The MDIDB website is freely available at http://dbmdi.com/index/.
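The evaluation figures in this abstract rest on a few standard quantities, sketched below: the F1-score as the harmonic mean of precision and recall; a relative error reduction, assuming error is defined as 1 - F1 (the abstract does not state its definition); and a Wilson 95% confidence interval for accuracy estimated from a manually checked random sample. All inputs in the usage lines are hypothetical values chosen for illustration, not the paper's precision, recall, or sample counts.

```python
import math
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(f1_before: float, f1_after: float) -> float:
    """Fractional drop in error, ASSUMING error = 1 - F1."""
    err_before = 1.0 - f1_before
    err_after = 1.0 - f1_after
    return (err_before - err_after) / err_before


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(f"F1 = {f1_score(0.75, 0.70):.4f}")
    print(f"relative error reduction = {relative_error_reduction(0.70, 0.73):.1%}")
    lo, hi = wilson_interval(730, 1000)
    print(f"accuracy 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With a sample of about 1,000 checked predictions, the interval around a 73% accuracy spans only a few percentage points, which is what makes a manual spot check of this size informative.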