SAME: Estimation of Potential Richness of Dark Matters in “Pan Metagenome” Using Species Appearance Model


Studying the complexity of metagenome populations is crucial to understanding different microbial communities. The potential number of microbes in the environment is far higher than what is currently known. However, most metagenomic projects contain only tens to hundreds of samples, and most microbes can hardly be captured at such small sample sizes. Thus, there are many “dark matters” that have never been observed. In this study, we propose a statistical model, named SAM (Species Appearance Model), which uses only one to two hundred samples to optimize its parameters and then estimates the potential richness of dark matters when the data size is much larger. An index named ESS (Estimated Saturated Sample size) is also proposed as an indicator of the complexity of the metagenome population. On the American Gut Project (AGP) dataset, SAM can precisely predict the OTU richness of a pan metagenome with more than 1000 samples using only 200 samples. The ESS of AGP is ~25,000, which means the AGP population is very complex. Using the SAM model, researchers can estimate how many samples they need to collect when initiating a new metagenomic project. The ESS values of different metagenomic populations can also serve as a guide to understanding their differing complexities.
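To make the quantity being extrapolated concrete, the following is a minimal Python sketch of an empirical OTU accumulation curve: the number of distinct OTUs in the pooled "pan metagenome" as more samples are added. It is not the SAM model and does not use the SAME tool; the function name `accumulation_curve`, its parameters, and the toy OTU labels are all hypothetical and chosen only for illustration.

```python
import random


def accumulation_curve(samples, n_permutations=100, seed=0):
    """Mean pooled OTU richness after k samples, for k = 1..len(samples).

    `samples` is a list of sets, each holding the OTU labels observed in
    one sample. The curve is averaged over random sample orderings.
    (Illustrative only; this is not the SAM fitting procedure.)
    """
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for k, otus in enumerate(order):
            seen |= otus              # pool OTUs from the next sample
            totals[k] += len(seen)    # richness after k + 1 samples
    return [t / n_permutations for t in totals]


# Toy data: four samples with made-up OTU labels.
toy_samples = [
    {"otu1", "otu2"},
    {"otu2", "otu3"},
    {"otu1", "otu4"},
    {"otu5"},
]
curve = accumulation_curve(toy_samples)
for k, richness in enumerate(curve, start=1):
    print(f"{k} sample(s): mean pooled richness = {richness:.2f}")
```

A curve that is still rising at the largest available sample count suggests that unobserved OTUs remain; a model such as SAM is described above as extrapolating this kind of curve beyond the collected samples.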


No Publication Information


  1. Hongfei Cui

    School of Economics and Management, University of Science and Technology Beijing, China

Tool Type: Toolkit
Category: Other unclassified tools
User Interface: Terminal Command Line
Latest Release: SAME1.0 (May 31, 2021)
Download Count: 1521
Submitted By: Hongfei Cui