Yuzhu Chen JL, Yufeng Zhang, Mingqian Zhang, Zheng Sun, Gongchao Jing, Shi Huang, Xiaoquan Su. Parallel-Meta Suite: interactive and rapid microbiome data analysis on multiple platforms. 2022. https://github.com/qdu-bioinfo/parallel-meta-suite
Ranking biomarkers by random forest importance scoring
A random forest model is trained for each categorical variable in the metadata, with that variable as the class label. Feature importance is then calculated and the model's error rate is evaluated. For each categorical variable, a feature importance plot is generated with ggplot2 and saved as a PDF file. The feature importance data are also written to a text file, which includes the error rate of the random forest model and the mean decrease in accuracy for each feature.
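To make the two reported quantities concrete, the following is a minimal sketch of how an out-of-bag error rate and a permutation-based mean decrease in accuracy can be computed. It is not the code used by this step: it uses only the Python standard library, and it replaces the full random forest with a small bagged ensemble of one-split trees (stumps). The names `fit_stump`, `forest_importance`, `n_trees`, and `mtry` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Fit a one-split tree: (feature, threshold, left_label, right_label)."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def accuracy(stump, X, y):
    return sum(predict(stump, row) == lab for row, lab in zip(X, y)) / len(y)

def forest_importance(X, y, n_trees=25, mtry=None, seed=0):
    """Return (mean decrease in accuracy per feature, out-of-bag error rate).

    Assumption: mtry=None lets every tree see all features (plain bagging);
    a smaller mtry samples a random feature subset per tree.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = p if mtry is None else mtry
    drops = [0.0] * p
    votes = [Counter() for _ in range(n)]
    used = 0
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))       # samples left out
        if not oob:
            continue
        features = sorted(rng.sample(range(p), mtry))
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], features)
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = accuracy(stump, X_oob, y_oob)
        for i in oob:
            votes[i][predict(stump, X[i])] += 1
        for f in range(p):
            # Shuffle one feature among the out-of-bag samples and
            # record how much the tree's accuracy falls.
            shuffled = [row[f] for row in X_oob]
            rng.shuffle(shuffled)
            X_perm = [list(row[:f]) + [v] + list(row[f + 1:])
                      for row, v in zip(X_oob, shuffled)]
            drops[f] += base - accuracy(stump, X_perm, y_oob)
        used += 1
    importance = [d / max(used, 1) for d in drops]
    # Out-of-bag error: majority vote over the trees that did not train on a sample.
    scored = [(votes[i].most_common(1)[0][0], y[i]) for i in range(n) if votes[i]]
    oob_error = (sum(pred != truth for pred, truth in scored) / len(scored)
                 if scored else float("nan"))
    return importance, oob_error
```

Calling `forest_importance(X, labels)` with `X` as a list of per-sample feature rows and `labels` as the values of one categorical metadata variable returns one importance score per feature plus a single error rate. A feature whose values can be shuffled without hurting accuracy scores near zero, while a feature the trees rely on scores high, which is the ordering the importance plot and text file report.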