Prediction of protein subcellular locations using fuzzy k-NN method.

Ying Huang, Yanda Li
Author Information
  1. Ying Huang: State Key Laboratory of Intelligent Technology and Systems, Department of Automation, Institute of Bioinformatics, Tsinghua University, Beijing 100084, People's Republic of China. hying99@mails.tsinghua.edu.cn

Abstract

MOTIVATION: Protein localization data are a valuable information resource helpful in elucidating protein functions. It is highly desirable to predict a protein's subcellular locations automatically from its sequence.
RESULTS: In this paper, a fuzzy k-nearest neighbors (k-NN) algorithm is introduced to predict proteins' subcellular locations from their dipeptide composition. The prediction is performed on a new data set derived from version 41.0 of the SWISS-PROT databank, and an overall predictive accuracy of about 80% is achieved in a jackknife test. The result demonstrates the applicability of this relatively simple method and the possible improvement of prediction accuracy for protein subcellular locations. We also applied this method to annotate six entirely sequenced proteomes, namely Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Oryza sativa, Arabidopsis thaliana and a subset of all human proteins.
AVAILABILITY: Supplementary information and subcellular location annotations for eukaryotes are available at http://166.111.30.65/hying/fuzzy_loc.htm
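
The pipeline named in the abstract (dipeptide composition features, fuzzy k-NN classification, jackknife evaluation) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not give the membership function, the distance measure, or the values of k and the fuzzifier m, so the code assumes the common Keller-style fuzzy k-NN with crisp training labels, Euclidean distance, and arbitrary defaults (k=3, m=2). All function names are hypothetical.

```python
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of dipeptide frequencies of seq."""
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:  # skip pairs containing non-standard residues
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fuzzy_knn(query, train_vectors, train_labels, k=3, m=2.0):
    """Assumed Keller-style fuzzy k-NN: return {label: membership}.

    Each of the k nearest neighbors votes for its own (crisp) label with
    weight 1 / d^(2/(m-1)); memberships are the normalized vote totals.
    """
    neighbors = sorted(
        (math.dist(query, v), label)
        for v, label in zip(train_vectors, train_labels)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    weights = {}
    for d, label in neighbors:
        w = 1.0 / (max(d, 1e-12) ** exponent)  # guard against zero distance
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def jackknife_accuracy(vectors, labels, k=3, m=2.0):
    """Leave-one-out accuracy: predict each sample from all the others."""
    correct = 0
    for i in range(len(vectors)):
        rest_v = vectors[:i] + vectors[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        memberships = fuzzy_knn(vectors[i], rest_v, rest_l, k, m)
        if max(memberships, key=memberships.get) == labels[i]:
            correct += 1
    return correct / len(vectors)
```

In use, each protein sequence would be converted with `dipeptide_composition`, and the predicted location would be the label with the highest membership returned by `fuzzy_knn`. Unlike a plain majority vote, the membership values also indicate how confident the assignment is.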

MeSH Term

Algorithms
Animals
Cellular Structures
Databases, Protein
Fuzzy Logic
Gene Expression Profiling
Gene Expression Regulation
Humans
Proteome
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment
Sequence Analysis, Protein
Species Specificity
Subcellular Fractions
Tissue Distribution

Chemicals

Proteome
