Originally published In Press as doi:10.1074/jbc.M204161200 on August 16, 2002
J. Biol. Chem., Vol. 277, Issue 48, 45765-45769, November 29, 2002
Using Functional Domain Composition and Support Vector Machines
for Prediction of Protein Subcellular Location*
Kuo-Chen Chou and Yu-Dong Cai§¶
From Upjohn Laboratories, Pharmacia, Kalamazoo,
Michigan 49001-4940 and § Shanghai Research Centre of
Biotechnology, Chinese Academy of Sciences,
Shanghai 200233, China
Proteins are generally classified into the
following 12 subcellular locations: 1) chloroplast, 2) cytoplasm, 3)
cytoskeleton, 4) endoplasmic reticulum, 5) extracellular, 6) Golgi
apparatus, 7) lysosome, 8) mitochondria, 9) nucleus, 10) peroxisome,
11) plasma membrane, and 12) vacuole. Because the function of a protein
is closely correlated with its subcellular location, and because new
protein sequences are entering databanks at a rapidly increasing rate,
it is vitally important for both basic research and the pharmaceutical
industry to establish a high-throughput tool for predicting protein
subcellular location. In this paper, a new concept, the so-called
"functional domain composition," is introduced. Based on this concept,
a protein can be represented as a vector in a high-dimensional space,
where each of the clustered functional domains derived from the protein
universe serves as a vector base. With this representation, the support
vector machine (SVM) algorithm is introduced for predicting protein
subcellular location. High success rates are obtained by the
self-consistency test, jackknife test, and independent dataset test.
The current approach
not only can play an important complementary role to the powerful
covariant discriminant algorithm based on the pseudo amino acid
composition representation (Chou, K. C. (2001)
Proteins Struct. Funct. Genet. 43, 246-255; Correction
(2001) Proteins Struct. Funct. Genet. 44, 60), but also may
greatly stimulate the development of this area.
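
The vector representation described in the abstract can be illustrated with a minimal sketch. It assumes a binary encoding (a component is 1 if the protein matches the corresponding functional domain, 0 otherwise); the abstract states only that each clustered functional domain serves as a vector base, so the binary choice, the function name `domain_composition`, and the domain identifiers below are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Sequence


def domain_composition(hits: Iterable[str], domain_basis: Sequence[str]) -> List[int]:
    """Encode a protein as a vector with one component per functional domain.

    Assumption (not stated in the abstract): component i is 1 if the protein
    matches domain_basis[i] and 0 otherwise. Hits that are not part of the
    basis are ignored.
    """
    hit_set = set(hits)
    return [1 if domain in hit_set else 0 for domain in domain_basis]


# Hypothetical domain identifiers; a real basis would be the full set of
# clustered functional domains and would be far larger.
DOMAIN_BASIS = ["dom_A", "dom_B", "dom_C", "dom_D"]

# Hypothetical domain matches for one protein; "dom_X" is not in the basis.
protein_hits = ["dom_C", "dom_A", "dom_X"]

vector = domain_composition(protein_hits, DOMAIN_BASIS)
print(vector)  # [1, 0, 1, 0]
```

Vectors of this kind, one per protein and all of the same length, could then be passed to any SVM implementation as feature vectors, with the subcellular location as the class label.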
* The costs of publication of this article were defrayed in part by the
payment of page charges. The article must therefore be hereby marked
"advertisement" in accordance with 18 U.S.C. Section 1734 solely to
indicate this fact.
¶ Current address and to whom correspondence should be addressed:
Biomolecular Sciences Dept., UMIST, P. O. Box 88, Manchester, M60 1QD,
United Kingdom. Tel.: 44-161-2008936; Fax: 44-161-2360409;
E-mail: y.cai@umist.ac.uk.
Copyright © 2002 by The American Society for Biochemistry and Molecular Biology, Inc.