Large Scale Genomic Sequence SVM Classifiers

2005

Conference Paper


In genomic sequence analysis tasks such as splice site recognition or promoter identification, large numbers of training sequences are available, and indeed needed to achieve sufficiently high classification performance. In this work we study two recently proposed and successfully used kernels, namely the Spectrum kernel and the Weighted Degree (WD) kernel. In particular, we suggest several extensions using suffix trees and modifications of an SMO-like SVM training algorithm in order to accelerate the training of the SVMs and their evaluation on test sequences. Our simulations show that for the Spectrum kernel and the WD kernel, large-scale SVM training can be accelerated by factors of 20 and 4, respectively, while using much less memory (e.g. no kernel caching). The evaluation on new sequences is often several thousand times faster using the new techniques (depending on the number of support vectors). Our method allows us to train on sets as large as one million sequences.
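For readers unfamiliar with the two kernels named in the abstract, the following is a minimal, naive sketch of how they are commonly defined. It is not the accelerated suffix-tree method of the paper. The function names, the count-based form of the Spectrum kernel, and the weighting beta_d = 2(D - d + 1) / (D(D + 1)) for the WD kernel are assumptions taken from the usual textbook definitions, since the abstract itself does not state them.

```python
from collections import Counter


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Naive Spectrum kernel (assumed count-based form):
    dot product of the k-mer count vectors of x and y."""
    counts_x = Counter(x[i:i + k] for i in range(len(x) - k + 1))
    counts_y = Counter(y[i:i + k] for i in range(len(y) - k + 1))
    # Only k-mers present in x can contribute; Counter returns 0 for
    # k-mers that do not occur in y.
    return sum(c * counts_y[u] for u, c in counts_x.items())


def wd_kernel(x: str, y: str, degree: int) -> float:
    """Naive Weighted Degree kernel for two equal-length sequences.

    For every substring length d = 1..degree, count the positions at
    which x and y share the same length-d substring, and weight that
    count by beta_d (assumed standard weighting, see lead-in).
    """
    if len(x) != len(y):
        raise ValueError("the WD kernel compares sequences of equal length")
    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(x[i:i + d] == y[i:i + d] for i in range(length - d + 1))
        total += beta * matches
    return total


if __name__ == "__main__":
    # Toy DNA strings chosen only for illustration.
    print(spectrum_kernel("ACGT", "ACGA", 2))
    print(wd_kernel("ACGT", "ACGA", 2))
```

Both functions cost time proportional to the sequence length per kernel evaluation, so a classifier built on them would need one evaluation per support vector for every test sequence. That dependence on the number of support vectors is the cost the abstract reports reducing.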

Author(s): Sonnenburg, S. and Rätsch, G. and Schölkopf, B.
Book Title: Proceedings of the 22nd International Conference on Machine Learning
Pages: 849-856
Year: 2005
Editors: De Raedt, L. and Wrobel, S.
Publisher: ACM

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

Address: New York, NY, USA
Event Name: ICML 2005
Event Place: Bonn, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

Links: PDF

BibTex

@inproceedings{3627,
  title = {Large Scale Genomic Sequence SVM Classifiers},
  author = {Sonnenburg, S. and R{\"a}tsch, G. and Sch{\"o}lkopf, B.},
  booktitle = {Proceedings of the 22nd International Conference on Machine Learning},
  pages = {849--856},
  editor = {De Raedt, L. and Wrobel, S.},
  publisher = {ACM},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  year = {2005}
}