
On Designing an Automated Malaysian Stemmer for the Malay Language

2000

Conference Paper



Online and interactive information retrieval systems are likely to play an increasing role in the Malay language community. To automate the matching of morphological term variants, a stemmer based on common affix removal algorithms is proposed as part of the design of an information retrieval system for the Malay language. Stemming is the morphological process of normalizing word tokens to their essential roots. The proposed stemmer strips prefixes and suffixes from each word. An experiment conducted on web sites selected from the World Wide Web showed substantial improvements in the number of words indexed.
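The affix removal described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the prefix and suffix lists, the minimum stem length, and the function names (`strip_prefix`, `strip_suffix`, `stem`) are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
# Illustrative affix-stripping stemmer for Malay.
# ASSUMPTION: these affix lists and the length guard are hypothetical;
# the paper's actual rules are not given in the abstract.
PREFIXES = ("meng", "mem", "men", "ber", "ter", "per", "me", "di", "ke", "se")
SUFFIXES = ("kan", "an", "i")
MIN_STEM = 4  # refuse to strip if fewer than this many letters would remain


def strip_prefix(word):
    """Remove the first listed prefix that leaves a long enough stem."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            return word[len(prefix):]
    return word


def strip_suffix(word):
    """Remove the first listed suffix that leaves a long enough stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word


def stem(word):
    """Lowercase the token, then strip one prefix and one suffix."""
    return strip_suffix(strip_prefix(word.lower()))


for token in ("berjalan", "makanan", "membaca", "perjalanan"):
    print(token, "->", stem(token))
```

With these assumed lists, "berjalan" and "perjalanan" both reduce to "jalan", so an index would treat them as one term. The sketch ignores the sound changes that accompany some Malay prefixes (for example, "menulis" derives from "tulis"), which a practical stemmer would need to handle, typically with extra rules or a root-word dictionary.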

Author(s): Tai, SY. and Ong, CS. and Abdullah, NA.
Book Title: Fifth International Workshop on Information Retrieval with Asian Languages
Journal: Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages
Pages: 207-208
Year: 2000
Month: October
Publisher: ACM Press

Department(s): Empirical Inference
Bibtex Type: Conference Paper (inproceedings)

DOI: 10.1145/355214.355247
Event Name: Fifth International Workshop on Information Retrieval with Asian Languages
Event Place: Hong Kong, China

Address: New York, NY, USA
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik

Links: PostScript
Web

BibTex

@inproceedings{3421,
  title = {On Designing an Automated Malaysian Stemmer for the Malay Language},
  author = {Tai, SY. and Ong, CS. and Abdullah, NA.},
  journal = {Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages},
  booktitle = {Fifth International Workshop on Information Retrieval with Asian Languages},
  pages = {207-208},
  publisher = {ACM Press},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {New York, NY, USA},
  month = oct,
  year = {2000},
  doi = {10.1145/355214.355247},
  month_numeric = {10}
}