Suffix trees as language models

Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how some properties of suffix trees naturally provide the functionality of an n-gram language model with variable n. We describe the properties of suffix trees that our Suffix Tree Language Model (STLM) implementation leverages, and explain how a suffix tree implicitly contains the data needed for n-gram language modeling. We also discuss the kinds of smoothing techniques appropriate to such a model. We then show that our suffix tree language model implementation is competitive with the state-of-the-art language modeling toolkit SRILM (Stolcke, 2002) in statistical machine translation experiments.
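The claim that a suffix tree already holds the statistics of an n-gram model of any order can be illustrated with a short sketch. The Python code below is not the STLM implementation described in the paper and rests on its own assumptions. It indexes a tokenized corpus in an uncompressed suffix trie, a simpler stand-in for a compressed suffix tree that indexes the same substrings, so the count of any n-gram is the count stored at the node reached by walking its tokens from the root. The number of children of a history's node is the number of distinct words observed after that history, which is exactly the statistic Witten-Bell smoothing needs. Witten-Bell interpolation is used here only for illustration, since the abstract does not say which smoothing methods the STLM adopts. The class name `SuffixTrieLM`, the `max_depth` parameter, and the reserved unknown-token slot are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    """One node of an uncompressed suffix trie over tokens."""
    count: int = 0  # occurrences of the token string spelled from the root to this node
    children: Dict[str, "Node"] = field(default_factory=dict)


class SuffixTrieLM:
    """Hypothetical sketch: n-gram statistics read off a token-level suffix trie.

    An uncompressed trie replaces a real (compressed) suffix tree for simplicity.
    """

    def __init__(self, tokens: Sequence[str], max_depth: Optional[int] = None):
        self.root = Node()
        n = len(tokens)
        for i in range(n):
            end = n if max_depth is None else min(n, i + max_depth)
            node = self.root
            # Insert the suffix tokens[i:end]; each prefix of a suffix is
            # one occurrence of the corresponding n-gram, so bump its count.
            for tok in tokens[i:end]:
                node = node.children.setdefault(tok, Node())
                node.count += 1
        # Seen word types plus one slot reserved for unknown tokens (an assumption).
        self.vocab_size = len(self.root.children) + 1

    def find(self, ngram: Sequence[str]) -> Optional[Node]:
        """Walk the trie along `ngram`; the empty n-gram maps to the root."""
        node = self.root
        for tok in ngram:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def count(self, ngram: Sequence[str]) -> int:
        node = self.find(ngram)
        return 0 if node is None else node.count

    def prob(self, word: str, history: Sequence[str]) -> float:
        """Witten-Bell interpolated P(word | history), recursing on shorter histories."""
        history = tuple(history)
        # Lower-order estimate: drop the oldest history token; bottom out at uniform.
        lower = self.prob(word, history[1:]) if history else 1.0 / self.vocab_size
        node = self.find(history)
        if node is None:
            return lower  # history never seen: back off entirely
        total = sum(child.count for child in node.children.values())  # c(h, any word)
        types = len(node.children)  # N1+(h .): distinct continuations of h
        if total == 0:
            return lower  # history only occurs at the end of the corpus
        child = node.children.get(word)
        c = child.count if child is not None else 0
        return (c + types * lower) / (total + types)


corpus = "a b a b a c".split()
lm = SuffixTrieLM(corpus)
print(lm.count(["a", "b"]))  # 2
print(lm.prob("b", ["a"]))
```

Because every suffix is inserted in full when `max_depth` is `None`, this trie needs O(n²) nodes for an n-token corpus, whereas a real suffix tree compresses unary paths to stay linear in size. Setting `max_depth` bounds memory at the cost of capping the usable model order.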

Metadata
Authors: Casey Redd Kennington, Martin Kay, Annemarie Friedrich
Frontdoor URL: https://opus.bibliothek.uni-augsburg.de/opus4/105718
URL: https://aclanthology.org/L12-1378/
ISBN: 978-2-9517408-7-7
Parent Title (English): Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), May 21-27, 2012, Istanbul, Turkey
Publisher: European Language Resources Association (ELRA)
Place of publication: Paris
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Type: Conference Proceeding
Language: English
Year of first Publication: 2012
Release Date: 2023/07/10
First Page: 446
Last Page: 453
Institutes: Fakultät für Angewandte Informatik
Fakultät für Angewandte Informatik / Institut für Informatik
Fakultät für Angewandte Informatik / Institut für Informatik / Professur für Sprachverstehen mit der Anwendung Digital Humanities
Dewey Decimal Classification: 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 004 Data processing & computer science