Suffix trees as language models

Suffix trees are data structures that can be used to index a corpus. In this paper, we explore how some properties of suffix trees naturally provide the functionality of an n-gram language model with variable n. We explain the properties of suffix trees that we leverage in our Suffix Tree Language Model (STLM) implementation and show how a suffix tree implicitly contains the data needed for n-gram language modeling. We also discuss the kinds of smoothing techniques appropriate to such a model. Finally, we show that our suffix-tree language model implementation is competitive with the state-of-the-art language model SRILM (Stolcke, 2002) in statistical machine translation experiments.
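The claim that a suffix tree "implicitly contains" n-gram statistics can be illustrated with a small sketch. This is not the STLM implementation from the paper. It assumes a naive, uncompressed suffix trie (a suffix tree without edge compression), capped at a hypothetical `max_order` depth. In this trie, each node's count equals the number of occurrences of the n-gram spelled by its path. A conditional probability is read off the longest suffix of the history that occurs in the corpus with a following token, which is one simple way to obtain variable n. The class name `SuffixTrieLM` and its methods are illustrative only, and smoothing is deliberately omitted, so unseen continuations get probability 0.

```python
class SuffixTrieNode:
    """Node of an uncompressed suffix trie; `count` = occurrences of the path's n-gram."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


class SuffixTrieLM:
    """Hypothetical illustration (not the paper's STLM): n-gram counts from a depth-capped suffix trie."""

    def __init__(self, tokens, max_order=3):
        self.max_order = max_order
        self.root = SuffixTrieNode()
        # Insert every suffix of the corpus, truncated to max_order tokens.
        for i in range(len(tokens)):
            node = self.root
            node.count += 1
            for tok in tokens[i:i + max_order]:
                node = node.children.setdefault(tok, SuffixTrieNode())
                node.count += 1

    def _find(self, ngram):
        """Walk from the root along `ngram`; return the node or None if it never occurs."""
        node = self.root
        for tok in ngram:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def count(self, ngram):
        """Number of occurrences of `ngram` (len <= max_order) in the corpus."""
        node = self._find(ngram)
        return node.count if node is not None else 0

    def prob(self, word, history):
        """Unsmoothed ML estimate of P(word | history) from the longest usable context."""
        history = list(history)
        k = self.max_order - 1  # longest context the trie can extend by one token
        history = history[max(0, len(history) - k):]
        # Try the full history first, then drop tokens from the left (variable n).
        for start in range(len(history) + 1):
            node = self._find(history[start:])
            if node is not None and node.children:
                # Denominator: occurrences of the context that have a following token.
                total = sum(child.count for child in node.children.values())
                child = node.children.get(word)
                return (child.count if child is not None else 0) / total
        return 0.0
```

For example, with `lm = SuffixTrieLM("the cat sat on the mat the cat ran".split())`, `lm.prob("cat", ["the"])` is 2/3. For the history `["ran"]`, the context occurs only at the end of the corpus, so the sketch falls back to the unigram distribution at the root. A real suffix tree compresses unary paths and can be built in linear time (e.g. with Ukkonen's algorithm), whereas this trie takes O(N · max_order) time and space.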

Metadata
Author:	Casey Redd Kennington, Martin Kay, Annemarie Friedrich
Frontdoor URL:	https://opus.bibliothek.uni-augsburg.de/opus4/105718
URL:	https://aclanthology.org/L12-1378/
ISBN:	978-2-9517408-7-7
Parent Title (English):	Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), May 21-27, 2012, Istanbul, Turkey
Publisher:	European Language Resources Association (ELRA)
Place of publication:	Paris
Editor:	Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Type:	Conference Proceeding
Language:	English
Year of first Publication:	2012
Release Date:	2023/07/10
First Page:	446
Last Page:	453
Institutes:	Fakultät für Angewandte Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik
	Fakultät für Angewandte Informatik / Institut für Informatik / Professur für Sprachverstehen mit der Anwendung Digital Humanities
Dewey Decimal Classification:	0 Computer science, information and general works / 00 Computer science, knowledge and systems / 004 Data processing; computer science

Open Access