Statistical language models within the algebra of weighted rational languages

Thomas Hanneforth; Kay-Michael Würzner

Abstract

Statistical language models are an important tool in natural language processing. They represent prior knowledge about a particular language, usually gained from a set of samples called a corpus. In this paper, we present a novel way of creating N-gram language models using weighted finite automata. The construction of these models is formalised within the algebra underlying weighted finite automata and expressed in terms of weighted rational languages and transductions. Besides the algebra, we make use of five special constant weighted transductions that depend only on the alphabet and the model parameter N. In addition, we discuss efficient implementations of these transductions in terms of virtual constructions.
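The abstract does not spell out the construction itself. As a rough point of reference only, the sketch below shows the conventional way an N-gram model can be read as a weighted finite automaton. It is not the authors' algebraic construction, which uses weighted rational operations and five constant transductions. Instead it assumes plain maximum-likelihood estimates without smoothing, the tropical semiring (min, +) with arc weights -log P(a | h), and hypothetical names (`count_ngrams`, `VirtualNGramWFA`, the boundary markers `<s>` and `</s>`). States are (N-1)-symbol histories, and arcs are computed on demand rather than stored, loosely in the spirit of the virtual constructions mentioned above.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def count_ngrams(corpus, n):
    """Count (history, symbol) pairs and history totals over padded sentences."""
    ngram_counts = defaultdict(int)
    history_counts = defaultdict(int)
    for sentence in corpus:
        tokens = [BOS] * (n - 1) + sentence + [EOS]
        for i in range(n - 1, len(tokens)):
            history = tuple(tokens[i - n + 1:i])  # the n-1 preceding symbols
            ngram_counts[(history, tokens[i])] += 1
            history_counts[history] += 1
    return ngram_counts, history_counts


class VirtualNGramWFA:
    """Deterministic WFA over the tropical semiring, expanded lazily.

    Each state is an (n-1)-gram history; the arc labelled `symbol` leaving
    history h carries weight -log P(symbol | h) (unsmoothed ML estimate).
    """

    def __init__(self, corpus, n):
        self.n = n
        self.ngram_counts, self.history_counts = count_ngrams(corpus, n)
        self.start = (BOS,) * (n - 1)

    def arc(self, state, symbol):
        """Compute the outgoing arc on demand; None means no such arc."""
        count = self.ngram_counts.get((state, symbol), 0)
        if count == 0:
            return None  # unseen n-gram: probability zero under pure ML
        weight = -math.log(count / self.history_counts[state])
        # Shift the history window: drop the oldest symbol, append the new one.
        next_state = (state + (symbol,))[1:] if self.n > 1 else ()
        return next_state, weight

    def weight(self, sentence):
        """Weight of the unique path for `sentence` followed by </s>."""
        state, total = self.start, 0.0
        for symbol in sentence + [EOS]:
            step = self.arc(state, symbol)
            if step is None:
                return math.inf  # the tropical semiring's zero element
            state, w = step
            total += w  # tropical multiplication is ordinary addition
        return total
```

For example, with the toy corpus `[["the", "cat", "sat"], ["the", "dog", "sat"]]` and `n = 2`, the only uncertain step in `["the", "cat", "sat"]` is P(cat | the) = 1/2, so its weight is log 2 and `math.exp(-weight)` recovers the probability 0.5. An unseen bigram yields infinite weight, which is why practical models add smoothing or backoff on top of this skeleton.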

Published

2009-01-01

How to Cite

Hanneforth, T., & Würzner, K.-M. (2009). Statistical language models within the algebra of weighted rational languages. Acta Cybernetica, 19(2), 313–356. Retrieved from https://cyber.bibl.u-szeged.hu/index.php/actcybern/article/view/3771

Issue

Vol. 19 No. 2 (2009)

Section

Regular articles
