Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm

Krisztina Tóth; Richárd Farkas; András Kocsor

Abstract

We present an efficient hybrid method for aligning sentences with their translations in a parallel bilingual corpus. The algorithm combines a length-based method with an anchor-matching method that uses Named Entity recognition, pairing the speed of length-based models with the accuracy of anchor-finding methods. Because the accuracy of cognate detection is extremely low for the Hungarian-English language pair, we adopted a novel approach that includes Named Entity recognition. Owing to its well-selected anchors, the method was found to outperform the two best sentence alignment algorithms published so far for the Hungarian-English language pair.
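The general idea of combining a length-based score with name-based anchors can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`names`, `cost`, `align`), the bead penalties, and the anchor bonus are assumptions chosen for illustration, and a naive capitalization heuristic stands in for real Named Entity recognition.

```python
import re

# Allowed alignment beads: (source sentences, target sentences).
# Penalty values are illustrative assumptions, not taken from the paper.
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.3, (1, 2): 0.3}
ANCHOR_BONUS = 0.5  # assumed reward when both sides share a name


def names(sentence):
    """Naive stand-in for NER: capitalized tokens that are not sentence-initial."""
    tokens = re.findall(r"\w+", sentence)
    return {t for t in tokens[1:] if t[0].isupper()}


def cost(src, tgt):
    """Length mismatch plus bead penalty, reduced when a name anchor matches."""
    s_len = sum(len(s) for s in src)
    t_len = sum(len(t) for t in tgt)
    length_term = abs(s_len - t_len) / max(s_len, t_len, 1)
    s_names = set().union(*[names(s) for s in src])
    t_names = set().union(*[names(t) for t in tgt])
    bonus = ANCHOR_BONUS if s_names & t_names else 0.0
    return length_term + BEAD_PENALTY[(len(src), len(tgt))] - bonus


def align(source, target):
    """Dynamic programming over beads; returns lists of aligned sentence indices."""
    n, m = len(source), len(target)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in BEAD_PENALTY:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = best[i][j] + cost(source[i:ni], target[j:nj])
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]
```

Calling `align(hungarian_sentences, english_sentences)` returns a list of beads, each a pair of index lists into the two inputs. Exact string matching of names is a simplification: Hungarian inflects proper nouns, so a realistic system would need to normalize them before comparison.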

Published

2008-01-01

How to Cite

Tóth, K., Farkas, R., & Kocsor, A. (2008). Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm. Acta Cybernetica, 18(3), 463-478. Retrieved from https://cyber.bibl.u-szeged.hu/index.php/actcybern/article/view/3733

Issue

Vol 18 No 3 (2008)

Section

Regular articles
