Applying dynamic Bayesian networks in transliteration detection and generation

02 December 2011

PhD ceremony: Mr. P. Nabende, 4.15 p.m., Aula Academiegebouw, Broerstraat 5, Groningen

Dissertation: Applying dynamic Bayesian networks in transliteration detection and generation

Promotor(s): prof. J.A. Nerbonne

Faculty: Mathematics and Natural Sciences

Transliteration detection and generation are Natural Language Processing (NLP) tasks that aim to improve the performance of NLP applications such as Machine Translation. A growing body of research addresses methods for improving transliteration detection and generation quality. We propose two edit distance (ED)-based Dynamic Bayesian Network (DBN) modeling approaches for computing transliteration similarity: Pair Hidden Markov Models (Pair HMMs) and transduction-based DBN models.
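To illustrate the edit distance idea that both modeling approaches build on, here is a minimal Python sketch of the classic unit-cost (Levenshtein) edit distance and a length-normalized similarity derived from it. It is not the dissertation's model: the function names `edit_distance` and `similarity` are hypothetical, and the sketch assumes both strings share one alphabet (for example, romanized names), whereas the DBN models replace the fixed unit costs with learned probabilities.

```python
def edit_distance(source: str, target: str) -> int:
    """Unit-cost Levenshtein distance between two strings."""
    # prev[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # deleting i characters of `source` to reach ""
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (s_char != t_char)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def similarity(source: str, target: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(source, target) / longest
```

A fixed cost of 1 per operation treats every character substitution as equally likely, which is the main limitation that probabilistic edit models are meant to address.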

We explored the effects of several factors represented by the DBN models on transliteration detection and generation quality. For the Pair HMMs, we specified and tested several definitions of emission and transition parameters, and evaluated different scoring algorithms, including the Forward and Viterbi algorithms and their log-odds versions, which are obtained in combination with a random Pair HMM. For the transduction-based DBN modeling approach, we evaluated different generalizations of the basic techniques that define specific types of dependencies we hypothesize to be important for computing transliteration similarity, including edit operation and context dependencies.
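The scoring algorithms named above can be sketched on a toy three-state Pair HMM. The following Python example is only an illustration under stated assumptions: it uses a common textbook-style parameterization (a substitution state M and two gap states X and Y), hand-picked hypothetical values for `DELTA`, `EPSILON`, `TAU` and `ETA`, and a single shared lowercase alphabet with uniform gap emissions. The dissertation tested several parameter definitions, and none of them is claimed to match this one.

```python
import math

# Hypothetical transition parameters (assumptions, not from the source).
DELTA, EPSILON, TAU = 0.1, 0.2, 0.1   # gap open, gap extend, end
ETA = 0.1                             # end probability of the random model

ALPHABET_SIZE = 26                    # assumed shared lowercase alphabet
P_SAME = 0.8 / ALPHABET_SIZE          # pair emission, identical characters
P_DIFF = 0.2 / (ALPHABET_SIZE * (ALPHABET_SIZE - 1))  # differing characters
Q = 1.0 / ALPHABET_SIZE               # single-character emission


def total(*terms):
    """Sum over incoming paths (Forward algorithm)."""
    return sum(terms)


def pair_hmm_score(x, y, combine):
    """Forward score if combine=total, Viterbi score if combine=max."""
    n, m = len(x), len(y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # substitution state
    X = [[0.0] * (m + 1) for _ in range(n + 1)]  # emits a char of x only
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]  # emits a char of y only
    M[0][0] = 1.0  # the start state is treated like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = P_SAME if x[i - 1] == y[j - 1] else P_DIFF
                M[i][j] = emit * combine(
                    (1 - 2 * DELTA - TAU) * M[i - 1][j - 1],
                    (1 - EPSILON - TAU) * X[i - 1][j - 1],
                    (1 - EPSILON - TAU) * Y[i - 1][j - 1],
                )
            if i > 0:
                X[i][j] = Q * combine(DELTA * M[i - 1][j], EPSILON * X[i - 1][j])
            if j > 0:
                Y[i][j] = Q * combine(DELTA * M[i][j - 1], EPSILON * Y[i][j - 1])
    return TAU * combine(M[n][m], X[n][m], Y[n][m])


def random_log_prob(x, y):
    """Random model: each string is emitted independently, char by char."""
    length = len(x) + len(y)
    return 2 * math.log(ETA) + length * (math.log(1 - ETA) + math.log(Q))


def log_odds(x, y, combine):
    """Log-odds of the Pair HMM score against the random model."""
    return math.log(pair_hmm_score(x, y, combine)) - random_log_prob(x, y)
```

Calling `log_odds("moscow", "moskow", total)` gives the Forward log-odds score and `log_odds("moscow", "moskow", max)` the Viterbi variant. The Forward score sums over all alignments, while Viterbi keeps only the single best one, so the Forward score is never smaller. Probabilities are multiplied directly here for readability; longer strings would need log-space arithmetic to avoid underflow.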

We used standard transliteration datasets for eleven language pairs (English-Arabic, English-Bengali, English-Chinese, English-Dutch, English-French, English-German, English-Hindi, English-Kannada, English-Russian, English-Tamil, and English-Thai) to evaluate the performance of the DBN models. The transliteration detection and generation results underscore the importance of representing character context. Ensemble-based applications of the DBN models also improved transliteration detection quality. The ED-based DBN models considerably improved F-score values for mining English-Hindi and English-Tamil transliterations and achieved competitive F-score values for the other language pairs compared to the best results from state-of-the-art approaches. Using contextual transformation rules in post-processing steps after applying Pair HMMs also led to a large improvement in transliteration generation quality.
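For readers unfamiliar with the F-score used to report mining quality, the following Python sketch shows how it is commonly computed from a set of mined pairs and a gold-standard set. The function name `f_score`, the `beta` parameter, and the example pairs are hypothetical; the exact evaluation protocol behind the reported results is not described here.

```python
def f_score(predicted, gold, beta=1.0):
    """F-score of mined transliteration pairs against a gold standard.

    `predicted` and `gold` are collections of (source, target) pairs.
    With beta=1.0 this is the harmonic mean of precision and recall.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Made-up example pairs, for illustration only.
mined = {("delhi", "dilli"), ("tamil", "tamizh"), ("paris", "london")}
reference = {("delhi", "dilli"), ("tamil", "tamizh"),
             ("moscow", "moskva"), ("china", "zhongguo")}
score = f_score(mined, reference)  # precision 2/3, recall 1/2
```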

Last modified: 13 March 2020 01.11 a.m.
