Automatic Speech Recognition (ASR) application for the Luxembourgish language - MSc Speech Technology
Date: 20 January 2023
Leminh Nguyen is a graduate of the MSc Speech Technology programme. In his thesis, he pioneered improvements in the automatic recognition of Luxembourgish, his native language. As a result of this work, he was invited by the Luxembourgish Ministry of Education to create an Automatic Speech Recognition (ASR) web application for the Luxembourgish language. He was also invited to present his work at the prestigious 2022 IEEE Spoken Language Technology workshop in Qatar. While in Qatar, he took the time to speak with us about his research, its implications, and why he is so passionate about the topic.
Tell us a little about how you got invited to the IEEE Spoken Language Technology workshop in Qatar.
After I finished my Master's thesis in Speech Technology, I worked with my supervisors to restructure it as an academic paper that could be submitted to the IEEE Spoken Language Technology workshop. This was no simple matter! A thesis is comparatively long and densely structured, so it took a lot of work to trim it down. It was definitely worthwhile, though, because the paper was accepted and I was invited to present my work in a poster session here in Qatar. That was a great experience, especially since the workshop is supported by big players in the Speech Technology field such as Amazon. The main focus of the workshop is on signal processing, ASR and speech synthesis. It is exciting to be here on the cutting edge, alongside breakthrough research, and to participate, especially since I was the only Master's student presenting here among otherwise only PhD students and postdocs! Funnily enough, I was also presenting alongside a former professor of mine from the University of Luxembourg, who showed great interest in the approach I took in my research.
You developed this web application for the Luxembourgish Ministry of Education. How did you get this assignment and why did they ask you to develop this?
This application and the whole process were a direct result of my thesis project during the Speech Technology Master's, in which I worked together with the “ZLS”, the Zenter fir d'Lëtzebuerger Sprooch (Centre for the Luxembourgish Language), which is part of the Luxembourgish Ministry of Education. In the MSc Speech Technology, we are encouraged to work with external partners, and the ZLS seemed like a perfect contact. I am still grateful that they offered me an internship during my thesis period, so I could write my Master's thesis on the basis of my research and experiments there.
It was a great experience to work for a government and to see how it works internally. We had great cooperation between researchers such as myself and my colleague, who were specialists in AI, and a group of linguists, who were experts in Luxembourgish. It was great to continue working in such an interdisciplinary environment. Because Campus Fryslân and the Speech Technology Master's programme are both interdisciplinary as well, it was a natural fit.
Anyway, to answer your question succinctly, I only developed the web application after the summer break, when the Luxembourgish Ministry of Education hired me as a consultant in order to transform the work I did for my thesis into an interface that would be accessible to the public.
What exactly does the application do?
The application is basically an online interface to the models I created in my thesis research, for anyone who wants to engage with the language. We wanted to make it as user-friendly as possible, so people can upload their own audio files or record themselves. The application then visualises the audio waveform and provides a transcription of the speech. When you press play, the word currently being spoken is highlighted in the transcription.
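The playback highlighting described above could be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code: it assumes the recogniser returns a start and end time for every word, and the function names (`current_word_index`, `render`) and the sample timestamps are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical word-level output of a recogniser: (word, start_s, end_s).
Word = Tuple[str, float, float]


def current_word_index(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word spoken at playback time t, or None in a pause."""
    starts = [start for _, start, _ in words]
    # Last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i][2]:
        return i
    return None


def render(words: List[Word], t: float) -> str:
    """Return the transcript with the current word wrapped in brackets."""
    idx = current_word_index(words, t)
    return " ".join(
        f"[{word}]" if i == idx else word
        for i, (word, _, _) in enumerate(words)
    )


# Made-up timestamps for illustration only.
words = [("Moien", 0.0, 0.4), ("wéi", 0.5, 0.7), ("geet", 0.7, 0.9), ("et", 0.9, 1.1)]
print(render(words, 0.6))
```

A player would call `render` (or just `current_word_index`) each time the playback position changes. The binary search keeps each lookup cheap even for long recordings.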
The goal was to showcase to everyone in Luxembourg the current status of Luxembourgish speech recognition and to share what is possible with the newest technologies and the state of the art. This is particularly important because Luxembourgish has previously not been investigated in the speech recognition domain and we didn’t have any other Luxembourgish speech recognition systems, so we wanted to have our first system out there and open source. Normally similar technologies are only available for the so-called high-resource languages, mainly big languages such as English or German. The government previously thought that ASR was not possible for Luxembourgish, so it’s quite an honour to show them otherwise.
Can you tell us something about the implications of your research or the impact it can have on society?
With some further development, use cases will include the Luxembourgish parliament. Every parliament session is currently manually transcribed. Of course, the models will never be perfect, so you would still need people to check the transcription, especially for legal jargon or English loanwords. But given that the AI could do the bulk of the work, this would save a lot of time. Media outlets are also interested in these systems, as they can automatically generate subtitles. The models could also be used in automatic translation applications. Given that only about half of the population of Luxembourg actually speaks Luxembourgish, the models can help to automatically translate Luxembourgish into the respective native languages of the other half of the population.
You mentioned there are still further developments necessary in this field. Is this why you decided to make the outcomes of your research open source?
Yes, my aim is to catalyse and accelerate research in Luxembourgish Speech Technology and to make sure it’s possible for others to continue my research. Given that I now work as a full-time research scientist for Deepgram, a Silicon Valley AI start-up, I don’t have much bandwidth to continue advancing Luxembourgish ASR. By making my work open source, others can use it and advance the research. My dream is for Luxembourgish speech technology to reach a stage where voice assistants such as Alexa can understand and speak Luxembourgish. In fact, this is why I went into the field of speech technology: I wanted a Luxembourgish voice assistant. If fellow researchers are interested in picking up the research, they can find the models and benchmark dataset under the following link: https://huggingface.co/Lemswasabi.
Do you have any tips for current Speech Technology students or students considering studying the programme?
I came to Leeuwarden to study Speech Technology because it is my hobby, and I was able to turn my hobby into my job. It was a perfect match. It is not an easy programme; it is very rigorous. The fact that I am speaking with you from a high-level scientific conference in Qatar is a testament to that. If you’re passionate about the field of Speech Technology, it is a great opportunity. I'd say that it is worth putting in the time and effort.