The anatomy of antonymy: a corpus-driven approach
PhD ceremony: Ms. G.V. Lobanova, 11:00 a.m., Academiegebouw, Broerstraat 5, Groningen
Dissertation: The anatomy of antonymy: a corpus-driven approach
Promotor(s): prof. L.C. Verbrugge
Faculty: Mathematics and Natural Sciences
This dissertation deals with opposites: in Dutch, word pairs like arm – rijk (poor – rich), dag – nacht (day – night), openen – sluiten (open – close), and other pairs that express some type of contrast. First, we explore pattern-based methods for finding opposites automatically. Second, we analyze the automatically found opposites and compare them with the opposites that theoretical linguists have extensively studied and classified.
Our methodology is based on the assumption that opposites co-occur within a sentence significantly more often than would be expected by chance, and that they can often be found in intrasentential patterns like [tussen <ANT> en <ANT>] ("between <ANT> and <ANT>"). Using small sets of 6, 12 and 18 seed pairs expressed by adjectives, nouns or verbs, we identify the best patterns for finding new pairs of opposites in a 450-million-word newspaper corpus of Dutch. In the first study, we automatically generate strictly textual patterns like [either <ANT> countries or <ANT> countries] that do not contain any syntactic information, but simply capture surface strings. In the second study, we generate surface patterns that contain part-of-speech information about target word pairs, like [the difference between <ANT/Adj> and <ANT/Adj>]. In the third study, we use a parsed corpus to automatically acquire patterns with syntactic dependencies. Such patterns abstract away from the surface structure, capturing, for example, that <ANT1/Noun> is the subject, <ANT2/Noun> is the direct object, and the two are connected by the verb appreciate.
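The textual-pattern idea from the first study can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`induce_patterns`, `apply_pattern`), the one-word left context, the raw frequency ranking, and the three toy sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

ANT = "<ANT>"  # slot marker, as in [tussen <ANT> en <ANT>]


def induce_patterns(sentences, seed_pairs):
    """Count surface patterns obtained by replacing a seed pair with <ANT>.

    Assumption: a pattern is one token of left context plus everything
    between the two seed words. The real method may use other windows.
    """
    patterns = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b in seed_pairs:
            if a in tokens and b in tokens:
                i, j = sorted((tokens.index(a), tokens.index(b)))
                start = max(i - 1, 0)
                span = tokens[start:j + 1]
                span[i - start] = ANT
                span[j - start] = ANT
                patterns[" ".join(span)] += 1
    return patterns


def apply_pattern(pattern, sentences):
    """Return a Counter of word pairs that fill the two slots of a pattern."""
    parts = [r"(\w+)" if tok == ANT else re.escape(tok) for tok in pattern.split()]
    regex = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    found = Counter()
    for sentence in sentences:
        for pair in regex.findall(sentence.lower()):
            found[pair] += 1
    return found


# Toy corpus (invented sentences, not taken from the newspaper corpus).
corpus = [
    "Het verschil tussen arm en rijk groeit",
    "De grens tussen dag en nacht vervaagt",
    "De kloof tussen oud en nieuw is groot",
]
seeds = [("arm", "rijk"), ("dag", "nacht")]

patterns = induce_patterns(corpus, seeds)
best_pattern = patterns.most_common(1)[0][0]
candidates = [p for p in apply_pattern(best_pattern, corpus) if p not in seeds]
print(best_pattern)  # tussen <ANT> en <ANT>
print(candidates)    # [('oud', 'nieuw')]
```

On a real corpus, candidate pairs would also need to be scored, for instance by checking that the two words co-occur more often than chance would predict; that step is left out here.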
The best results were achieved with part-of-speech patterns, which identified many typical as well as novel opposites. Textual patterns found the same most frequent opposites across the seed sets of all three syntactic categories, and the majority of these pairs were well-established opposites. Dependency patterns found the fewest opposites per seed set, but they found many novel pairs.
Overall, the best results are achieved by the algorithm that relies on adding the minimum amount of syntactic information, namely only part-of-speech information. Since this method does not require any computationally costly preprocessing steps and can easily be applied to vast amounts of data, part-of-speech patterns offer a promising solution to automatic extraction of opposites.
The results show that the range of automatically found opposites surpasses the limited number of well-established opposites commonly discussed in theoretical approaches to opposites. In particular, pattern-based methods can find not only typical opposites like oud – nieuw (old – new) and arm – rijk (poor – rich), but also less conventional opposites like nieuw – bestaand (new – existing), nieuw – tweedehands (new – second-hand), nieuw – bekend (new – familiar), and oud – recent (old – recent); non-typical domain-specific opposites like wit – rood (white – red, for wine) and Democraat – Republikein (Democrat – Republican, for political parties); and context-dependent pairs like migrant – Nederlander (migrant – Dutch person, in Dutch newspaper texts) and buitenlands – Nederlands (foreign – Dutch, as an analogue of buitenlands – binnenlands, foreign – domestic, in the context of local and international policies). Although such pairs behave in the corpus much like the canonical opposites, non-typical context-dependent opposites have been neglected in theoretical classifications. Our results provide evidence that opposites include a much wider range of pairs than has been previously recognized.
In fact, automatically found opposites, especially domain-specific and context-dependent pairs that are often missed in existing lexical resources, are particularly useful for other natural language processing tasks. This is further confirmed by the fact that, contrary to our assumptions, we found no differences between typical and non-typical opposites in the frequency or the types of patterns in which they were found. This shows that both types are valid opposites that merit further study.
Last modified: 13 March 2020 01.02 a.m.