Computational Generation of Chinese Noun Phrases

Public Defence: 12 April 2022 at 10:15. Academiegebouw, Domplein 29.

Abstract.

The linguist James Huang categorized languages into “cool” languages (i.e., languages that rely more on context) and “hot” languages (i.e., languages that rely less on context). Mandarin is thought to be a textbook example of a “cool” language, far “cooler” than languages such as English and Dutch. Huang hypothesised that the intended meaning of Mandarin phrases depends more on context than that of their English counterparts and, therefore, that phrases in Mandarin are more likely than phrases in English to be omitted or under-specified when the context provides enough information for readers to infer their meaning. Huang originally introduced coolness in connection with the use of anaphora: Mandarin is “cool” because its pronouns can often be omitted more naturally than English pronouns can. For example, if someone asks, “Did John see Tom yesterday?”, a Mandarin speaker could simply answer “看见了。” (kanjianle, saw) to say “He saw him”, dropping the pronouns in both the subject position and the object position. By contrast, the English word “saw” cannot form a grammatically correct sentence on its own.

In later work, the notion of coolness has been linked to the clarity-brevity trade-off in language use. It has been suggested that speakers of “cool” languages tend to keep their utterances shorter but less clear than speakers of “hot” languages do. If correct, this suggestion would make the impact of coolness on language use very extensive. In this thesis, we focus on noun phrases and aim to understand and validate the “coolness” hypothesis for Mandarin noun phrases.

We approached this issue with the help of natural language generation techniques. Specifically, we conducted experiments to find out what Mandarin speakers say in a given situation, compared this with what English speakers say, checked whether the outcomes are in line with the coolness hypothesis, and built natural language generation models to reproduce these speakers’ behaviour. Computational models help us better understand how people speak. Conversely, understanding human speaking patterns can help us build more mature, practical natural language generation systems for Mandarin.

We studied two types of noun phrases. The first is referring expressions. Suppose Tom is the only student on the bus and he wears glasses. There are 20 other people on the bus, 3 of whom wear glasses. To refer to Tom, one can use the referring expression “the student”, which rules out everything else on the bus (e.g., other people, chairs, etc.). The second is quantified expressions. To describe the people on the same bus using a quantified expression, one could say “Only one student is on the bus.”
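The bus example can be made concrete with a minimal sketch of how a generator might decide which properties to mention. This is an illustration only, not one of the models built in the thesis: the function name `select_properties`, the attribute names `type` and `glasses`, and the assumption that the 20 other people are non-student passengers are all hypothetical.

```python
from typing import Dict, List, Optional

Entity = Dict[str, object]


def select_properties(
    target: Entity, distractors: List[Entity], preferred: List[str]
) -> Optional[Dict[str, object]]:
    """Greedily pick properties of the target until no distractor matches.

    Attributes are tried in the given preference order. An attribute is kept
    only if it rules out at least one remaining distractor. Returns None when
    the attributes cannot single out the target.
    """
    chosen: Dict[str, object] = {}
    remaining = list(distractors)
    if not remaining:
        return chosen  # nothing to rule out, so no property is needed
    for attr in preferred:
        value = target.get(attr)
        if value is None:
            continue
        still_matching = [d for d in remaining if d.get(attr) == value]
        if len(still_matching) < len(remaining):
            chosen[attr] = value
            remaining = still_matching
        if not remaining:
            return chosen
    return None


# The scene from the text: Tom is the only student and wears glasses.
tom: Entity = {"type": "student", "glasses": True}
# Assumption: the 20 other people are non-students; 3 of them wear glasses.
others: List[Entity] = (
    [{"type": "passenger", "glasses": True} for _ in range(3)]
    + [{"type": "passenger", "glasses": False} for _ in range(17)]
)

print(select_properties(tom, others, ["type", "glasses"]))  # {'type': 'student'}
print(select_properties(tom, others, ["glasses"]))          # None
```

Under these assumptions, the type alone identifies Tom, which corresponds to “the student”. Mentioning only the glasses is not enough, because three other people also wear them.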

In this thesis, we link the phenomena observed in our studies to the coolness hypothesis. Although most of the evidence that we found supports coolness, we also identified some counter-evidence. This suggests that coolness holds in some situations but not in all.

We hope our work will pave the way for computational investigations of other differences between languages, and that it will inspire better natural language generation systems for Mandarin.