Grants awarded for short-term scientific missions
In September 2014, the TextLink Action held its first call for short-term scientific missions (STSMs). The Action is proud to announce that, of the six applications that were received, the following STSM grants have been awarded:
Ms Ludivine Crible (Université catholique de Louvain) will visit Dr Sandrine Zufferey at the Université de Fribourg, and carry out a project on “Assessing the validity of annotation guidelines: towards multimodal equivalence of DRDs”.
During her stay, Ms Crible will present the annotation scheme developed in her PhD thesis to the researchers at the University of Fribourg, in order to discuss its strengths and weaknesses, as well as possible improvements in comparison with other annotation guidelines (especially for writing). This annotation scheme (Crible in prep.) provides instructions for the syntactic and functional description of DSDs. Its main contribution to TextLink is an operational closed list of 29 functions which integrates both discourse relations and speech-specific functions like turn-taking or monitoring. The resulting taxonomy is a corpus-based annotation model that follows the legacy of the PDTB initiative (Prasad et al. 2007), adapted to the spoken modality (and possibly gestures, see Crible & Bolly 2015). During her stay, Ms Crible will also conduct an inter-annotator experiment together with Dr Zufferey on a small corpus of both written and spoken data, to test the validity, replicability and multimodality of the annotation of discourse relations.
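As an illustration of how agreement between two annotators in such an experiment could be quantified, the sketch below computes Cohen's kappa over two parallel label sequences. The announcement does not say which agreement measure will be used, so the choice of kappa is an assumption, and the function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa).

    Assumption: both annotators labelled the same items in the same order,
    one label per item. The metric itself is an illustrative choice.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical function labels assigned by two annotators to four tokens.
annotator_1 = ["cause", "contrast", "cause", "turn-taking"]
annotator_2 = ["cause", "contrast", "contrast", "turn-taking"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With a closed list of 29 functions, the same computation would apply unchanged; only the label inventory grows.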
Dr Christian Hardmeier (Uppsala University) will visit Prof. Bonnie Webber at the University of Edinburgh for a project on “Predicting Discourse Connectives in Parallel Texts”.
During his mission, Dr Hardmeier plans to extend to the prediction of discourse connectives a discourse element prediction model originally developed for anaphoric pronouns. The extended model will then be used for experiments related to cross-lingual annotation of discourse properties.
This extension to discourse connectives pursues two goals related to the scientific objectives of the TextLink COST Action:
1. While developing the classifier, it will be tested on different feature configurations and context windows. This should allow an estimate of how much context is needed to make reliable predictions for discourse connectives. Having this information will be useful when designing semi-automatic annotation methods for discourse connectives, which is one goal of the COST Action.
2. Once the classifier is ready, it can be used to predict discourse connectives in parallel texts. Of particular interest are those situations where one language lacks an explicit connective, expressing the discourse relation implicitly instead. In these cases, the classifier will be used to predict an explicit discourse connective, which can be used in disambiguating the implicit relation. This would result in a method for automatic large-scale annotation of implicit discourse relations, another goal of the COST Action.
Based on these goals, the following work will be carried out:
1. Adaptation of the structure and the input features of the pronoun prediction neural network of Hardmeier et al. (EMNLP 2013) for prediction of discourse connectives (3 weeks).
2. Running of systematic experiments with different feature sets and context sizes to determine the impact of these choices on prediction performance (1 week).
3. Testing of the resulting predictor on corpus examples where the target language lacks an explicit discourse connective, and evaluation of how well these predictions reflect the nature of the implicit discourse relation (2 weeks).
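To make the idea of varying the context size concrete, the following minimal sketch extracts a fixed-size window of tokens around a candidate connective position. It does not reproduce the neural network of Hardmeier et al.; the function name `context_window`, the padding symbol and the example sentence are assumptions made purely for illustration.

```python
def context_window(tokens, position, size, pad="<pad>"):
    """Return `size` tokens to the left and right of `position`.

    The token at `position` itself (the candidate connective slot) is
    excluded; missing context at sentence edges is filled with `pad`.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if not 0 <= position < len(tokens):
        raise IndexError("position is outside the sentence")
    left = tokens[max(0, position - size):position]
    right = tokens[position + 1:position + 1 + size]
    # Pad so that every window has exactly 2 * size entries.
    left = [pad] * (size - len(left)) + left
    right = right + [pad] * (size - len(right))
    return left + right


# Hypothetical example: the connective "so" sits at index 3.
sentence = ["it", "rained", ",", "so", "we", "stayed", "home"]
narrow = context_window(sentence, 3, 1)
wide = context_window(sentence, 3, 2)
```

Running the same experiment with several values of `size` is one simple way to compare how much surrounding context a predictor needs.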
Dr Yannick Versley (Institute for Computational Linguistics, Univ. Heidelberg) will visit Prof. Jörg Tiedemann at the Dept. of Linguistics and Philology, Uppsala University for a project on “Cross-lingual induction/harmonization of DRDs”.
During his stay, Dr Versley aims to explore methods of discovering discourse structuring devices (e.g. connectives) in languages currently lacking a dictionary, using annotation projection techniques as well as recently developed syntactic corpora with a language-universal annotation scheme. Building on his previous work (Versley 2010) as well as improvements by others (Laali and Kosseim 2014), Dr Versley will work together with Prof. Tiedemann and others in Uppsala to explore connections both between phrasal syntax and the grammar of connectives and between languages in a parallel corpus. In particular, the following will be carried out: feature extraction from a parallel corpus based on rich linguistic preprocessing; tagging of discourse connectives; alignment of tagged discourse connectives, and discovery of alternative discourse connectives; syntax-based filtering using universal dependencies.
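The core idea of annotation projection can be sketched as follows: connective tags on the source side of a sentence pair are carried over to the target side through word alignment links, and the projected target words are counted as candidate connectives. This is a simplified illustration, not Dr Versley's method; the function name `project_connectives`, the data layout and the English-German example are assumptions.

```python
from collections import Counter


def project_connectives(sentence_pairs):
    """Count target-language words aligned to tagged source connectives.

    Each item in `sentence_pairs` is a tuple of
    (source_tags, target_tokens, alignment), where `source_tags` holds one
    boolean per source token (True = tagged as a connective) and
    `alignment` is a list of (source_index, target_index) links.
    """
    candidates = Counter()
    for source_tags, target_tokens, alignment in sentence_pairs:
        for src_idx, tgt_idx in alignment:
            if source_tags[src_idx]:
                candidates[target_tokens[tgt_idx].lower()] += 1
    return candidates


# Hypothetical pair: "I stayed because it rained" /
# "Ich blieb , weil es regnete", with "because" tagged as a connective.
pairs = [
    (
        [False, False, True, False, False],
        ["Ich", "blieb", ",", "weil", "es", "regnete"],
        [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)],
    )
]
candidates = project_connectives(pairs)
```

In practice the resulting candidate list would still need filtering, for instance with the syntax-based step mentioned above, since alignment errors introduce noise.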
Ms Fatemeh Torabi Asr and Dr Vera Demberg (Saarland University, Saarbruecken) will both visit Prof. Ted Sanders at the Utrecht Institute of Linguistics for a project on “Towards a language-independent representation of discourse relations”.
Based on their previous work, the two groups will exchange ideas on cross-corpora and cross-linguistic studies which will clarify the general characteristics of discourse regardless of a particular taxonomy of relation senses, i.e., via cognitive and empirical validation of relation sense categories. The overlap between the Saarland computational research on discourse relations and discourse markers and the psycholinguistic approach to the same problem in Prof. Sanders’s group will shed new light on the question of what types of relations exist or would best explain human understanding of text coherence. The program will include an invited talk by Ms Torabi Asr and Dr Demberg at the Utrecht discourse colloquium, and detailed meetings with researchers working on relevant issues, such as the Swiss-Dutch project on discourse annotation, cross-linguistic comparison and translation, acquisition of discourse connectives, psycholinguistics of discourse, and eye-tracking.
Dr Anna Nedoluzhko (Charles University in Prague) will visit Dr Kerstin Kunz at Saarland University for a project on “Comparison of PDiT and GECCo approaches to discourse”.
Her mission aims to compare two approaches to discourse phenomena: the one applied in the Prague Discourse Treebank (PDiT) in Prague and the one applied in the GECCo project (German-English contrasts in cohesion – Towards an empirically-based comparison) at Saarland University.
Initial pilot observations showed that in several respects, these two conceptions of textual phenomena analysis are very close, and their further comparison can bring theoretically interesting results. Together with her German colleagues, Dr Nedoluzhko will explore the interoperability of such annotation schemes, identify their differences and commonalities, and think of ways of applying both without losing important categories and aspects.
The first step of cooperation during her stay will consist of pilot parallel annotation according to the guidelines in both conceptions (PDiT and GECCo). Then, the set of commonalities and differences revealed by the annotation will be investigated. For example, in GECCo, textual phenomena are annotated primarily from the point of view of textual cohesion, i.e. all explicit markers of cohesion are annotated as such and are further classified. Prague discourse annotation, on the other hand, is based on both cohesive and reference properties. Analyzing annotated texts will show, for example, which discourse phenomena are better resolved by which approach. Another outcome of this cooperation will be resources, e.g. lexicons of discourse-relational devices. The comparison of the schemes, as well as their test application, will reveal the gaps in both lexicons of discourse-relational devices.