Alpage Project team: Scientific activities: Deep syntactic modeling and parsing
Alpage's theoretical work on discourse structures will first focus on the description and definition of connectors, i.e., words that lexicalize a discourse relation. It will then study the discourse structures themselves, as induced by these discourse relations, and in particular the fundamental opposition between coordinating and subordinating relations. Indeed, this opposition is the basis on which important notions, such as the right frontier, can be defined. Moreover, most discourse theories (including D-STAG) assume that discourse structures are trees, as is the case for constituency structures at the syntactic level. However, Laurence Danlos has shown that this is a simplistic approximation and that DAGs (Directed Acyclic Graphs) are required, although with strong (yet still poorly described) constraints [124]. This could lead to an improvement of the D-STAG model, so that it produces discourse "dependency structures" that are DAGs.
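As a rough illustration of why the coordinating/subordinating opposition and the tree/DAG distinction matter, here is a minimal Python sketch. It is not the D-STAG formalization: the relation inventory, the Edge class, and the simplified reading of the right frontier (the last segment plus every segment that reaches it through subordinating edges only) are assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical relation inventory: which relations count as subordinating
# is an assumption made for this illustration only.
SUBORDINATING = {"Elaboration", "Explanation"}
COORDINATING = {"Narration", "Continuation"}


@dataclass(frozen=True)
class Edge:
    relation: str
    source: str  # segment that the new segment attaches to
    target: str  # newly attached segment


def right_frontier(edges, last):
    """Return the last segment plus every segment that reaches it
    through a chain of subordinating edges (simplified definition)."""
    frontier = [last]
    seen = {last}
    stack = [last]
    while stack:
        node = stack.pop()
        for edge in edges:
            if (edge.target == node
                    and edge.relation in SUBORDINATING
                    and edge.source not in seen):
                seen.add(edge.source)
                frontier.append(edge.source)
                stack.append(edge.source)
    return frontier


def is_tree(edges):
    """True if no segment has more than one incoming edge."""
    targets = [edge.target for edge in edges]
    return len(targets) == len(set(targets))


# Toy discourse: s1 is elaborated by s2 and by s3, and s3 continues s2.
EXAMPLE_EDGES = [
    Edge("Elaboration", "s1", "s2"),
    Edge("Narration", "s2", "s3"),
    Edge("Elaboration", "s1", "s3"),
]

print(right_frontier(EXAMPLE_EDGES, "s3"))  # ['s3', 's1']
print(is_tree(EXAMPLE_EDGES))               # False
```

In this toy structure, s2 is closed off because it is linked to s3 by a coordinating relation, while s1 stays open for attachment. The segment s3 has two incoming edges, so the structure is a DAG rather than a tree.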
Moreover, not much is known about the linguistic resources other than discourse connectors for signalling coherence relations. Ongoing research by Laurence Danlos [44] aims to study non-connector resources for marking coherence relations, namely "discourse verbs" and "discourse prepositions". "Discourse verbs" are verbs such as "precede" or "cause", which take eventualities or facts as arguments. An example of a discourse preposition is "with" in "John is crazy with grief".
This rather theoretical approach will make it possible to extend Alpage's parsing systems to the level of discourse, thanks to actual implementations of the D-STAG model. This will require transcribing the linguistic descriptions of discourse connectors and relations into a grammar of discourse. This is a non-trivial task, for example because of the mismatch between the syntactic and discourse levels that characterizes some discourse connectors (e.g., "ensuite" is unary at the syntactic level but binary at the discourse level).
Without trying to develop a full-featured text understanding system (which would be unrealistic given the current state of the art), we will extend the deep syntactic parsing systems described above with a discourse analysis that indicates discourse relations, in particular thanks to the work on Synchronous TAG described in section 4.1.2. As noted earlier, this will benefit other parsing tasks such as anaphora resolution (e.g., it is reasonable to look for the antecedent of an anaphoric element in a search space defined by the discourse structure) or information structure extraction, which is highly relevant for the syntactic and semantic levels (information structure studies the differences between "Peter ate a cake" and "It's a cake that Peter ate", or between "John fell; Mary pushed him" and "Mary pushed John; he fell").
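The idea of restricting the antecedent search space by discourse structure can be sketched as follows. This is only an illustration under stated assumptions: the function name, the data layout, and the premise that some discourse analysis has already supplied the list of accessible segments are all hypothetical and do not describe Alpage's actual systems.

```python
def candidate_antecedents(mentions_by_segment, accessible_segments):
    """Keep only the mentions located in discourse-accessible segments.

    accessible_segments is assumed to be ordered from the most recent
    segment to the oldest one, and that order is preserved in the result.
    """
    candidates = []
    for segment in accessible_segments:
        for mention in mentions_by_segment.get(segment, []):
            candidates.append((segment, mention))
    return candidates


# Toy input: mentions found in each discourse segment (hypothetical data).
EXAMPLE_MENTIONS = {
    "s1": ["John"],
    "s2": ["Peter", "a cake"],
    "s3": ["Mary"],
}

# Suppose the discourse structure makes only s3 and s1 accessible.
print(candidate_antecedents(EXAMPLE_MENTIONS, ["s3", "s1"]))
# [('s3', 'Mary'), ('s1', 'John')]
```

Mentions in s2 are never proposed as antecedents here, which is the kind of pruning that a discourse-defined search space is expected to provide.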