Deep syntactic modeling and parsing
Range Concatenation Grammars (RCG)

RCG (Range Concatenation Grammars) have been introduced by Pierre Boullier and form a large hierarchy of grammars uniformly covering the PTIME class of the formalisms that may be parsed in polynomial space and time. From a linguistic point of view, this class includes mildly context-sensitive [MCS] formalisms who are becoming more and more important in Natural Language Processing and whose Tree Adjoining Grammars [TAG] are the most famous representatives. RCG are being studied following two main directions:

  • First, A formal study of (a) RCG's specific properties and of (b) translations from other grammatical formalisms to equivalent RCG. In our opinion, RCG are going to play an important role among the many other linguistic formalisms.
  • The second direction concerns the realization of a system to handle RCG. The current version already displays very promising performances. For example, we have compared the XTAG English parser with a RCG parser built from the same grammar and noted very important gains, corresponding to several order of complexity. This result is however still to be analyzed in more details.
Automata and Dynamic Programming
We believe that parsing may be better understood by
  1. using automata to encode various parsing strategies
  2. and evaluating these automata using Dynamic Programming interpretations to build tabular parsers

By tabulation, we mean that traces of computation are stored without redundancy in a table in order to better handle ambiguities in language (computation sharing), to detect loops (due to recursions), and to extract shared parse forests (or shared derivation forests).

Introducing automata and DP interpretations eases proofs, algorithm understanding. It also allows the design of new parsing strategies and of new tabular systems.

This approach has been formalized for CFG using Push Down Automata [PDA], for unification grammars (DCG) using Logical Push Down Automata [LPDA], and Feature TAGs using 2-stack Automata [2SA].

An implantation exists within DyALog system.

Tree Adjoining Grammars (TAG)

Tree Adjoining Grammars are a very interesting class of linguistically motivated grammars introduced by Joshi. They are natural extensions of Context-Free Grammars and belong also to a larger class of formalisms parsable in polynomial space and time: Mildly Context Sensitive Grammars. They are also equivalent to Linear Indexed Grammars (LIG).

Tabular algorithms exist for TAGs and LIGs but are usually designed for a specific parsing strategy and have varying complexities. There also exist different kinds of automata useful for TAGs and LIGs, such as 2-stack automata [2SA] and Embedded Push-Down Automata [EPDA].

In a recent joint work, Miguel Alonso Pardo and Eric de la Clergerie have proposed a restricted class of 2-SA that may be used to describe a wide range of parsing strategies for TAGs and LIGs. A Dynamic Programming interpretation of these 2-SA exists that gives the best known time and space general complexity. A more specialized version has also been proposed for bottom-up strategies that reduce the space complexity (but not the time complexity).