Factorizing trees

We have seen that a large-coverage TAG may have many trees (several thousand, and easily more), with the elementary trees sharing many fragments. For efficiency, it is worthwhile to reduce the number of trees by factorizing them: the common parts are shared, and regexp-like regular operators handle the parts that differ. These regular operators, such as disjunction, do not change the expressive power of TAGs, because a factorized tree can always be unfactorized into a finite set of equivalent non-factorized trees.


When several subtrees may branch under the same node, we may use a disjunction to group the set of alternative subtrees. For instance, the following figure shows how several possible realizations of French subjects (as nominal phrases, clitics, or infinitive sentences) may be grouped under a disjunction node (drawn as a diamond node).

A guard to control the presence or absence of a subject

Note: Disjunction nodes are associated with type: alternative in meta-grammar descriptions
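
The unfactorizing step can be illustrated with a minimal Python sketch. It is not the representation used by any actual meta-grammar compiler: the `Node` class, its `alternative` flag, and the labels `NP`, `clitic`, `S_inf`, and `V` are hypothetical names chosen for illustration. A disjunction node is replaced by exactly one of its alternatives, so a factorized tree expands into a finite list of plain trees.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    alternative: bool = False  # True for a disjunction (diamond) node


def expand(node):
    """Return the list of non-factorized trees that `node` stands for."""
    if node.alternative:
        # A disjunction node disappears: it is replaced by one alternative.
        results = []
        for child in node.children:
            results.extend(expand(child))
        return results
    # Ordinary node: pick one expansion for each child, in every combination.
    child_options = [expand(child) for child in node.children]
    return [Node(node.label, list(combo)) for combo in product(*child_options)]


def show(node):
    """Render a plain tree in bracketed form, e.g. S(NP V)."""
    if not node.children:
        return node.label
    return f"{node.label}({' '.join(show(c) for c in node.children)})"


# Hypothetical factorized tree: three subject realizations under one node.
subject = Node("subject", [Node("NP"), Node("clitic"), Node("S_inf")],
               alternative=True)
factorized = Node("S", [subject, Node("V")])

trees = [show(tree) for tree in expand(factorized)]
print(trees)  # ['S(NP V)', 'S(clitic V)', 'S(S_inf V)']
```

One factorized tree here stands for three plain trees. With several independent disjunction nodes in the same tree, the number of expanded trees is the product of the numbers of alternatives, which is why factorization reduces the tree count so sharply.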


Disjunction may be used to state that a subtree rooted at some node $N$ is present or not. However, the presence or absence of a subtree is often controlled by some conditions. Such situations may be captured by guards on nodes, with the conditions expressed as Boolean formulas over equations between feature paths (and feature values).

The following figure illustrates the use of a guard over the previous subject disjunction node to state that a subject is present in French when the verb mood is not infinitive, participial, or imperative, and absent otherwise.
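
The subject guard described above can be sketched in Python as a Boolean test over a feature structure. This is only an illustration under stated assumptions: the nested-dictionary encoding, the feature path `v.mood`, and the function names are hypothetical and do not reflect the actual meta-grammar syntax.

```python
# Moods for which the guard says the subject must be absent (from the text).
SUBJECTLESS_MOODS = {"infinitive", "participial", "imperative"}


def get_path(features, path):
    """Follow a dotted feature path such as 'v.mood' through nested dicts."""
    value = features
    for key in path.split("."):
        value = value[key]
    return value


def subject_guard(features):
    """Guard condition: true when the verb mood requires a subject."""
    return get_path(features, "v.mood") not in SUBJECTLESS_MOODS


def subject_is_licensed(features, has_subject):
    """The subject subtree must be present exactly when the guard holds."""
    return has_subject == subject_guard(features)


indicative = {"v": {"mood": "indicative"}}
infinitive = {"v": {"mood": "infinitive"}}

print(subject_is_licensed(indicative, has_subject=True))   # True
print(subject_is_licensed(infinitive, has_subject=True))   # False
print(subject_is_licensed(infinitive, has_subject=False))  # True
```

The guard constrains both cases: it requires the subject when the condition holds and forbids it otherwise, which is stronger than a plain optional node.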

Free ordering (shuffling)