Introducing Meta-Grammars
A modular organization with inheritance
A meta-grammar is organized in a modular way through a hierarchy of classes. A class is a bag of constraints used to specify a syntactic phenomenon (or just one facet of it).
The classes of FRMG may be browsed here. Once a class is selected, the right side panel "Class Graph View" may be used to navigate through the class hierarchy.
The following figure shows a fragment of the FRMG hierarchy (the full hierarchy is too large and deep to fit on a page).
As an example, a very basic class for adverbs will only specify that we expect an elementary tree anchored by an adverb. Nothing more!
... }
We can define subclasses that inherit all the constraints of their parent class and that may be used to progressively refine syntactic phenomena.
For instance, we may have a subclass of adverb specifying constraints about the use of adverbs as modifiers (their most common usage), essentially indicating that we expect an auxiliary tree.
We can then further refine into the notion of adverb as modifier of verbs, by specifying that the root node should have category v, with maybe some other restriction on the type of the adverb.
... } .... }
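The refinement chain described above can be modelled outside the formalism. The following Python sketch is not part of FRMG or of its compiler; it only illustrates, under simplifying assumptions, how a subclass accumulates the constraints of its ancestors. The name adverb_as_verb_modifier and the constraint strings are hypothetical labels paraphrasing the prose.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MGClass:
    """A meta-grammar class seen as a named bag of constraints (illustrative model)."""
    name: str
    constraints: list = field(default_factory=list)
    parent: Optional["MGClass"] = None

    def all_constraints(self):
        """Return the inherited constraints first, then the local ones."""
        inherited = self.parent.all_constraints() if self.parent else []
        return inherited + self.constraints


# Hypothetical constraint labels, paraphrasing the text above.
adverb = MGClass("adverb", ["tree is anchored by an adverb"])
adverb_as_modifier = MGClass(
    "adverb_as_modifier", ["tree is an auxiliary tree"], parent=adverb
)
adverb_as_verb_modifier = MGClass(
    "adverb_as_verb_modifier", ["root node has category v"], parent=adverb_as_modifier
)

for constraint in adverb_as_verb_modifier.all_constraints():
    print(constraint)
```

The most specific class ends up with three constraints, while the basic adverb class keeps only its own.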
Combining resources
Besides inheritance, modularity is ensured through another very powerful mechanism: providing and consuming resources. A class may require some resource (or functionality) that is provided by some other class. For instance, the previous class adverb_as_modifier may be implemented by requiring the functionality of "modifier of something", asking for resource x_modifier. The class x_modifier is then used to provide this resource. Several classes may compete to provide the same resource, and several classes may require the same resource.
<: adverb; .... } ... }
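To make the provide/consume idea concrete, here is a minimal Python sketch, assuming a much simpler model than the real meta-grammar compiler: each class lists what it provides and what it needs, and every need is paired with each class able to satisfy it. The class other_modifier is hypothetical and is added only to show two providers competing for the same resource.

```python
from itertools import product

# Each class lists the resources it provides and the ones it requires.
# "other_modifier" is a hypothetical competitor, not a class from the source.
CLASSES = {
    "adverb_as_modifier": {"provides": [], "needs": ["x_modifier"]},
    "x_modifier": {"provides": ["x_modifier"], "needs": []},
    "other_modifier": {"provides": ["x_modifier"], "needs": []},
}


def providers(resource):
    """Names of all classes offering the resource, in sorted order."""
    return sorted(n for n, c in CLASSES.items() if resource in c["provides"])


def combinations(start):
    """Enumerate the class sets obtained by satisfying every need of `start`.

    Only one level is resolved: the needs of the chosen providers
    are not followed in this sketch.
    """
    needs = CLASSES[start]["needs"]
    return [[start, *choice] for choice in product(*(providers(r) for r in needs))]


print(combinations("adverb_as_modifier"))
```

With two competing providers, the consumer yields two candidate combinations, one per provider.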
This resource management mechanism is quite powerful and nicely complements inheritance. In particular, it has been extended to allow a resource to be consumed several times by a class using distinct name spaces, something that can't be easily done through inheritance.
For instance, a basic resource agreement may be defined to provide agreement on gender, number, ... between a node N and its father node. This resource is consumed twice in class superlative_as_adj_mod, once in namespace det and once in namespace adj, acting on a different node each time.
%% require agreement for the determiner (le|la) and the adjective
...
}
+ agreement;
%% provide agreement constraint between a node and its father
father(N).bot.gender = node(N).bot.gender;
....
}
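The effect of consuming one resource under two name spaces can be pictured as instantiating the same constraint template twice, each time on a different node. The Python sketch below is only an illustration of that idea: the node names Det and Adj and the bracketed namespace tag are assumptions, not the notation used by the formalism.

```python
# Constraint template provided by the resource, parameterised by its node N.
AGREEMENT = "father({N}).bot.gender = node({N}).bot.gender"


def consume(template, namespace, node):
    """Instantiate the template for one node, tagged with its namespace."""
    return f"[{namespace}] " + template.format(N=node)


# Hypothetical node names: Det for the determiner, Adj for the adjective.
constraints = [
    consume(AGREEMENT, "det", "Det"),
    consume(AGREEMENT, "adj", "Adj"),
]

for constraint in constraints:
    print(constraint)
```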
Historically, however, resources were mostly introduced so that verbs could accept several verbal arguments, each argument being seen as requiring an arg resource, as implemented in class verb_categorization.
class verb_categorization {
  node(v).cat = value(v|aux);
  $arg0::arg.kind = value(-|subj|nosubj|prepobj);
  ...
  ...
}
{
  desc.ht.diathesis = value(passive);
  $arg1::arg.kind = value(subj);
  $arg1::arg.pcas = value(-);
  $arg0::arg.kind = value(-|prepobj);
  $arg0::arg.pcas = value(-|par|contre);
  $arg2::arg.kind = value(-|prepobj|prepvcomp|prepscomp|prepacomp|acomp|scomp);
  $arg2::arg.pcas = value(~par);
  ...
}
Inheritance and resources form the backbone of a meta-grammar (its organization in terms of classes). The "flesh" is provided by the content of the classes, through constraints over the nodes of the elementary trees.
Topological constraints
First, we have topological or structural constraints:
- equality between nodes
- precedence: a node should precede another one in a tree
- dominance: a node should dominate another one in a tree. We distinguish parent dominance (a node is the father of another one) from ancestor dominance (a node is an ancestor of another one)
For instance, as a first approximation, in a sentence, the Subject node should precede its verb node, and both nodes are dominated by the root S node. We can be more precise and state that S should be the father of the Subject.
%% declaration of nodes S, v, and Subject, with some decorations
node S: [cat: S, type: std];
node v: [cat: v, type: anchor, id: v];
node Subject: [cat: N2, type: subst, id: subject];
%% The subject precedes the verb
Subject < v;
%% The sentence node dominates the subject node
S >> Subject;
%% the sentence node also dominates the verb node, but indirectly
%% (to allow other nodes in-between)
S >>+ v;
....
}
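The three relations used above (precedence with <, parent dominance with >>, ancestor dominance with >>+) can be checked on a concrete tree. The following Python sketch is an illustration only: the tree, the intermediate node VP, and the exact definition of precedence (left of, with neither node dominating the other) are assumptions made for the example, not the compiler's algorithm.

```python
# A tree as a mapping from each node to its ordered list of children.
# VP is a hypothetical intermediate node, added to show indirect dominance.
TREE = {"S": ["Subject", "VP"], "VP": ["v"], "Subject": [], "v": []}


def is_parent(tree, a, b):
    """Parent dominance: a is the father of b."""
    return b in tree[a]


def is_ancestor(tree, a, b):
    """Ancestor dominance: a dominates b through one or more parent links."""
    return any(c == b or is_ancestor(tree, c, b) for c in tree[a])


def preorder(tree, root):
    """Nodes in depth-first, left-to-right order."""
    order = [root]
    for child in tree[root]:
        order.extend(preorder(tree, child))
    return order


def precedes(tree, root, a, b):
    """Precedence: a is to the left of b and neither dominates the other."""
    order = preorder(tree, root)
    return (
        order.index(a) < order.index(b)
        and not is_ancestor(tree, a, b)
        and not is_ancestor(tree, b, a)
    )


print(precedes(TREE, "S", "Subject", "v"))  # Subject < v
print(is_parent(TREE, "S", "Subject"))      # S >> Subject
print(is_ancestor(TREE, "S", "v"))          # S >>+ v
```

In this tree all three constraints hold, while S is not the direct father of v, which is exactly the freedom left by the ancestor relation.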
Decoration constraints
We also have constraints over the decorations carried by the nodes. The decoration constraints may be directly attached to nodes, or expressed as equations between feature paths and values. The source of a feature path is generally a node, but it can also be the class itself, denoted by desc (equivalent to this or self in object-oriented languages), or a variable (prefixed by $ as in $foo).
...
%% use of variable $number to force number agreement
node Subject: [cat: N2, type: subst, id: subject, top: [number: $number]];
node v : [cat:v, type: anchor, id: v, top: [number: $number]];
}

...
%% alternative use of a path equation to force number agreement
node(Subject).top.number = node(v).top.number;
}

...
%% a non transitive verb has only one subject argument
desc.ht.arg0.function=value(subject);
desc.ht.arg1.function=value(-);
desc.ht.arg2.function=value(-);
}
%% alternative, using a full feature structure as value
...
%% a non transitive verb has only one subject argument
desc.ht = value([arg0: [function: subject],
                 arg1: [function: -],
                 arg2: [function: -] ]);
}

desc.ht = $ht;
...
%% a non transitive verb has only one subject argument
$ht.arg1 = value(subject);
....
}
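Sharing a variable or equating two paths amounts to requiring that the two decorations be compatible. The Python sketch below illustrates this with a deliberately small unification function; it assumes that atomic values are sets of allowed symbols and that feature structures are plain dictionaries, which is a simplification and not how the source describes its implementation.

```python
def unify(a, b):
    """Unify two feature structures.

    Dictionaries are merged recursively; atomic values, given as sets of
    allowed symbols, are intersected. Returns None when unification fails.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None
                result[key] = merged
            else:
                result[key] = value
        return result
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        return (a & b) or None
    return None


# Hypothetical decorations for the top features of Subject and v.
subject_top = {"number": {"sg"}, "person": {"3"}}
verb_top = {"number": {"sg", "pl"}}

print(unify(subject_top, verb_top))
print(unify({"number": {"pl"}}, {"number": {"sg"}}))
```

The first call succeeds and narrows number to sg; the second fails because the two number values have nothing in common.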
Guards
Going further, the equations may also be used to express constraints on the presence or absence of a node. A positive or negative guard on a node is expressed as a Boolean formula over equations.
...
%% a subject is present
%% if and only if the verb mood is not imperative or infinitive
Subject => node(v).top.mood = value(~imperative|infinitive);
~ Subject => node(v).top.mood = value(imperative|infinitive);
}
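The pair of guards on Subject can be read as two conditions, one that must hold when the node is present and one that must hold when it is absent. The Python sketch below spells out that reading; the function names and the mood value "indicative" are hypothetical and only serve the illustration.

```python
# Moods for which the subject node must be absent, as in the guards above.
NO_SUBJECT_MOODS = {"imperative", "infinitive"}


def positive_guard(mood):
    """Condition required when the Subject node is present."""
    return mood not in NO_SUBJECT_MOODS


def negative_guard(mood):
    """Condition required when the Subject node is absent."""
    return mood in NO_SUBJECT_MOODS


def subject_allowed(present, mood):
    """Check the guard matching the presence or absence of Subject."""
    return positive_guard(mood) if present else negative_guard(mood)


for present in (True, False):
    for mood in ("indicative", "imperative", "infinitive"):
        print(present, mood, subject_allowed(present, mood))
```

Taken together, the two guards make the presence of the subject equivalent to the mood being neither imperative nor infinitive.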
These guards may also be used to specify complex constraints over a node without the need to increase the number of classes.
...
SubS +
    node(SubS).top.mode = value(participle|gerundive),
    node(SubS).top.inv = value(-),
    node(SubS).top.extraction = value(-),
    (
      node(SubS).top.sat = value(-),
      (
        node(Foot).cat = value(coo)
      |
        node(Foot).cat = value(~coo),
        node(Foot).top.number = node(SubS).bot.number,
        node(Foot).top.person = node(SubS).bot.person,
        node(Foot).top.gender = node(SubS).bot.gender
      )
    |
      node(SubS).top.sat = value(ppart)
    )
  |
    node(SubS).top.mode = value(~participle|gerundive),
    node(SubS).top.sat = value(-)
  ;
}
Misc.
To shorten descriptions, it is also possible to define and use macros on feature values and feature paths.
%% macro on value, for default agreement
template @defaultagr = [person: 3, number: sg, gender: masc]
%% macro on path
path @function0 = .ht.arg0.function
...
node(CS).bot = value(@defaultagr);
}
...
desc.@function0 = value(subject);
}
When debugging the meta-grammar, it is possible to disable a class and all its descendants:
disable verb_categorization_passive
As is often (always?) the case, the formalism provides its own set of "hacks" that may be useful to know. For instance, nodes have a feature type, with a few special type values:
- alternative: for an internal node acting as a disjunction over its children (only one of them may be used at parsing time)
- sequence: for an internal node that has no linguistic interest (no category, no features) but has children
+ subject;
node SubjectAlt: [type: alternative];
SubjectAlt >> CliticSubj;
node CliticSubj: [cat: cln, type: coanchor];
SubjectAlt >> NominalSubj;
node NominalSubj: [cat: N2, type: subst];
SubjectAlt >> SentSubj;
node SentSubj: [cat: S, type: subst];
...
}
These special types are in particular used to build factorized trees.
It is also possible to state that a node is optional, not with a guard but by using the optional feature.
...
%% a proper noun may be preceded by an optional title
node Monsieur : [cat: title, type: coanchor, optional: yes];
}
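One way to see what a factorized tree stands for is to enumerate the plain trees it abbreviates. The Python sketch below does this for alternative and optional nodes, listing each realization as a tuple of leaf names. The dictionary encoding and the node names S, N2 and np are assumptions made for the example; it is not a description of how the parser handles factorization.

```python
from itertools import product


def expand(node):
    """List every realization of a factorized node as a tuple of leaf names."""
    children = node.get("children", [])
    if not children:
        variants = [(node["name"],)]
    elif node.get("type") == "alternative":
        # Exactly one child is chosen.
        variants = [v for child in children for v in expand(child)]
    else:
        # All children are realized, in order.
        variants = [
            sum(combo, ()) for combo in product(*(expand(c) for c in children))
        ]
    if node.get("optional"):
        # The node may also be left out entirely.
        variants = [()] + variants
    return variants


subject_alt = {
    "name": "SubjectAlt",
    "type": "alternative",
    "children": [{"name": "CliticSubj"}, {"name": "NominalSubj"}, {"name": "SentSubj"}],
}
# Hypothetical enclosing nodes, used only to show the combinations.
sentence = {"name": "S", "children": [subject_alt, {"name": "v"}]}
proper_noun = {
    "name": "N2",
    "children": [{"name": "Monsieur", "optional": True}, {"name": "np"}],
}

print(expand(sentence))
print(expand(proper_noun))
```

The alternative node yields three variants of the sentence, and the optional title yields two variants of the proper noun phrase, all described by a single factorized tree each.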
It is also possible to state that a node can be repeated zero, one, or several times in the parse trees, in a way similar to the Kleene star operator "*" used in regular expressions.
...
node MiddleCoordSeq: [type: sequence, star: *];
node coord: [cat: coo, type: anchor];
node EndCoord: [cat: $cat];
MiddleCoordSeq < coord;
coord < EndCoord;
MiddleCoordSeq >> MiddleCoord;
MiddleCoordSeq >> comma;
MiddleCoord < comma;
node MiddleCoord: [cat: $cat];
node comma: [lex: ",", type: lex];
}
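The analogy with regular expressions can be made literal. In the Python sketch below, each node of the fragment above is abbreviated by one letter (an encoding chosen for this example only), and the starred sequence becomes a starred group in a regular expression.

```python
import re

# One letter per node: M = MiddleCoord, c = comma, C = coord, E = EndCoord.
# The starred MiddleCoordSeq (MiddleCoord followed by comma) becomes (Mc)*.
PATTERN = re.compile(r"(Mc)*CE")


def matches(sequence):
    """True when the whole letter sequence fits the starred pattern."""
    return PATTERN.fullmatch(sequence) is not None


for candidate in ("CE", "McCE", "McMcCE", "MCE"):
    print(candidate, matches(candidate))
```

The sequence may be absent, present once, or repeated; a MiddleCoord without its comma is rejected because the sequence node requires both children.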