From a far away perspective, FRMG is essentially a large coverage hand-crafted grammar for French that may be used to parse sentences and to produce syntactic structures (parses).
Hand-crafting large grammars (or Grammar Engineering) was the way to develop parsers 20 years ago for several important grammatical formalisms such as Tree Adjoining Grammars (TAG), LFGs, HPSGs, CCGs, ...
The arising of successful (supervised) data-driven approaches have since then largely superseded hand-crafted grammars and grammar engineering.
Several weaknesses have been advanced against hand-crafted grammars:
- they require a strong expertise both on the linguistic side and about the grammatical formalism (TAGs, LFGs, ...). The data-driven approaches allow a separation between people with linguistic expertise to annotate treebanks on one side and people skilled in machine-learning on the other side.
- it is difficult to increase the coverage of the grammar, because of the large diversity of rarer and rarer syntactic phenomena to cover but also because of the increasing complexity of the interactions between the different structures of the grammar
- it becomes also more and more difficult to maintain the grammar over time, to modify it and to extend it
- it becomes more and more difficult to extract the right parse from larger and larger search spaces without probabilities (on grammatical structures or operations on the structures), and probabilities requires data !
- efficiency becomes an issue with more and more structures, and more and more interactions between them (larger search spaces). Again, propabilities seems to be the key to prune as soon as possible the search spaces (the extreme case being the greedy parsers such as Malt).
These weaknesses are largely true, but we can also list some advantages of hand-crafted grammars
- even if difficult, they can be understood, when most data-driven approaches produce black-box models or non-linguistic grammars
- it is also possible to extend them to cover new syntactic phenomena, something difficult with data-driven approaches without modifying/extending the training treebank
- they tend to be more robust over various domains and benefit from the tendency of grammar designers to be as generic as possible when describing a phenomena, when data-driven tend to be strongly dependent on their training treebank
- a lesser known fact is paradoxically their capacity to fail on some sentences, hinting (sometimes) to yet uncovered syntactic phenomena when a statistical parser will happily provides a probably erroneous parse !
The development of FRMG over the last 10 years is here to prove that is possible to develop and maintain a large coverage grammar over a relatively long period of time, relying on good choices at the beginning and using many grammar engineering tricks since then.
One of the last of these engineering tricks is the development of FRMG Wiki, as a way to present the grammar, to offer a way for people to try the parser, to provide feedback, and also to discuss complex or missing syntactic phenomena. Sentence parses can also be easily in wiki page, as done for the following sentence "whoever you are, be welcome!"
This tutorial will take place in this linguistic wiki, with the objective to show the different components of FRMG.
So, it is now time to look more closely at FRMG, whose acronym stands for FRench MetaGrammar. It is primarily a wide coverage abstract grammatical description for French.
Metagrammars could theoretically be directly used to parse sentences. However, they have been mostly designed to ease the work of syntacticians to describe grammatical phenomena using elementary constraints and a modular organization. In practice, metagrammars are then used to derive an operational grammar, a Tree Adjoining Grammar (TAG) in the case of FRMG. The operational grammar is then compiled into a chart-like parser. At this point, a lot of efforts has still to be deployed to get an efficient, robust, and accurate parser !