Seen from a distance, FRMG is essentially a large-coverage hand-crafted grammar for French that can be used to parse sentences and produce syntactic structures.

Hand-crafting large grammars (or grammar engineering) was, 20 years ago, the standard way to develop parsers for several important grammatical formalisms such as Tree Adjoining Grammars (TAG), LFG, HPSG, CCG, ...

Successful data-driven approaches have since largely supplanted hand-crafted grammars and grammar engineering.

Several criticisms have been raised against hand-crafted grammars:

  • they require strong expertise in both linguistics and the grammatical formalism, whereas data-driven approaches allow a separation between people with linguistic expertise, who annotate treebanks, and people skilled in machine learning
  • it is difficult to increase the coverage of the grammar, because of the large diversity of syntactic phenomena to cover but also because of the increasing complexity of the interactions between the different structures of the grammar
  • it also becomes more and more difficult to maintain, modify, and extend the grammar over time
  • it becomes more and more difficult to get the right parse without probabilities (on grammatical structures or on operations over these structures), and probabilities require data!
  • efficiency becomes an issue as the number of structures, and of interactions between them, grows

These criticisms are largely valid, but hand-crafted grammars also have some advantages:

  • even if it is difficult, they can be understood, whereas most data-driven approaches produce black-box models or non-linguistic grammars
  • they can be extended to cover new syntactic phenomena, something that is difficult with data-driven approaches without modifying or extending the training treebank
  • they tend to be more robust across domains, benefiting from the tendency of grammar designers to be as generic as possible when describing a phenomenon, whereas data-driven approaches tend to be strongly dependent on their training treebank

The development of FRMG over the last 10 years shows that it is possible to develop and maintain a large-coverage grammar over a relatively long period of time, by relying on good choices at the beginning and on many grammar engineering tricks since then.

One of the most recent of these engineering tricks is the development of FRMG Wiki, as a way to present the grammar, to let people try the parser and provide feedback, and also to discuss complex or missing syntactic phenomena. Sentence parses can also be easily embedded in a wiki page, as done for the following sentence: "whoever you are, be welcome!"

This tutorial takes place in this linguistic wiki, with the aim of presenting the different components of FRMG.

It is now time to look more closely at FRMG, which stands for FRench MetaGrammar. It is primarily a wide-coverage abstract grammatical description of French.

Metagrammars could theoretically be used directly to parse sentences. However, they have mostly been designed to make it easier for syntacticians to describe grammatical phenomena using elementary constraints and a modular organization. In practice, a metagrammar is used to derive an operational grammar. For FRMG, this is a Tree Adjoining Grammar (TAG). The grammar is then compiled into a chart-like parser, which can be used to parse sentences.
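
To make the idea of "elementary constraints and a modular organization" more concrete, here is a toy sketch in Python. It is not FRMG's actual metagrammar notation or compiler: the class names (verb_kernel, subject, ...), the two constraint kinds (dominates, precedes), and the helper functions are all hypothetical, chosen only to illustrate how small reusable classes could be combined through inheritance and then turned into a tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MGClass:
    """Hypothetical metagrammar class: a named bundle of elementary constraints."""
    name: str
    parents: tuple = ()                  # names of classes to inherit from
    constraints: frozenset = frozenset() # triples such as ("dominates", "S", "VP")


def collect_constraints(name, classes):
    """Gather a class's own constraints plus those of all its ancestors."""
    cls = classes[name]
    gathered = set(cls.constraints)
    for parent in cls.parents:
        gathered |= collect_constraints(parent, classes)
    return gathered


def to_bracket(node, constraints):
    """Render the tree rooted at `node` as a bracketed string."""
    children = [c for (rel, p, c) in constraints if rel == "dominates" and p == node]
    if not children:
        return node
    before = {(a, b) for (rel, a, b) in constraints if rel == "precedes"}
    # Order siblings by how many of their siblings must precede them.
    ordered = sorted(
        children,
        key=lambda c: (sum((o, c) in before for o in children), c),
    )
    return "(" + node + " " + " ".join(to_bracket(c, constraints) for c in ordered) + ")"


def derive_tree(name, classes, root="S"):
    """Toy 'derivation': merge inherited constraints, then build one tree."""
    return to_bracket(root, collect_constraints(name, classes))


# A toy modular description (illustrative only, not taken from FRMG).
TOY = {c.name: c for c in [
    MGClass("verb_kernel", constraints=frozenset({
        ("dominates", "S", "VP"), ("dominates", "VP", "V")})),
    MGClass("subject", constraints=frozenset({
        ("dominates", "S", "NP_subj"), ("precedes", "NP_subj", "VP")})),
    MGClass("intransitive", parents=("verb_kernel", "subject")),
    MGClass("transitive", parents=("intransitive",), constraints=frozenset({
        ("dominates", "VP", "NP_obj"), ("precedes", "V", "NP_obj")})),
]}

if __name__ == "__main__":
    for name in ("intransitive", "transitive"):
        print(name, "->", derive_tree(name, TOY))
```

In this sketch, the transitive class only states what is specific to it (an object after the verb) and inherits everything else, which is the kind of factorization a modular description is meant to allow. A real metagrammar compiler has to handle much more (feature structures, adjunction, underspecified orderings), none of which is modeled here.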