A tour of FRMG, a French (Meta)Grammar

This set of pages (book) is being developed as a 2-hour tutorial to be delivered at the PARSEME-COST workshop (Dubrovnik, 26-27 September 2016).

It will also serve as an English introduction to FRMG and this wiki. Comments are welcome!

It should cover the following points:

  • a brief description of FRMG and a few words about the FRMG wiki
  • a brief description of Tree Adjoining Grammars (TAGs): the notion of trees, tree operations, pros and cons of large TAGs
  • a presentation of meta-grammars as a solution for designing large-coverage grammatical descriptions
    • modularity and elementary constraints to ease grammatical descriptions
    • inheritance hierarchy
    • elementary constraints
      • nodes
      • the class itself (desc)
      • equality
      • precedence
      • dominance (father and ancestor)
      • node and class features
      • equations
      • anonymous nodes
      • macros and other short notations
    • resource producers/consumers
    • guards as complex constraints
    • browsing the classes
  • getting more compact grammars through tree factorization
    • disjunction
    • guards
    • interleaving or free node ordering
    • repetition (Kleene star)
  • browsing the grammar
    • in frmgwiki
    • some statistics about the trees
    • hypertags to link trees and anchors
  • playing with the resulting parser
    • trying a few sentences
    • playing with disambiguation
    • highlighting edges
    • the preprocessing steps
      • tokenizing with SxPipe
      • the Lefff lexicon and the FRMG lexer
    • installing the Alpage processing chain
  • disambiguation and beyond
    • shared forest
    • derivations vs dependencies
    • hand-crafted disambiguation rules
    • injecting some knowledge
  • the hard life: how to reconcile coverage, accuracy, and efficiency!
    • efficiency
      • a few sources of inefficiency (parser & disambiguation)
      • using TIGs
      • factorization
      • lexicalization
      • left-corner (lctag)
      • restrictions
      • guiding (by self-training)
        • tagging
        • hypertagging
        • left-corner restrictions
      • a few stats
    • coverage
      • using test suites and regression testing
      • using error mining
      • using robust partial parsing
      • using correction rules
    • accuracy
      • learning from the French TreeBank
      • combining with DyALog-SR, a transition-based statistical parser
      • domain adaptation with unsupervised learning (self-training)
      • feature engineering
  • FRMG and MWEs
    • at the level of SxPipe (named entities and some frozen expressions such as complex subordinating conjunctions, csu)
    • at the level of the parser (+ metagrammar): predicative nouns and light verbs
    • at the level of the metagrammar : idiomatic expressions
    • at disambiguation level (terms)
    • conversion issues for output schemas with different notions and lists of MWEs
  • Discussion(s):
    • developing and maintaining a large coverage meta-grammar
    • starting a meta-grammar for a new language
    • re-using meta-grammar components (hierarchy, classes)
    • exploring new target formalisms