
Multiword expressions (MWEs) are a real difficulty in parsing.

The main issues are:

  • the lack of consensus on how to define and capture MWEs
  • the absence of closed lists or operational specifications of MWEs
  • a large diversity of MWE kinds: named entities, terms, locutions, idioms, ...
  • a range of situations going from frozen to semi-productive MWEs

In FRMG, these diverse situations have led to a diversity of solutions, more or less satisfactory, at all levels: in the metagrammar, in the pre-parsing phases, during parsing, in the disambiguation phase, and even during conversion to an output annotation schema.

At the level of SxPipe (named entities and some frozen expressions such as complex subordinating conjunctions, csu)

An ambiguous DAG produced by SxPipe, with an MWE reading:

[Dependency graph: "en fait, il en fait des tonnes !" (initial "en fait" read as a single adverb)]

[Dependency graph: "bien que très doué, il doit en fait partir" ("bien que" read as a single csu, "en fait" as a single adverb)]
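To make the notion of an ambiguous DAG with an MWE reading more concrete, here is a minimal sketch. The edge format, the state numbering, and the part-of-speech tags are assumptions made for illustration; this is not SxPipe's actual output format. It only shows how a single extra edge spanning two tokens encodes the MWE reading next to the compositional one.

```python
from collections import defaultdict

# Hypothetical lattice for "en fait , il en fait des tonnes".
# Each edge is (start_state, end_state, form, pos); tags are illustrative.
EDGES = [
    (0, 1, "en", "prep"),
    (1, 2, "fait", "nc"),
    (0, 2, "en fait", "adv"),  # MWE reading spanning two tokens
    (2, 3, ",", "punct"),
    (3, 4, "il", "cln"),
    (4, 5, "en", "clg"),
    (5, 6, "fait", "v"),
    (6, 7, "des", "det"),
    (7, 8, "tonnes", "nc"),
]


def paths(edges, start, end):
    """Enumerate every token sequence leading from `start` to `end`."""
    outgoing = defaultdict(list)
    for s, e, form, pos in edges:
        outgoing[s].append((e, form, pos))

    def walk(state):
        if state == end:
            yield []
            return
        for nxt, form, pos in outgoing[state]:
            for rest in walk(nxt):
                yield [(form, pos)] + rest

    return list(walk(start))


readings = paths(EDGES, 0, 8)
for reading in readings:
    print(" ".join(f"{form}/{pos}" for form, pos in reading))
```

The lattice yields two token sequences: one where "en" and "fait" are separate tokens, and one where "en fait" is a single adverb. Both are passed on, and the choice is left to later stages.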

At the level of the parser (+ metagrammar): predicative nouns and light verbs

[Dependency graph: "il fait attention à la situation." ("attention" analyzed as a predicative noun, ncpred, of "faire")]

[Dependency graph: "cela ne lui pose pas vraiment problème !" ("problème" analyzed as a predicative noun, ncpred, with lemma "poser problème")]

At the level of the metagrammar: idiomatic expressions

[Dependency graph: "qu'est-ce qu'il dit ?"]

[Dependency graph: "c'est ainsi qu'il faut faire"]

[Dependency graph: "il est venu il n'y a pas si longtemps !"]

[Dependency graph: "je dois, service oblige, vous quitter !" ("service oblige" attached as an incise)]

Also found in the FTB: "Fin de l'apartheid oblige, ...". Where is the limit between a locution like "noblesse oblige" and a productive construction "N oblige"?

Quoted constructions are also of interest for some specific named entities, but there is no clear solution when there are no quotes!

[Dependency graph: "\"Autant en emporte le vent\" est un très grand classique" (the quoted title is parsed as a sentence and used as the subject)]

At the disambiguation level

Terms and disambiguation rules (favoring the longest expressions)
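The preference for the longest expressions can be sketched as a search for the path with the fewest edges in a token DAG. This is only a crude stand-in written for illustration: FRMG's actual disambiguation rules are not shown here, and the edge format `(start, end, form)` and the function name `fewest_edges_path` are assumptions.

```python
def fewest_edges_path(edges, start, end):
    """Return the forms on the path using the fewest edges.

    Fewer edges means longer expressions. Assumes every edge goes from
    a lower state number to a higher one, so sorting by start state
    visits the DAG in topological order.
    """
    best = {start: (0, [])}  # state -> (edge count, forms so far)
    for s, e, form in sorted(edges):
        if s not in best:
            continue  # state not reachable from `start`
        count, forms = best[s]
        candidate = (count + 1, forms + [form])
        if e not in best or candidate[0] < best[e][0]:
            best[e] = candidate
    return best[end][1]


# Hypothetical lattice for "beaucoup de personnes".
edges = [
    (0, 1, "beaucoup"),
    (1, 2, "de"),
    (0, 2, "beaucoup de"),  # MWE edge
    (2, 3, "personnes"),
]
chosen = fewest_edges_path(edges, 0, 3)
print(chosen)
```

On this lattice the MWE edge wins, because it covers two tokens with a single edge. A real rule set would more likely combine such a preference with other weighted criteria rather than apply it as a hard constraint.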

Conversion issues for output schemas

with different notions and lists of MWEs

FRMG provides outputs following several syntactic annotation schemas, such as PASSAGE, FTB/CONLL, or the more recent Universal Dependencies (UD) schema for French. Unfortunately, all these schemas differ in their notion, list, and representation of MWEs. The conversion process should therefore handle these cases as far as possible.
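As an illustration of why conversion has to be MWE-aware, the sketch below renders the same MWE under two hypothetical representation policies: one keeps it as a single token joined by underscores, the other splits it into components attached to the first one with a `fixed` relation (the convention UD uses for fixed expressions). The policy names, the row format, and the function `convert_mwe` are assumptions for illustration and do not describe FRMG's actual converters.

```python
def convert_mwe(form, policy):
    """Render one MWE under a hypothetical schema policy.

    Returns a list of rows; `head` and `rel` are None for the token
    whose attachment is decided by the rest of the conversion.
    """
    parts = form.split()
    if policy == "single_token":
        # The whole MWE stays one token, components joined by "_".
        return [{"id": 1, "form": "_".join(parts), "head": None, "rel": None}]
    if policy == "fixed_chain":
        # One token per component; all later components depend on the first.
        rows = [{"id": 1, "form": parts[0], "head": None, "rel": None}]
        for i, part in enumerate(parts[1:], start=2):
            rows.append({"id": i, "form": part, "head": 1, "rel": "fixed"})
        return rows
    raise ValueError(f"unknown policy: {policy}")


merged = convert_mwe("bien que", "single_token")
split = convert_mwe("il y a", "fixed_chain")
for row in merged + split:
    print(row)
```

The harder part, not covered by this sketch, is deciding for each schema which expressions count as MWEs at all, since the lists differ from one schema to another.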

Some borderline and complex cases

Some well-identified MWEs tend to get a lexical entry in Lefff, but may be the trace of a more productive construction. As a result, we get several distinct parses that actually correspond to the same syntactic phenomenon!

For instance, "beaucoup de" and "peu de" are listed as determiners in Lefff, but may also be seen as the combination of a predeterminer (predet) with the preposition "de".

[Dependency graph: "beaucoup de personnes sont venues" ("beaucoup de" read as a single determiner)]

[Dependency graph: "énormément de personnes sont venues" ("énormément" read as a predet followed by the preposition "de")]

This notion of predet is also productive for other constructions:

[Dependency graph: "peu d'entre nous sont venus" ("peu" read as a predet)]

[Dependency graph: "suffisamment d'entre nous sont venus" ("suffisamment" read as a predet)]

Another similar case is "il y a", which is so common that it has an entry in Lefff as a preposition.

[Dependency graph: "il est venu il y a longtemps" ("il y a" analyzed with an impersonal subject, the clitic "y", and the verb "avoir")]

[Dependency graph: "il était venu il n'y avait pas si longtemps"]

The productivity of some constructions can be a problem, in particular when they behave as an unusual part of speech. Among the MWEs that are not yet handled by SxPipe and FRMG, we have expressions like "je ne sais", as in "il est venu je ne sais comment" or "il a pris je ne sais quel livre", with many variations ("on ne sait", "nous ne savons", ...). We also have the expression "n'importe qu" as in "il fait n'importe quoi", "il travaille n'importe comment", "n'importe quel élève te le dira".

[Dependency graph: "n'importe quel élève te le dira !" ("n'importe quel" read as a single determiner)]
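To give an idea of how such semi-productive expressions might be spotted before parsing, here is a rough sketch based on a regular expression. It is not part of SxPipe or FRMG. The list of subject and verb variants comes from the examples above, while the list of wh-words and the names `PATTERN` and `find_candidates` are assumptions that would need to be extended and checked against corpus data.

```python
import re

# Variants quoted in the text; the real set is open-ended ("...").
SUBJECT_VERB = r"(?:je ne sais|on ne sait|nous ne savons)"
# Assumed list of wh-words that may follow; illustrative only.
WH = r"(?:comment|quoi|qui|où|quand|pourquoi|quel(?:le)?s?|lequel)"

PATTERN = re.compile(
    rf"\b(?:{SUBJECT_VERB}|n'importe)\s+{WH}\b",
    re.IGNORECASE,
)


def find_candidates(sentence):
    """Return the substrings that look like one of the expressions."""
    return [m.group(0) for m in PATTERN.finditer(sentence)]


for s in [
    "il est venu je ne sais comment",
    "il a pris je ne sais quel livre",
    "il fait n'importe quoi",
    "n'importe quel élève te le dira",
]:
    print(s, "->", find_candidates(s))
```

A surface pattern like this over-generates ("je ne sais comment faire" is an ordinary embedded question), which is one reason why these expressions are difficult to treat as plain lexical entries.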