Statistical dependency parsing of French

Université Paris Diderot - INRIA

Content: This page gathers various resources for the statistical dependency parsing of French, in particular preprocessing code and trained models/grammars for MaltParser, MSTParser and the Berkeley Parser, as well as a constituency-to-dependency conversion tool for French.

Many thanks to Joakim Nivre, Ryan McDonald and Slav Petrov for making their parsers available.

Contributors: Marie Candito (contact), Benoît Crabbé, Pascal Denis, Mathieu Falco, François Guérin, Enrique Henestroza Anguiano, Joakim Nivre, Djamé Seddah

Funding: Part of this work is carried out within the ANR SEQUOIA project (large-coverage probabilistic syntactic parsing of French).

Dependency Corpus

(converted from French Treebank constituency trees)

The French Treebank consists of approximately 20,000 sentences from the newspaper Le Monde, annotated for morphology and phrase structure (Abeillé A. and Barrier N., Enriching a French Treebank, LREC'2004). Part of the treebank also carries grammatical function annotations on the constituents that depend on verbs; this part has been growing over time.

We designed an automatic procedure for converting the constituency trees with functional information into surface dependency trees. It yielded a first version of 12,531 surface dependency trees (cf. TALN'2009, LREC'2010). The resulting annotation scheme is described in: French surface dependencies annotation scheme (in French).
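As a rough illustration of what a constituency-to-dependency conversion involves, here is a minimal Python sketch of generic head-rule-based conversion. It is not the procedure distributed on this page (that one is described in the cited papers): the `Node` class, the `HEAD_RULES` table and the function names are hypothetical, the head rules are toy values chosen for the example, and the sketch produces unlabeled dependencies only, ignoring the functional information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set on leaves only
    index: int = 0                               # 1-based token position (leaves only)


# Hypothetical toy head rules: for each phrase label, the child labels
# to look for, in priority order. Real head tables are much larger.
HEAD_RULES: Dict[str, List[str]] = {
    "SENT": ["VN", "NP"],
    "VN": ["V"],
    "NP": ["N"],
}


def lexical_head(node: Node) -> Node:
    """Return the leaf that heads this constituent."""
    if node.word is not None:
        return node
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return lexical_head(child)
    return lexical_head(node.children[0])        # fallback: leftmost child


def _attach(node: Node, deps: Dict[int, int]) -> None:
    """Attach the head of every non-head child to the head of its parent."""
    if node.word is not None:
        return
    head = lexical_head(node)
    for child in node.children:
        child_head = lexical_head(child)
        if child_head is not head:
            deps[child_head.index] = head.index
        _attach(child, deps)


def convert(root: Node) -> Dict[int, int]:
    """Map each token index to its governor's index (0 = artificial root)."""
    deps = {lexical_head(root).index: 0}
    _attach(root, deps)
    return deps


# Toy tree for "le chat dort": (SENT (NP (D le) (N chat)) (VN (V dort)))
tree = Node("SENT", [
    Node("NP", [Node("D", word="le", index=1), Node("N", word="chat", index=2)]),
    Node("VN", [Node("V", word="dort", index=3)]),
])
deps = convert(tree)
print(sorted(deps.items()))  # [(1, 2), (2, 3), (3, 0)]
```

In this toy example the determiner attaches to the noun, the noun to the verb, and the verb to the artificial root. A full conversion would additionally derive dependency labels from the functional annotations.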

On the occasion of the SPMRL 2013 shared task on statistical parsing of morphologically rich languages (Seddah et al., 2013), we revised the conversion procedure and applied it to the larger set of constituency trees with functional information available at that time, namely 18,535 sentences (see the README of the French part of the SPMRL 2013 dataset for more information).

The converted dependency treebank is distributed free of charge to anyone who holds a licence for the original French Treebank (see here).

Once you have the licence, you may contact marie . candito @ linguist . univ-paris-diderot .fr to obtain the dependency treebank.


Parsing resources for MaltParser (bonsai 3.2)

Parsing resources for MSTParser (bonsai 3.2)

Parsing resources for Berkeley Parser (bonsai 3.2)

Summary of parsing performance (iMac, 2.66 GHz)
(see the benchmark paper for details)

        LAS    UAS    Parsing time                Parsing time
                      (one-sentence raw file)     (1235-sentence raw file)
Malt    87.3   89.7   42s                         1m 25s
MST     88.2   90.9   1m 50s                      14m 39s
BKY     86.8   91.0   6s                          12m 46s
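For readers unfamiliar with the two accuracy measures above: UAS (unlabeled attachment score) is the percentage of tokens that receive the correct governor, and LAS (labeled attachment score) is the percentage that receive both the correct governor and the correct dependency label. The following Python sketch computes both on a toy sentence. The function name, the token representation and the labels are assumptions made for illustration, and the exact evaluation settings behind the figures above (for instance, whether punctuation tokens are scored) are those of the benchmark paper, not of this sketch.

```python
from typing import List, Tuple

# Each token is (governor index, dependency label); index 0 is the artificial root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over two aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        raise ValueError("cannot score an empty token list")
    # UAS counts tokens whose governor is correct.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose governor and label are both correct.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n


# Toy example for "le chat dort" with hypothetical labels:
# the parser finds every governor but mislabels one dependency.
gold = [(2, "det"), (3, "suj"), (0, "root")]
pred = [(2, "det"), (3, "obj"), (0, "root")]
las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.1f}, UAS = {uas:.1f}")  # LAS = 66.7, UAS = 100.0
```

Since a correct label only counts when the governor is also correct, LAS can never exceed UAS, which is consistent with every row of the table.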


