ALPaGE

Statistical dependency parsing of French

Université Paris Diderot - INRIA

Content: This page gathers resources for the statistical dependency parsing of French, in particular preprocessing code, learnt models/grammars for MaltParser, MSTParser and the Berkeley Parser, and a constituent-to-dependency conversion tool for French.

Many thanks to Joakim Nivre, Ryan McDonald and Slav Petrov for making their parsers available.

Contributors: Marie Candito (contact), Benoît Crabbé, Pascal Denis, Mathieu Falco, François Guérin, Enrique Henestroza Anguiano, Joakim Nivre, Djamé Seddah

Funding: Part of this work is carried out within the ANR SEQUOIA project (large-coverage probabilistic syntactic parsing of French).

Dependency Corpus

The French Treebank consists of 12,500 sentences from the newspaper Le Monde, annotated for morphology, phrase structure and grammatical functions (Abeillé A. and Barrier N., Enriching a French Treebank, LREC 2004).

It has been automatically converted into surface dependency trees (see the TALN 2009 and LREC 2010 papers below). The resulting annotation scheme is described in: French surface dependencies annotation scheme (in French).

We distribute the converted dependency treebank free of charge to anyone who holds a licence for the original French Treebank (see here).

Once you have the licence, contact marie . candito @ linguist . univ-paris-diderot .fr to obtain the dependency treebank.

Parsing

Parsing resources for MaltParser (bonsai 3.2)

Parsing resources for MSTParser (bonsai 3.2)

Parsing resources for Berkeley Parser (bonsai 3.2)

Summary of parsing performance (iMac, 2.66 GHz; see the benchmark paper for details)
(Malt = MaltParser, MST = MSTParser, BKY = Berkeley Parser)

Parser   LAS (%)   UAS (%)   Parsing time,            Parsing time,
                             1-sentence raw file      1235-sentence raw file
Malt     87.3      89.7      42s                      1m 25s
MST      88.2      90.9      1m 50s                   14m 39s
BKY      86.8      91.0      6s                       12m 46s
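LAS (labeled attachment score) and UAS (unlabeled attachment score) in the table above are the standard dependency-parsing metrics: UAS is the percentage of tokens that receive the correct head, and LAS the percentage that receive both the correct head and the correct dependency label. The following is a minimal Python sketch, assuming each token is given as a (head index, label) pair with 0 denoting the root. The function name attachment_scores and the toy labels are illustrative only and are not part of the distributed tools. This sketch scores every token, whereas published evaluations may exclude punctuation, so check the benchmark paper for the exact protocol.

```python
from typing import Sequence, Tuple

# One token = (index of its head, dependency label); head 0 = root (assumed convention).
Token = Tuple[int, str]


def attachment_scores(gold: Sequence[Token], pred: Sequence[Token]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages, counting every token (no punctuation filter)."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_ok = 0   # tokens with the correct head (UAS numerator)
    both_ok = 0   # tokens with the correct head AND label (LAS numerator)
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        if g_head == p_head:
            head_ok += 1
            if g_label == p_label:
                both_ok += 1
    n = len(gold)
    return 100.0 * both_ok / n, 100.0 * head_ok / n


# Toy four-token sentence with illustrative labels.
gold = [(2, "det"), (3, "suj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]
print(attachment_scores(gold, pred))  # (50.0, 75.0)
```

Here three of the four heads are correct (UAS 75.0), but one of those three carries the wrong label, so only two tokens count towards LAS (50.0). This is why UAS is always at least as high as LAS, as in every row of the table.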

Publications

Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010,
Benchmarking of Statistical Dependency Parsers for French, Proceedings of COLING 2010 (poster session), Beijing, China
Candito M.-H., Crabbé B. and Denis P., 2010,
Statistical French dependency parsing: treebank conversion and first results, Proceedings of LREC 2010, Valletta, Malta
Seddah D., Candito M.-H. and Crabbé B., 2009,
Cross-parser evaluation and tagset variation: a French treebank study, Proceedings of IWPT 2009, Paris, France
Candito M.-H. and Crabbé B., 2009,
Improving generative statistical parsing with semi-supervised word clustering, Proceedings of IWPT 2009 (short paper), Paris, France
Candito M.-H., Crabbé B., Denis P. and Guérin F., 2009,
Analyse syntaxique du français : des constituants aux dépendances [Syntactic parsing of French: from constituents to dependencies], Proceedings of TALN 2009, Senlis, France
Crabbé B. and Candito M.-H., 2008,
Expériences d'analyse syntaxique du français [Experiments in syntactic parsing of French], Proceedings of TALN 2008, Avignon, France