Statistical dependency parsing of French
Université Paris Diderot - INRIA
Content: This page gathers various resources for the statistical dependency parsing of French, in particular preprocessing code and trained models/grammars for MaltParser, MSTParser and Berkeley Parser, as well as a constituent-to-dependency conversion tool for French. Many thanks to Joakim Nivre, Ryan McDonald and Slav Petrov for making their parsers available.
Contributors: Marie Candito (contact), Benoît Crabbé, Pascal Denis, Mathieu Falco, François Guérin, Enrique Henestroza Anguiano, Joakim Nivre, Djamé Seddah
Funding: Part of this work is performed within the ANR SEQUOIA project (large-coverage probabilistic syntactic parsing of French).
Dependency Corpus (converted from French Treebank constituency trees)
The French Treebank consists of approximately 20000 sentences from the newspaper Le Monde, annotated for morphology and phrase structure (Abeillé A. and Barrier N., Enriching a French Treebank, LREC'2004). Part of the treebank also contains grammatical functions for the constituents that depend on verbs, and this part has been growing over time.
We designed an automatic procedure for converting the constituency trees with functional information into surface dependency trees. It resulted in a first version of 12531 surface dependency trees (cf. TALN'2009, LREC'2010). The resulting annotation scheme is described in: French surface dependencies annotation scheme (in French).
On the occasion of the SPMRL 2013 shared task on statistical parsing of morphologically rich languages (Seddah et al., 2013), we revised the conversion procedure and applied it to the larger set of constituency trees with functional information available at that time, namely 18535 sentences (see the README of the French part of the SPMRL 2013 dataset for more information).
We distribute the converted dependency treebank freely, provided one has the licence for the original French Treebank (see here). Once you have the licence, you may contact marie . candito @ linguist . univ-paris-diderot .fr for the dependency treebank.
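As an illustration of what a constituency-to-dependency conversion involves, here is a minimal Python sketch of the generic head-percolation technique. It is not the conversion tool distributed on this page: the `Node` class, the `HEAD_RULES` table, the fallback to the leftmost child, the default `dep` label and the toy sentence are all assumptions made for the example. The idea is that each phrase picks one child as its head, the lexical head of every other child becomes a dependent of the phrase's lexical head, and the arc is labelled with the child's grammatical function when one is annotated.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    label: str                       # phrase label or part-of-speech tag
    word: Optional[str] = None       # set for leaves (tokens) only
    index: int = 0                   # 1-based token position, leaves only
    function: Optional[str] = None   # grammatical function, if annotated
    children: List["Node"] = field(default_factory=list)

# Hypothetical toy head rules: for each phrase label, the child labels
# that may serve as head, in priority order. Real rule sets are larger.
HEAD_RULES = {
    "SENT": ["VN"],
    "VN": ["V"],
    "NP": ["N"],
}

Arc = Tuple[int, int, str]  # (dependent index, head index, label)

def find_head_child(node: Node) -> Node:
    """Pick the head child using HEAD_RULES, else fall back to the leftmost child."""
    for candidate in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == candidate:
                return child
    return node.children[0]

def to_dependencies(node: Node, arcs: List[Arc]) -> int:
    """Return the token index of the lexical head of `node`, collecting arcs on the way."""
    if node.word is not None:
        return node.index
    head_child = find_head_child(node)
    head_index = to_dependencies(head_child, arcs)
    for child in node.children:
        if child is head_child:
            continue
        dependent_index = to_dependencies(child, arcs)
        # Use the annotated function as arc label, or a generic "dep" label.
        arcs.append((dependent_index, head_index, child.function or "dep"))
    return head_index

def convert(root: Node) -> List[Arc]:
    """Convert a constituency tree into a sorted list of dependency arcs (head 0 = root)."""
    arcs: List[Arc] = []
    root_index = to_dependencies(root, arcs)
    arcs.append((root_index, 0, "root"))
    return sorted(arcs)

# Toy tree for "Le chat dort", with a subject function on the NP.
example = Node("SENT", children=[
    Node("NP", function="SUJ", children=[
        Node("D", word="Le", index=1),
        Node("N", word="chat", index=2),
    ]),
    Node("VN", children=[Node("V", word="dort", index=3)]),
])

for dependent, head, label in convert(example):
    print(dependent, head, label)
```

In this toy setting only the NP carries a function, so the determiner receives the generic `dep` label; a fuller conversion would need further rules to label such arcs.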
Parsing
Publications
Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010, Benchmarking of Statistical Dependency Parsers for French, Proceedings of COLING'2010 (poster session), Beijing, China
Candito M.-H., Crabbé B. and Denis P., 2010, Statistical French dependency parsing: treebank conversion and first results, Proceedings of LREC'2010, La Valletta, Malta
Seddah D., Candito M.-H. and Crabbé B., 2009, Cross-parser evaluation and tagset variation: a French treebank study, Proceedings of IWPT 2009, Paris, France
Candito M.-H. and Crabbé B., 2009, Improving generative statistical parsing with semi-supervised word clustering, Proceedings of IWPT 2009 (short paper), Paris, France
Candito M.-H., Crabbé B., Denis P. and Guérin F., 2009, Analyse syntaxique du français : des constituants aux dépendances, Proceedings of TALN 2009, Senlis, France
Crabbé B. and Candito M.-H., 2008, Expériences d'analyse syntaxique du français, Proceedings of TALN 2008, Avignon, France