Parsing French using the Berkeley parser
Université Paris Diderot - INRIA
Parsing with Berkeley Parser
The following code uses the Berkeley Parser v1.0, slightly adapted to French: unknown words are handled through their suffixes, using Abhishek Arun's heuristics. Many thanks to Slav Petrov for making his code available.
Prerequisites
Parsing command
The following command preprocesses and parses a raw UTF-8 text file INFILE and prints the output to STDOUT
Use the -h option for online help
The parse corresponds to the best Berkeley configuration described in the benchmark (COLING 2010 poster): it segments and tokenizes the text, replaces tokens with their word clusters, parses, and reinserts the original tokens.
Coming soon: an improved functional role labeler
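The cluster substitution and token reinsertion steps described above can be illustrated with a small Python sketch. This is not the distributed code: the cluster table, the C_UNK fallback, the lowercasing of tokens before lookup, and the toy_parse stand-in for the parser are all assumptions made for illustration, and the input is assumed to be tokenized already.

```python
import re
from typing import Dict, List

# Hypothetical word-to-cluster table (illustrative ids, not real data).
CLUSTERS: Dict[str, str] = {"le": "C0110", "chat": "C1011", "dort": "C0001"}
UNKNOWN = "C_UNK"  # assumed fallback id for words missing from the table

# Matches a preterminal such as "(X C0110)": a label, one space, a leaf.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def to_clusters(tokens: List[str], table: Dict[str, str]) -> List[str]:
    """Map each token to its cluster id (lookup on the lowercased form)."""
    return [table.get(tok.lower(), UNKNOWN) for tok in tokens]


def toy_parse(leaves: List[str]) -> str:
    """Stand-in for the real parser: returns a flat bracketed tree."""
    return "(SENT " + " ".join(f"(X {leaf})" for leaf in leaves) + ")"


def reinsert_tokens(tree: str, tokens: List[str]) -> str:
    """Replace the leaves of a bracketed tree with the original tokens, left to right."""
    if len(LEAF.findall(tree)) != len(tokens):
        raise ValueError("leaf count does not match token count")
    remaining = iter(tokens)
    return LEAF.sub(lambda m: f"({m.group(1)} {next(remaining)})", tree)


if __name__ == "__main__":
    tokens = ["Le", "chat", "dort"]
    tree = toy_parse(to_clusters(tokens, CLUSTERS))
    print(reinsert_tokens(tree, tokens))  # (SENT (X Le) (X chat) (X dort))
```

Because the parser only ever sees cluster ids, the original tokens have to be kept aside and aligned with the leaves by position. The sketch assumes that tokens contain no whitespace or parentheses; a real pipeline would need to escape such tokens before parsing.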
Publications
Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010, Benchmarking of Statistical Dependency Parsers for French, Proceedings of COLING'2010 (poster session), Beijing, China
Candito M.-H., Crabbé B. and Denis P., 2010, Statistical French dependency parsing: treebank conversion and first results, Proceedings of LREC'2010, La Valletta, Malta
Seddah D., Candito M.-H. and Crabbé B., 2009, Cross-parser evaluation and tagset variation: a French treebank study, Proceedings of IWPT 2009, Paris, France
Candito M.-H. and Crabbé B., 2009, Improving generative statistical parsing with semi-supervised word clustering, Proceedings of IWPT 2009 (short paper), Paris, France
Candito M.-H., Crabbé B., Denis P. and Guérin F., 2009, Analyse syntaxique du français : des constituants aux dépendances, Proceedings of TALN 2009, Senlis, France
Crabbé B. and Candito M.-H., 2008, Expériences d'analyse syntaxique du français, Proceedings of TALN 2008, Avignon, France