ALPaGE

Parsing French using the MaltParser


Université Paris Diderot - INRIA

Publications

Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010,
Benchmarking of Statistical Dependency Parsers for French, in Proceedings of COLING 2010 (poster session), Beijing, China.

Parsing with MaltParser

Prerequisites

  • MaltParser version 1.3.1, developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden. Note that this setup does not work with later MaltParser versions.

  • The MElt tagger, developed by Pascal Denis and Benoît Sagot (Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort, in Proceedings of PACLIC 2009, Hong Kong, China).

  • Download and unzip the BONSAI v3.2 archive, which provides the preprocessing code and the Malt model and settings. This is the best Malt model according to the benchmark: it uses predicted POS tags, predicted lemmas, predicted morphological features, and unsupervised word clusters. Note that the preprocessing code requires:
    • Perl and Python > 2.5
    • python-cjson, installed with: python setup.py install

  • Set the MALT_DIR variable to your local path to Malt 1.3.1

  • Set the BONSAI variable to your local path to BONSAI v3.2
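
The two variables above can be set as in the following minimal sketch. It assumes a POSIX-compatible shell and that the scripts read the variables from the environment; the directory names are hypothetical placeholders, to be replaced with your own install locations.

```shell
# Hypothetical install locations: replace with your own paths.
export MALT_DIR="$HOME/tools/malt-1.3.1"
export BONSAI="$HOME/tools/bonsai_v3.2"

# Warn early if one of the directories is missing.
for dir in "$MALT_DIR" "$BONSAI"; do
    [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
done
```

Adding the two export lines to your shell startup file keeps them available across sessions.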

Parsing command

The following command preprocesses and parses a raw UTF-8 text file INFILE, and writes the result to INFILE.outmalt:
$BONSAI/bin/bonsai_malt_parse_rawtext.sh [-n] INFILE

Use the -n option if your text is already tokenized.
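
To parse several files in one go, the command can be wrapped in a small shell function. This is only a sketch: the function name bonsai_parse_all and the TOKENIZED switch are illustrative and not part of BONSAI, and the sketch assumes the script returns a non-zero exit status on failure.

```shell
# Parse each file given as argument; stop at the first failure.
# Set TOKENIZED=1 to pass -n (input already tokenized).
# bonsai_parse_all and TOKENIZED are hypothetical names, not part of BONSAI.
bonsai_parse_all() {
    if [ -z "${BONSAI:-}" ]; then
        echo "error: BONSAI is not set" >&2
        return 1
    fi
    for infile in "$@"; do
        if [ "${TOKENIZED:-0}" = 1 ]; then
            "$BONSAI/bin/bonsai_malt_parse_rawtext.sh" -n "$infile" || return 1
        else
            "$BONSAI/bin/bonsai_malt_parse_rawtext.sh" "$infile" || return 1
        fi
        echo "parsed: $infile -> $infile.outmalt"
    done
}
```

For example, `bonsai_parse_all chapter1.txt chapter2.txt` would run the parser on both files in turn (the file names are placeholders).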

Note: the output format is (almost) CoNLL: the 10 usual CoNLL columns, plus an extra column for word cluster ids, inserted between the 6th and 7th usual CoNLL columns.
Note: newlines in the input text are always interpreted as sentence boundaries.
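
Tools that expect the standard 10-column CoNLL format will not accept the extra cluster column. The sketch below drops it with awk. It assumes that columns are tab-separated and that sentences are separated by blank lines, as is usual for CoNLL files; the function name to_conll10 is illustrative.

```shell
# Convert 11-column .outmalt lines to the usual 10 CoNLL columns.
# Assumed layout (tab-separated):
#   1 ID  2 FORM  3 LEMMA  4 CPOSTAG  5 POSTAG  6 FEATS
#   7 word cluster id (extra)  8 HEAD  9 DEPREL  10 PHEAD  11 PDEPREL
# to_conll10 is a hypothetical helper name, not part of BONSAI.
to_conll10() {
    awk -F'\t' '
        BEGIN    { OFS = "\t" }
        NF == 11 { print $1, $2, $3, $4, $5, $6, $8, $9, $10, $11; next }
                 { print }   # blank lines (sentence separators) pass through
    ' "$@"
}
```

Usage: `to_conll10 INFILE.outmalt > INFILE.conll`. Lines that do not have exactly 11 fields are copied unchanged, so sentence separators are preserved.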