Candito M.-H., Nivre J., Denis P. and Henestroza Anguiano E., 2010,
Benchmarking of Statistical Dependency Parsers for French, in Proceedings of COLING 2010 (poster session), Beijing, China.
Parsing with MaltParser
- MaltParser version 1.3.1, developed by Johan Hall, Jens Nilsson and Joakim Nivre at Växjö University and Uppsala University, Sweden. Note: this setup will not work with later MaltParser versions.
- The MElt tagger (available for download), developed by Pascal Denis and Benoît Sagot (Denis P. and Sagot B., 2009, Coupling an annotated corpus and a morphosyntactic lexicon for state-of-the-art POS tagging with less human effort, in Proceedings of PACLIC 2009, Hong Kong, China).
- Download and unzip the BONSAI v3.2 archive to get the preprocessing code and the Malt model and settings. This is the best Malt model according to the benchmark: it uses predicted POS tags, predicted lemmas, predicted morphological features, and unsupervised word clusters. Note that the preprocessing code requires:
  - Perl and Python > 2.5
  - python-cjson, installed with: python setup.py install
- Set the MALT_DIR environment variable to the local path of your MaltParser 1.3.1 installation.
- Set the BONSAI environment variable to the local path of your BONSAI v3.2 directory.
The following command preprocesses and parses a raw UTF-8 text file INFILE and writes the result to INFILE.outmalt:
$BONSAI/bin/bonsai_malt_parse_rawtext.sh [-n] INFILE
Use the -n option if your text is already tokenized.
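For scripted use, the invocation above could be wrapped roughly as in the following Python sketch. The helper names build_command and parse_file are hypothetical, and the sketch assumes that BONSAI and MALT_DIR are exported as environment variables. Only the script path, the -n option and the .outmalt suffix are taken from the instructions above.

```python
import os
import subprocess


def build_command(bonsai_dir, infile, tokenized=False):
    """Build the argument list for bonsai_malt_parse_rawtext.sh (hypothetical helper)."""
    script = os.path.join(bonsai_dir, "bin", "bonsai_malt_parse_rawtext.sh")
    cmd = [script]
    if tokenized:
        cmd.append("-n")  # the input text is already tokenized
    cmd.append(infile)
    return cmd


def parse_file(infile, tokenized=False):
    """Run the parser on infile and return the path of the expected output file."""
    # Assumption: both variables are set in the calling environment.
    for name in ("BONSAI", "MALT_DIR"):
        if name not in os.environ:
            raise RuntimeError(name + " is not set")
    cmd = build_command(os.environ["BONSAI"], infile, tokenized)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return infile + ".outmalt"
```

Calling parse_file("text.txt") would then run the script on text.txt and return "text.txt.outmalt".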
Note: the output format is almost CoNLL. It has the 10 usual CoNLL columns, plus an extra column for word cluster ids inserted between the 6th and 7th usual columns, so each token line has 11 columns.
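Tools that expect strict 10-column CoNLL input will need the cluster column removed first. The following Python sketch shows one way to do this. It assumes that columns are tab-separated, that sentences are separated by blank lines as in standard CoNLL files, and that the output is UTF-8; none of these details is stated above. The function names are hypothetical.

```python
def strip_cluster_column(line):
    """Turn one 11-column output line into a 10-column CoNLL line."""
    line = line.rstrip("\n")
    if not line.strip():
        return ""  # blank line separating two sentences
    cols = line.split("\t")
    if len(cols) != 11:
        raise ValueError("expected 11 columns, got %d" % len(cols))
    # Index 6 (0-based) holds the word cluster id, which sits between
    # the 6th and 7th usual CoNLL columns.
    return "\t".join(cols[:6] + cols[7:])


def to_conll(in_path, out_path):
    """Copy an .outmalt file to out_path without the cluster column."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_cluster_column(line) + "\n")
```

For example, to_conll("text.txt.outmalt", "text.txt.conll") would write a plain CoNLL copy of the parse. The token values used to try out the function are up to the reader; no real parser output is reproduced here.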
Note: newlines in the input text are always interpreted as sentence boundaries.
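Because of this, hard-wrapped text (where a sentence runs over several lines) should be unwrapped before parsing. The following Python sketch joins the lines of each paragraph into a single line. It assumes that paragraphs are separated by blank lines and that a paragraph break is an acceptable sentence boundary. It does not segment sentences itself and does not rejoin words hyphenated across line ends. The function name unwrap_paragraphs is hypothetical.

```python
def unwrap_paragraphs(text):
    """Join hard-wrapped lines so that each paragraph ends up on one line."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the current paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs) + ("\n" if paragraphs else "")
```

The result can be written to a UTF-8 file and passed to the parsing script as INFILE.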