START Conference Manager    

Smoothing fine-grained PCFG lexicons

Tejaswini Deoskar, Mats Rooth and Khalil Sima'an

11th International Conference on Parsing Technology (IWPT 2009)
Paris, France, 7th-9th October, 2009


Summary

We present an approach for smoothing treebank-PCFG lexicons with lexical information obtained from a large unannotated corpus. Treebank lexical parameters are interpolated with estimates obtained from the unannotated data via the inside-outside algorithm. Because the PCFG has complex lexical categories, relative-frequency estimates from a treebank alone are very sparse. Smoothing these complex lexical categories improves parsing performance, with a particular advantage in identifying obligatory arguments subcategorized by verbs unseen in the treebank.
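
The interpolation idea can be illustrated with a minimal sketch. It is not the authors' implementation. The summary does not state the form of the interpolation, so the sketch assumes simple linear interpolation with a fixed weight `lam`. It also assumes that the expected counts from the inside-outside algorithm have already been computed and are passed in as a dictionary. The category name `VB.np`, the words, and all counts are hypothetical.

```python
from collections import defaultdict


def relative_frequencies(counts):
    """Normalize {(category, word): count} into estimates of P(word | category)."""
    totals = defaultdict(float)
    for (category, _word), count in counts.items():
        totals[category] += count
    return {
        (category, word): count / totals[category]
        for (category, word), count in counts.items()
        if totals[category] > 0
    }


def interpolate_lexicon(treebank_counts, expected_counts, lam=0.5):
    """Linearly interpolate two lexical distributions (an assumed scheme).

    treebank_counts: observed counts from the treebank.
    expected_counts: expected counts from unannotated data, assumed to have
        been produced elsewhere by the inside-outside algorithm.
    lam: weight on the treebank estimate, between 0 and 1 (hypothetical value).
    """
    p_tb = relative_frequencies(treebank_counts)
    p_un = relative_frequencies(expected_counts)
    tb_categories = {category for category, _ in p_tb}
    un_categories = {category for category, _ in p_un}

    smoothed = {}
    for key in set(p_tb) | set(p_un):
        category = key[0]
        if category not in un_categories:
            # Category only seen in the treebank: keep its estimate unchanged.
            smoothed[key] = p_tb[key]
        elif category not in tb_categories:
            # Category only seen in the unannotated data.
            smoothed[key] = p_un[key]
        else:
            smoothed[key] = lam * p_tb.get(key, 0.0) + (1 - lam) * p_un.get(key, 0.0)
    return smoothed


# Hypothetical counts: "relish" never occurs with VB.np in the treebank.
treebank = {("VB.np", "devour"): 3, ("VB.np", "eat"): 1}
expected = {("VB.np", "devour"): 2.0, ("VB.np", "eat"): 1.0, ("VB.np", "relish"): 1.0}
smoothed = interpolate_lexicon(treebank, expected, lam=0.5)
```

In this toy example the verb unseen in the treebank receives non-zero probability for the category, while the distribution for the category still sums to one.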

