11th International Conference on Parsing Technologies (IWPT'09)
Much recent research in natural language parsing takes as input carefully crafted, edited text, often from newspapers. However, many real-world applications involve processing text that is not written carefully by a native speaker, is produced for an eventual audience of only one, and is in essence ephemeral. In this talk I will present a number of research and commercial applications of this type that my collaborators and I are developing, in which we parse text as diverse as mobile phone text messages, non-native language learner essays, internet chat, and primary care medical notes. I will discuss the problems these types of text pose for a parser, and outline how we integrate information from parsing into applications.
Nonparametric Bayesian methods are interesting because they may provide a way of learning the appropriate units of generalization (i.e., the "rules" of a grammar) as well as each generalization's probability or weight (i.e., the rule's probability). Adaptor Grammars are a framework for stating a variety of hierarchical nonparametric Bayesian models, in which the units of generalization can be viewed as kinds of PCFG rules. This talk describes the mathematical and computational properties of Adaptor Grammars and linguistic applications such as word segmentation, syllabification, and named entity recognition. The latter part of the talk reviews MCMC inference and describes the MCMC algorithms we use to sample adaptor grammars.
Joint work with Sharon Goldwater and Tom Griffiths.
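The "adaptor" in an Adaptor Grammar caches whole generated subtrees under a Chinese Restaurant Process, so frequently regenerated subtrees can be reused as units. The toy sketch below (an assumption for illustration, not Johnson et al.'s implementation; the function names are hypothetical) shows the rich-get-richer dynamic at the heart of that caching: items generated early tend to be reused, concentrating probability mass on a few units.

```python
import random

def crp_sample(counts, alpha, base_sample):
    """Draw an item from a Chinese Restaurant Process: reuse a cached item
    with probability proportional to its count, or generate a fresh one from
    the base distribution with probability proportional to alpha."""
    total = sum(counts.values()) + alpha
    r = random.uniform(0, total)
    for item, c in counts.items():
        r -= c
        if r < 0:
            return item          # reuse a cached generalization
    return base_sample()         # fall back to the base "grammar"

random.seed(0)
counts = {}
for _ in range(100):
    # Base distribution: a uniform choice over three stand-in "subtrees".
    item = crp_sample(counts, alpha=1.0, base_sample=lambda: random.choice("abc"))
    counts[item] = counts.get(item, 0) + 1
print(counts)  # a skewed distribution: early draws get reinforced
```

In a full Adaptor Grammar each adapted nonterminal has its own such restaurant over subtrees, with the PCFG acting as the base distribution.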
There is a strong tendency in natural language syntax for elements that stand in a direct syntactic relation to also be adjacent in the surface realization of a sentence. Nevertheless, notable exceptions to this generalization exist in practically all languages and are especially common in languages with free or flexible word order. Syntactic theorists, on the one hand, have developed a variety of representational devices for dealing with these exceptions, including phonetically null elements, gap threading, and non-projective dependency trees. Syntactic parsers, on the other hand, use these devices very restrictively, since they add to the complexity of an already daunting task. This is especially true of data-driven parsers, where discontinuity is often simply ignored. In this talk, I will review techniques for dealing with discontinuous structures in the framework of dependency parsing, focusing on parsing algorithms that build structures from non-adjacent elements, and in particular on transition-based algorithms that use online reordering.
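The discontinuous structures discussed above correspond, in dependency terms, to non-projective trees: trees in which some arc does not cover a contiguous span of the sentence, i.e., two arcs cross in the surface order. As a minimal sketch (an illustration, not any particular parser's code), the following hypothetical function tests a dependency tree for projectivity:

```python
def is_projective(heads):
    """heads[i] is the index of token i's head (0 = artificial root);
    tokens are numbered 1..n and heads[0] is an unused placeholder.
    Returns True iff no two dependency arcs cross in the surface order."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # Two arcs cross if exactly one endpoint of the second arc
            # lies strictly inside the span of the first.
            if l1 < l2 < r1 < r2:
                return False
    return True

# Chain 1 <- 2 -> 3 with 2 as root: every arc covers a contiguous span.
print(is_projective([0, 2, 0, 2]))     # True
# Arcs (1,3) and (2,4) cross: a non-projective (discontinuous) tree.
print(is_projective([0, 3, 4, 0, 3]))  # False
```

Transition-based parsers with online reordering handle such trees by swapping adjacent tokens during parsing, so that crossing arcs can be built between elements that are non-adjacent in the original word order.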