START Conference Manager    

The chunk as the period of the functions length and frequency of words on the syntagmatic axis

Jacques Vergne

11th International Conference on Parsing Technology (IWPT 2009)
Paris, France, 7th-9th October, 2009


Summary

Chunking is segmenting a text into chunks, sub-sentential segments, that Abney ap-proximately defined as stress groups. Chunking usually uses monolingual resources, most often exhaustive, sometimes partial : function words and punctuations, which often mark beginnings and ends of chunks. But, to extend this method to other languages, monolingual resources have to be multiplied. We present a new method : endogenous chunking, which uses no other resource than the text to be segmented itself. The idea of this method comes from Zipf : to make the least communication effort, speakers are driven to shorten frequent words. A chunk then can be characterized as the period of the periodic correlated functions length and frequency of words on the syntagmatic axis. This original method takes its advantage to be applied to a great number of languages of alphabetic script, with the same algorithm, without any resource.


START Conference Manager (V2.56.8 - Rev. 780)