An Information Theoretic Approach to Symbolic Learning in Synthetic Languages

Authors:	Andrew D. Back, Janet Wiles

Institution:	School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia

Abstract:	An important aspect of using entropy-based models and the proposed “synthetic languages” is the seemingly simple task of identifying the probabilistic symbols. If the system has discrete features, this task may be trivial; for observed analog behaviors described by continuous values, however, it raises the question of how such symbols should be determined. This task of symbolization extends the concept of scalar and vector quantization to consider explicit linguistic properties. Unlike previous quantization algorithms, where the aim is primarily data compression and fidelity, the goal here is to produce a symbolic output sequence that incorporates some linguistic properties and is therefore useful in forming language-based models. In this paper, we present methods for symbolization that take such properties into account in the form of probabilistic constraints. In particular, we propose new symbolization algorithms that constrain the symbols to have a Zipf–Mandelbrot–Li distribution, which approximates the behavior of language elements. We introduce a novel constrained EM algorithm, which is shown to learn to produce symbols that approximate a Zipfian distribution. We demonstrate the efficacy of the proposed approaches on examples using real-world data in different tasks, including the translation of animal behavior into a possible human-understandable language equivalent.

Keywords:	information theoretic models; synthetic language; entropy; Zipf–Mandelbrot–Li law; language models; behavior prediction
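
Illustration:	The abstract's idea of constraining symbol probabilities can be pictured with a much simpler stand-in than the constrained EM algorithm the paper proposes. The sketch below is not the authors' method: it is a plain quantile-based scalar quantizer that places bin edges so that symbol frequencies follow a generic Zipf–Mandelbrot law, p(r) proportional to 1/(r + beta)^alpha. The function names, the parameter values alpha = 1.0 and beta = 1.0, and the Gaussian test signal are all assumptions chosen for illustration; they are not taken from the paper, and the specific Zipf–Mandelbrot–Li parameterization is not reproduced here.

```python
import numpy as np

def zipf_mandelbrot_probs(num_symbols, alpha=1.0, beta=1.0):
    """Target probabilities p(r) ~ 1 / (r + beta)**alpha for ranks 1..num_symbols.

    alpha and beta are illustrative defaults, not values from the paper.
    """
    ranks = np.arange(1, num_symbols + 1)
    weights = 1.0 / (ranks + beta) ** alpha
    return weights / weights.sum()

def symbolize_by_quantiles(x, num_symbols, alpha=1.0, beta=1.0):
    """Map continuous samples to integer symbols 0..num_symbols-1.

    Bin edges are the empirical quantiles at the cumulative target
    probabilities, so symbol 0 (lowest values) is the most frequent.
    This ignores temporal structure and is only a stand-in for the
    constrained EM approach described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    probs = zipf_mandelbrot_probs(num_symbols, alpha, beta)
    interior_cum = np.cumsum(probs)[:-1]   # drop the final 1.0
    edges = np.quantile(x, interior_cum)   # num_symbols - 1 bin edges
    symbols = np.searchsorted(edges, x, side="right")
    return symbols, probs

# Hypothetical continuous "behavior" signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)

symbols, target = symbolize_by_quantiles(signal, num_symbols=8)
empirical = np.bincount(symbols, minlength=8) / symbols.size
```

Comparing `empirical` with `target` shows how closely the symbol frequencies match the imposed rank-frequency law. Assigning the most frequent symbol to the lowest values is an arbitrary choice of this sketch; a learned symbolizer would choose the partition from the data.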
