Nonlinear mapping of massive data sets by fuzzy clustering and neural networks

Authors:	Dmitrii N. Rassokhin, Victor S. Lobanov, Dimitris K. Agrafiotis

Abstract:	Producing good low-dimensional representations of high-dimensional data is a common and important task in many data mining applications. Two methods that have been particularly useful in this regard are multidimensional scaling and nonlinear mapping. These methods attempt to display a set of objects, described by a dissimilarity or distance matrix, on a low-dimensional plane in a way that preserves the proximities of the objects as far as possible. Unfortunately, most known algorithms scale quadratically with the number of objects, and their use has been limited to relatively small data sets. We recently demonstrated that nonlinear maps derived from a small random sample of a large data set exhibit the same structure and characteristics as those of the entire collection, and that this structure can be easily extracted by a neural network, making it possible to scale to data sets orders of magnitude larger than those accessible with conventional methodologies. Here, we present a variant of this algorithm based on local learning. The method employs fuzzy clustering to partition the data space into a set of Voronoi polyhedra and uses a separate neural network to perform the nonlinear mapping within each cell. We find that this local approach offers a number of advantages and produces maps that are virtually indistinguishable from those derived with conventional algorithms. These advantages are discussed using examples from the fields of combinatorial chemistry and optical character recognition. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 373–386, 2001
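
A minimal pure-Python sketch of the sample-then-learn pipeline described in the abstract, offered as an illustration under stated assumptions rather than as the authors' implementation. A random sample is mapped by plain gradient descent on Sammon's stress; fuzzy c-means (seeded with a deterministic farthest-point heuristic, our choice) supplies the centers whose nearest-center regions form the Voronoi cells; and each cell gets its own local model. To keep the code dependency-free, the per-cell neural networks are replaced by a hypothetical inverse-distance-weighted regressor. All function and class names (`sammon_map`, `fuzzy_cmeans`, `LocalMapper`, `map_large`) are illustrative.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(X, Y):
    """Sammon's stress of map Y with respect to input points X."""
    num = c = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            ds = dist(X[i], X[j])
            if ds > 0.0:  # duplicate points carry no information
                c += ds
                num += (ds - dist(Y[i], Y[j])) ** 2 / ds
    return num / c if c else 0.0


def sammon_map(X, dim=2, iters=200, lr=0.1):
    """Plain gradient descent on Sammon's stress, O(n^2) work per iteration.

    Starts from the first `dim` input coordinates; the exact gradient is
    scaled by a positive constant, which changes only the step size."""
    n = len(X)
    D = [[dist(a, b) for b in X] for a in X]
    mean_d = sum(map(sum, D)) / (n * (n - 1))
    Y = [list(p[:dim]) for p in X]
    for _ in range(iters):
        new_Y = []
        for i in range(n):
            g = [0.0] * dim
            for j in range(n):
                if j != i and D[i][j] > 0.0:
                    d = max(dist(Y[i], Y[j]), 1e-12)
                    coef = (d - D[i][j]) / (D[i][j] * d)  # >0 pulls i toward j
                    g = [gt + coef * (yi - yj) for gt, yi, yj in zip(g, Y[i], Y[j])]
            new_Y.append([y - lr * mean_d * gt / n for y, gt in zip(Y[i], g)])
        Y = new_Y
    return Y


def fuzzy_cmeans(X, k, m=2.0, iters=100):
    """Fuzzy c-means: k centers plus the last n x k membership matrix U."""
    centers = [list(X[0])]  # deterministic farthest-point seeding (our choice)
    while len(centers) < k:
        centers.append(list(max(X, key=lambda p: min(dist(p, c) for c in centers))))
    for _ in range(iters):
        U = []
        for p in X:
            d = [max(dist(p, c), 1e-12) for c in centers]
            # u_j = 1 / sum_l (d_j / d_l) ** (2 / (m - 1)); each row sums to 1
            U.append([1.0 / sum((dj / dl) ** (2.0 / (m - 1)) for dl in d) for dj in d])
        for j in range(k):
            w = [u[j] ** m for u in U]
            centers[j] = [sum(wi * p[t] for wi, p in zip(w, X)) / sum(w)
                          for t in range(len(X[0]))]
    return centers, U


def nearest(p, centers):
    """Index of the Voronoi cell (nearest center) that contains p."""
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))


class LocalMapper:
    """Hypothetical stand-in for the per-cell neural networks: inverse-distance
    weighting over the training points that fall in the same Voronoi cell."""

    def __init__(self, X, Y, centers):
        self.centers, self.cells = centers, {}
        for x, y in zip(X, Y):
            self.cells.setdefault(nearest(x, centers), []).append((x, y))

    def __call__(self, p, power=2.0):
        # An empty cell falls back to all training points.
        pts = self.cells.get(nearest(p, self.centers)) or [
            pt for cell in self.cells.values() for pt in cell]
        ws, acc = 0.0, [0.0] * len(pts[0][1])
        for x, y in pts:
            d = dist(p, x)
            if d == 0.0:
                return list(y)  # a training point keeps its own map coordinates
            w = d ** -power
            ws, acc = ws + w, [a + w * v for a, v in zip(acc, y)]
        return [a / ws for a in acc]


def map_large(X, sample_size, k, seed=0):
    """Map a random sample with Sammon's method, then every point via its cell model."""
    sample = random.Random(seed).sample(X, sample_size)
    centers, _ = fuzzy_cmeans(sample, k)
    mapper = LocalMapper(sample, sammon_map(sample), centers)
    return [mapper(p) for p in X]
```

In this sketch the quadratic cost is paid only on the sample passed to `sammon_map`; every other point is placed by the model of the cell it falls in. Replacing `LocalMapper` with one small neural network per cell would bring the sketch closer to the method the abstract describes.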

Keywords:	data mining; data analysis; pattern recognition; dimensionality reduction; feature extraction; nonlinear mapping; Sammon mapping; multidimensional scaling; neural network; fuzzy clustering; combinatorial chemistry; molecular similarity; molecular diversity