Similar literature
20 similar documents found (search time: 817 ms)
1.
The author addresses two previously unresolved issues in maximum likelihood estimation (MLE) for multidimensional scaling (MDS). First, a theoretically consistent error model for nonmetric MLMDS is proposed. In particular, theoretical arguments are given that the disturbance should be multiplicative with distance when a stochastic choice model is used on rank-ordered similarity data. This assumption implies that the systematic component of similarity in the rank order is a logarithmic function of distances between stimuli. Second, a problem with the identification condition of the maximum likelihood estimators is raised. The author provides a set of constraints that guarantees identification in MLE and produces more desirable asymptotic confidence regions that are parameter independent. An example using perception of business schools illustrates these ideas and demonstrates the computational tractability of the MLE approach to MDS.

2.
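The principal axes of a data point cloud are the directions along which the distribution of the points varies most, and they are one of the main features reflecting the structure of that distribution. This paper proposes a predictive modeling method for the rotational motion of the principal axes of a point cloud in high-dimensional space and applies it to forecasting the economic development of Chinese cities. The method can be used to infer the principal characteristic directions of the economic development of a group of cities.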

3.
We discuss methodology for multidimensional scaling (MDS) and its implementation in two software systems, GGvis and XGvis. MDS is a visualization technique for proximity data, that is, data in the form of N × N dissimilarity matrices. MDS constructs maps (“configurations,” “embeddings”) in ℝ^k by interpreting the dissimilarities as distances. Two frequent sources of dissimilarities are high-dimensional data and graphs. When the dissimilarities are distances between high-dimensional objects, MDS acts as a (often nonlinear) dimension-reduction technique. When the dissimilarities are shortest-path distances in a graph, MDS acts as a graph layout technique. MDS has found recent attention in machine learning motivated by image databases (“Isomap”). MDS is also of interest in view of the popularity of “kernelizing” approaches inspired by Support Vector Machines (SVMs; “kernel PCA”).

This article discusses the following general topics: (1) the stability and multiplicity of MDS solutions; (2) the analysis of structure within and between subsets of objects with missing value schemes in dissimilarity matrices; (3) gradient descent for optimizing general MDS loss functions (“Strain” and “Stress”); (4) a unification of classical (Strain-based) and distance (Stress-based) MDS.

Particular topics include the following: (1) blending of automatic optimization with interactive displacement of configuration points to assist in the search for global optima; (2) forming groups of objects with interactive brushing to create patterned missing values in MDS loss functions; (3) optimizing MDS loss functions for large numbers of objects relative to a small set of anchor points (“external unfolding”); and (4) a non-metric version of classical MDS.

We show applications to the mapping of computer usage data, to the dimension reduction of marketing segmentation data, to the layout of mathematical graphs and social networks, and finally to the spatial reconstruction of molecules.
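The gradient-descent optimization of a Stress-type loss mentioned in this abstract can be illustrated with a minimal sketch. It is not the GGvis/XGvis implementation: the function names, the raw (unnormalized) Stress, the fixed step size `lr`, and the random initialization are all assumptions chosen for illustration.

```python
import math
import random


def stress(X, D):
    """Raw Stress: sum over pairs i < j of (D[i][j] - ||x_i - x_j||)^2."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (D[i][j] - math.dist(X[i], X[j])) ** 2
    return total


def mds_gradient_descent(D, k=2, steps=1000, lr=0.02, seed=0):
    """Embed the N x N dissimilarity matrix D in R^k by plain gradient descent.

    Hypothetical helper for illustration; step size and step count are
    untuned assumptions, and only a local minimum is sought.
    """
    rng = random.Random(seed)
    n = len(D)
    # Random starting configuration (deterministic for a given seed).
    X = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(X[i], X[j])
                if d < 1e-12:
                    continue  # skip coincident points to avoid division by zero
                # d/dx_i of (d - D_ij)^2 is 2 * (d - D_ij) * (x_i - x_j) / d
                coef = 2.0 * (d - D[i][j]) / d
                for a in range(k):
                    grad[i][a] += coef * (X[i][a] - X[j][a])
        X = [[X[i][a] - lr * grad[i][a] for a in range(k)] for i in range(n)]
    return X


# Usage: dissimilarities are the side and diagonal lengths of a 3 x 4 rectangle.
D = [
    [0.0, 3.0, 5.0, 4.0],
    [3.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 3.0],
    [4.0, 5.0, 3.0, 0.0],
]
X = mds_gradient_descent(D)
print("final stress:", stress(X, D))
```

Because Stress is non-convex, a run like this may stop in a local minimum, which is one reason the abstract pairs automatic optimization with interactive displacement of points.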

4.
5.
6.
Visual data mining is an efficient way to involve humans in the search for an optimal decision. This paper focuses on the optimization of the visual presentation of multidimensional data. A variety of methods for projecting multidimensional data onto the plane have been developed. At present, a tendency toward their joint use is observed. In this paper, two consecutive combinations of the self-organizing map (SOM) with two other well-known nonlinear projection methods are examined theoretically and experimentally. These two methods are Sammon’s mapping and multidimensional scaling (MDS). The investigations showed that the combinations (SOM_Sammon and SOM_MDS) have similar efficiency. This supports the application of MDS with the SOM, because up to now most research has applied the SOM together with Sammon’s mapping. The problems of the quality and accuracy of such combined visualization are discussed. Three criteria of different nature are selected for evaluating the efficiency of the combined mapping. The joint use of these criteria allows us to choose the best visualization result from several possible ones. Several different initialization methods for nonlinear mapping are examined, and a new one is suggested. A new approach to SOM visualization is suggested. The obtained results allow us to make better decisions in optimizing the data visualization.
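For orientation, the projection error that Sammon’s mapping minimizes is usually written as follows. This is the standard textbook form, not a formula quoted from the paper, and the symbols are assumptions: $d^{*}_{ij}$ is the distance between objects $i$ and $j$ in the original space and $d_{ij}$ is the distance between their images in the plane.

```latex
E_{\text{Sammon}} \;=\; \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}
```

The division by $d^{*}_{ij}$ gives small original distances more weight, which distinguishes this criterion from a raw MDS Stress that sums the unweighted squared differences.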

7.
8.
Fitting semiparametric clustering models to dissimilarity data   Cited by: 1 (self-citations: 0, citations by others: 1)
The cluster analysis problem of partitioning a set of objects from dissimilarity data is here handled with the statistical model-based approach of fitting the “closest” classification matrix to the observed dissimilarities. A classification matrix represents a clustering structure expressed in terms of dissimilarities. In cluster analysis there is a lack of methodologies widely used to directly partition a set of objects from dissimilarity data. In real applications, a hierarchical clustering algorithm is applied on dissimilarities and subsequently a partition is chosen by visual inspection of the dendrogram. Alternatively, a “tandem analysis” is used by first applying a Multidimensional Scaling (MDS) algorithm and then by using a partitioning algorithm such as k-means applied on the dimensions specified by the MDS. However, neither the hierarchical clustering algorithms nor the tandem analysis is specifically defined to solve the statistical problem of fitting the closest partition to the observed dissimilarities. This lack of appropriate methodologies motivates this paper, in particular, the introduction and the study of three new object partitioning models for dissimilarity data, their estimation via least-squares and the introduction of three new fast algorithms.

9.
This paper studies the behavior of teams competing within national soccer leagues. The dissimilarities between teams are measured using the match results at each round, and that information feeds a multidimensional scaling (MDS) algorithm for visualizing the teams’ performance. Data characterizing four European leagues during the 2014–2015 season are adopted and processed using three distinct approaches. In the first, one dissimilarity matrix and one MDS map per round are generated. Afterwards, Procrustes analysis is applied to linearly transform the MDS charts for maximum superposition and to build one global MDS representation for the whole season. In the second approach, all data are combined into one dissimilarity matrix, leading to a single global MDS chart. In the third approach, the results per round are used to generate time series for all teams. Then, the time series are compared, generating a dissimilarity matrix and the corresponding MDS map. In all cases, the points on the maps represent the teams’ state up to a given round. The set of points corresponding to each team forms a locus representative of its performance versus time.
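The Procrustes step in the first approach, which superimposes one MDS map on another, can be sketched for 2D maps as below. This is a simplified illustration and not the paper's procedure: it allows only translation and rotation (full Procrustes analysis may also allow scaling and reflection), and the function names and sample coordinates are hypothetical.

```python
import math


def center(P):
    """Translate a list of 2D points so that its centroid is at the origin."""
    n = len(P)
    cx = sum(p[0] for p in P) / n
    cy = sum(p[1] for p in P) / n
    return [(p[0] - cx, p[1] - cy) for p in P]


def procrustes_rotation_2d(A, B):
    """Rotate centered A to best match centered B in the least-squares sense.

    Returns (aligned copy of A, rotation angle in radians). The optimal angle
    maximizes sum_i b_i . R(theta) a_i, which gives the atan2 formula below.
    """
    A0, B0 = center(A), center(B)
    s = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(A0, B0))
    c = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(A0, B0))
    theta = math.atan2(s, c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    aligned = [(cos_t * x - sin_t * y, sin_t * x + cos_t * y) for x, y in A0]
    return aligned, theta


# Usage: B is A rotated by 90 degrees and shifted by (5, 5).
A = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
B = [(5.0, 5.0), (5.0, 7.0), (4.0, 7.0)]
aligned, theta = procrustes_rotation_2d(A, B)
print(math.degrees(theta))
```

Aligning each round's map to a common reference in this way is what makes the per-round MDS charts comparable, since an MDS solution is only determined up to rigid motions.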

10.
In this paper we develop a class of applied probabilistic continuous time but discretized state space decompositions of the characterization of a multivariate generalized diffusion process. This decomposition is novel and, in particular, it allows one to construct families of mimicking classes of processes for such continuous state and continuous time diffusions in the form of a discrete state space but continuous time Markov chain representation. Furthermore, we present this novel decomposition and study its discretization properties from several perspectives. This class of decomposition both brings insight into understanding locally in the state space the induced dependence structures from the generalized diffusion process as well as admitting computationally efficient representations in order to evaluate functionals of generalized multivariate diffusion processes, which is based on a simple rank one tensor approximation of the exact representation. In particular, we investigate aspects of semimartingale decompositions, approximation and the martingale representation for multidimensional correlated Markov processes. A new interpretation of the dependence among processes is given using the martingale approach. We show that it is possible to represent, in both continuous and discrete space, that a multidimensional correlated generalized diffusion is a linear combination of processes originated from the decomposition of the starting multidimensional semimartingale. This result not only reconciles with the existing theory of diffusion approximations and decompositions, but defines the general representation of infinitesimal generators for both multidimensional generalized diffusions and, as we will demonstrate, also for the specification of copula density dependence structures. This new result provides immediate representation of the approximate weak solution for correlated stochastic differential equations. 
Finally, we demonstrate desirable convergence results for the proposed multidimensional semimartingale decomposition approximations.

11.
In this paper we compute the maxisets of some denoising methods (estimators) for multidimensional signals based on thresholding coefficients in hyperbolic wavelet bases. That is, we determine the largest functional space over which the risk of these estimators converges at a chosen rate. In the unidimensional setting, refining the choice of the coefficients that are subject to thresholding by pooling information from geometric structures in the coefficient domain (e.g., vertical blocks) is known to provide ‘large maxisets’. In the multidimensional setting, the situation is less straightforward. In a sense, these estimators are much more exposed to the curse of dimensionality. However, we identify cases where information pooling has a clear benefit. In particular, we identify some general structural constraints that can be related to compound functional models and to a minimal level of anisotropy.

12.
The multiple-choice multidimensional knapsack problem (MMKP) is a well-known NP-hard combinatorial optimization problem with a number of important applications. In this paper, we present a “reduce and solve” heuristic approach which combines problem reduction techniques with an Integer Linear Programming (ILP) solver (CPLEX). The key ingredient of the proposed approach is a set of group fixing and variable fixing rules. These fixing rules rely mainly on information from the linear relaxation of the given problem and aim to generate a reduced critical subproblem to be solved by the ILP solver. Additional strategies are used to explore the space of the reduced problems. Extensive experimental studies over two sets of 37 MMKP benchmark instances in the literature show that our approach competes favorably with the most recent state-of-the-art algorithms. In particular, for the set of 27 conventional benchmarks, the proposed approach finds an improved best lower bound for 11 instances and as a by-product improves all the previous best upper bounds. For the 10 additional instances with irregular structures, the method improves 7 best known results.

13.
New formulations of data envelopment analysis (DEA) and free disposal hull (FDH) models in a unified linear framework are proposed. One of the main objectives of this paper is to derive meaningful economic interpretations of the dual models in the price space. In particular, we introduce a new formulation of the returns to scale assumption with a straightforward economic interpretation. This framework allows for mixing DEA and FDH models together in a more general framework.

14.
It is shown that finding the equivalence set for solving multiobjective discrete optimization problems is advantageous over finding the set of Pareto optimal decisions. An example of a set of key parameters characterizing the economic efficiency of a commercial firm is proposed, and a mathematical model of its activities is constructed. In contrast to the classical problem of finding the maximum profit for any business, this study deals with a multiobjective optimization problem. A method for solving inverse multiobjective problems in a multidimensional pseudometric space is proposed for finding the best project for the firm’s activities. The solution of a particular problem of this type is presented.

15.
The weak and strong topologies on the space of orbits from the unit interval to the set of probability measures are considered. Of particular interest are periodic orbits of probability measures on the circle. It is shown that a real-valued rotation number can be defined in a natural way for all sufficiently smooth orbits whose range consists of probability measures supported on the whole circle. Furthermore, this number is a continuous functional with respect to an appropriately defined strong topology. The completion of this space contains as a special case deterministic orbits, whose rotation number is an integer, coinciding with the topological degree.

16.
A data analysis method is proposed to derive a latent structure matrix from a sample covariance matrix. The matrix can be used to explore the linear latent effect between two sets of observed variables. Procedures for estimating a set of dependent variables from a set of explanatory variables by using the latent structure matrix are also proposed. The proposed method can assist researchers in improving the effectiveness of SEM models by exploring the latent structure between two sets of variables. In addition, a structure residual matrix can be derived as a by-product of the proposed method, with which researchers can conduct experimental procedures for variable combination and selection to build various models for hypothesis testing. These capabilities of the data analysis method can improve the effectiveness of traditional SEM methods in data property characterization and model hypothesis testing. Case studies are provided to demonstrate the procedure of deriving the latent structure matrix step by step, and the latent structure estimation results are quite close to the results of PLS regression. A structure coefficient index is suggested to explore the relationships among various combinations of variables and their effects on the variance of the latent structure.

17.
To facilitate applications in general insurance, some extensions are proposed to cluster-weighted models (CWMs). First, we extend CWMs to generalized cluster-weighted models (GCWMs) by allowing modeling of non-Gaussian distributions of the continuous covariates, as they frequently occur in insurance practice. Secondly, we introduce a zero-inflated extension of GCWM (ZI-GCWM) for modeling insurance claims data with excess zeros coming from heterogeneous sources. Additionally, we give two expectation–optimization (EM) algorithms for parameter estimation in the proposed models. An appropriate simulation study shows that, for various settings and in contrast to the existing mixture-based approaches, both extended models perform well. Finally, a real data set based on French automobile policies is used to illustrate the application of the proposed extensions.

18.
Scagnostics is a Tukey neologism for the term scatterplot diagnostics. Scagnostics are characterizations of the 2D distributions of orthogonal pairwise projections of a set of points in multidimensional Euclidean space. These characterizations include such measures as density, skewness, shape, outliers, and texture. We introduce a set of scagnostics measures based on graph theory and we analyze their distributions and performance. Our analysis is based on a restrictive set of criteria that must be met in order to have scagnostics measures that can be used effectively in exploratory data analysis.

19.
An efficient method for solving parabolic systems is presented. The proposed method is based on the splitting-up principle in which the problem is reduced to a series of independent 1D problems. This enables it to be used with parallel processors. We can solve multidimensional problems by applying only the 1D method and consequently avoid the difficulties in constructing a finite element space for multidimensional problems. The method is suitable for general domains as well as rectangular domains. Every 1D subproblem is solved by applying cubic B-splines. Several numerical examples are presented.

20.
Some generalizations of the notion of univariate data interpolation are presented, inducing the concept of set-valued interpolation in a general metric space. Consequently, methods for univariate interpolation or smoothing of multidimensional geometrical data are suggested. In particular, the application of these methods to 3-D body recognition from cross-sectional data is discussed. Preliminary analysis of the interpolation process is presented and the capability of reconstructing bodies of complex topologies is exemplified.
