Similar literature
1.
In this paper, we address the changing composition of a customer portfolio taking into account actions undertaken by the company to adapt its service offer to market conditions and/or technological innovations. We present a specific methodology to identify clusters of customers in different periods and then compare them over time. The classification process takes into account both qualitative and quantitative aspects of the consumption levels of the services or products offered by the company. The possibility of period‐to‐period variation in the customer portfolio and the service or product offer is also considered, in order to achieve a more realistic scenario. The core of the proposed methodology is related to the family of exploratory factorial and cluster techniques. The customers are classified by using a bicriterial clustering methodology based on ‘tandem’ analysis (multiple factor analysis+cluster analysis of the main factors). The bicriterial approach allows for a compromise between customers' consumption levels (a quantitative criterion) and their consumption/non‐consumption pattern (a qualitative criterion). The evolution of the customer portfolio composition is explored through multiple correspondence analysis. This technique allows visual comparison of the position of different clusters against time and the identification of key changes in customer consumption behavior. The methodology is tested on realistic customer portfolio scenarios for a major telecommunication company. We simulate various scenarios to show the strengths of our proposal. Copyright © 2009 John Wiley & Sons, Ltd.

2.
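The aim of this paper is to provide scientifically sound, more complex sampling survey methods for sensitive questions, together with formulas for computing the associated statistics. Using W. G. Cochran's sampling theory, the Warner model of the randomized response technique, the law of total probability, and basic properties of the variance, we derive formulas for the estimator of the population proportion and for its estimated variance under cluster sampling and stratified cluster sampling, for the Warner randomized response model with a binary sensitive question. The formulas were applied successfully, with high reliability, in a survey of premarital sexual behavior among students at Soochow University.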
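
As a point of reference for the Warner model named in this abstract, the sketch below computes the textbook Warner estimator under simple random sampling with replacement. It does not reproduce the paper's cluster or stratified cluster formulas, which the abstract does not state. The function name `warner_estimate` and its parameters are hypothetical, chosen for illustration only.

```python
def warner_estimate(yes_count, n, p):
    """Textbook Warner estimator (simple random sampling, an assumption here).

    Each respondent answers "Do you have trait A?" with probability p and
    "Do you lack trait A?" with probability 1 - p, so
    P(yes) = p * pi + (1 - p) * (1 - pi).
    """
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    if n < 2:
        raise ValueError("need at least two respondents")
    lam = yes_count / n  # observed proportion of "yes" answers
    # Solve P(yes) = p*pi + (1-p)*(1-pi) for pi.
    pi_hat = (lam - (1.0 - p)) / (2.0 * p - 1.0)
    # Unbiased estimate of Var(pi_hat) under simple random sampling.
    var_hat = lam * (1.0 - lam) / ((n - 1) * (2.0 * p - 1.0) ** 2)
    return pi_hat, var_hat


if __name__ == "__main__":
    estimate, variance = warner_estimate(yes_count=40, n=100, p=0.7)
    print(estimate, variance)
```

The estimate can fall outside [0, 1] in small samples; whether and how to truncate it is a design choice left open here.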

3.
We propose a functional extension of fuzzy clusterwise regression, which estimates fuzzy memberships of clusters and regression coefficient functions for each cluster simultaneously. The proposed method permits dependent and/or predictor variables to be functional, varying over time, space, and other continua. The fuzzy memberships and clusterwise regression coefficient functions are estimated by minimizing an objective function that adopts a basis function expansion approach to approximating functional data. An alternating least squares algorithm is developed to minimize the objective function. We conduct simulation studies to demonstrate the superior performance of the proposed method compared to its non-functional counterpart and to examine the performance of various cluster validity measures for selecting the optimal number of clusters. We apply the proposed method to real datasets to illustrate the empirical usefulness of the proposed method.

4.
We sharpen run‐time analysis for algorithms under the partial rejection sampling framework. Our method yields improved bounds for: the cluster‐popping algorithm for approximating all‐terminal network reliability; the cycle‐popping algorithm for sampling rooted spanning trees; and the sink‐popping algorithm for sampling sink‐free orientations. In all three applications, our bounds are not only tight in order, but also optimal in constants.
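
To make the sink-popping idea concrete, here is a minimal sketch under stated assumptions: the graph is simple and undirected, it admits at least one sink-free orientation (otherwise the loop never ends), and isolated vertices are ignored. The function name `sink_popping` and its arguments are hypothetical. The abstract gives no pseudocode, so this follows the commonly described procedure of re-randomizing the edges around each sink.

```python
import random


def sink_popping(n_vertices, edges, rng=None):
    """Sample an orientation with no sinks by repeatedly "popping" sinks.

    edges is a list of (u, v) pairs of a simple undirected graph.
    Returns a list of (tail, head) pairs, one per input edge.
    Assumption: a sink-free orientation exists (e.g. the graph is a cycle).
    """
    rng = rng or random.Random()
    # head[i] is the endpoint that edge i currently points to.
    head = [rng.choice(edge) for edge in edges]
    incident = [[] for _ in range(n_vertices)]
    for i, (u, v) in enumerate(edges):
        incident[u].append(i)
        incident[v].append(i)

    def is_sink(vertex):
        # A sink has at least one edge and every incident edge points into it.
        return bool(incident[vertex]) and all(
            head[i] == vertex for i in incident[vertex]
        )

    while True:
        sinks = [v for v in range(n_vertices) if is_sink(v)]
        if not sinks:
            return [
                (u, v) if head[i] == v else (v, u)
                for i, (u, v) in enumerate(edges)
            ]
        # Two sinks are never adjacent, so these edge sets are disjoint.
        for vertex in sinks:
            for i in incident[vertex]:
                head[i] = rng.choice(edges[i])


if __name__ == "__main__":
    cycle = [(i, (i + 1) % 5) for i in range(5)]
    print(sink_popping(5, cycle, random.Random(0)))
```

The running time is random; the bounds discussed in the abstract concern exactly this quantity, and the sketch makes no claim about them.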

5.
Multidimensional multivariate data have been studied in different areas for quite some time. Commonly, the analysis goal is not to look into individual records but to understand the distribution of the records at large and to find clusters of records that exhibit correlations between dimensions or variables. We propose a visualization method that operates on density rather than individual records. To not restrict our search for clusters, we compute density in the given multidimensional space. Clusters are formed by areas of high density. We present an approach that automatically computes a hierarchical tree of high density clusters. For visualization purposes, we propose a method to project the multidimensional clusters to a 2D or 3D layout. The projection method uses an optimized star coordinates layout. The optimization procedure minimizes the overlap of projected clusters and maximally maintains the cluster shapes, compactness, and distribution. The star coordinate visualization allows for an interactive analysis of the distribution of clusters and comprehension of the relations between clusters and the original dimensions. Clusters are being visualized using nested sequences of density level sets leading to a quantitative understanding of information content, patterns, and relationships.

6.
We propose a new spatial scan statistic based on graph theory as a method for detecting irregularly-shaped clusters of events over space. A graph-based method is proposed for identifying potential clusters in spatial point processes. It relies on linking the events closer than a given distance and thus defining a graph associated with the point process. The set of possible clusters is then restricted to windows including the connected components of the graph. The concentration in each of these possible clusters is measured through classical concentration indices based on likelihood ratio and also through a new concentration index which does not depend on any alternative hypothesis. These graph-based spatial scan tests seem to be very powerful against any arbitrarily-shaped cluster alternative, whatever the dimension of the data. These results have applications in various fields, such as the epidemiological study of rare diseases or the analysis of astrophysical data.

7.
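Based on system dynamics, this paper studies the mechanism by which industrial clusters affect firm performance. Causal loop diagrams are built for the effects of the external resource environment, the scale of development, and research and innovation capacity on firm performance, and on this basis a full-process dynamic analysis of the influence of industrial clusters on firm performance is carried out. An empirical study takes the pharmaceutical manufacturing cluster in Shanghai as an example: simulations are run with the Vensim simulation software, and the positive and negative feedback loops in the mechanism are examined in detail. The results show that the cluster's external resource environment, its scale of development, and its research and innovation capacity all have a positive effect on the development of firms in the cluster. Government and firms in the cluster can raise firm performance within the cluster, and promote the sustainable development of those firms, in three ways: increasing investment in the cluster's research capability, improving the cluster's external environment, and expanding the cluster's scale of development.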

8.
9.
In July of 1987, the Sampling Survey of Children's Situation was conducted in 9 provinces/autonomous regions of China. A stratified two-stage cluster sampling plan was designed for the survey. The paper presents the methods of stratification, selecting n=2 PSUs (cities/counties) with unequal probabilities without replacement in each stratum, and selecting residents/village committees in each sampled city/county. All formulae for estimating population characteristics (especially population totals and the ratios of two totals) and for estimating the variances of those estimators are given. Finally, we give a preliminary analysis of the precision of the survey from the results of data processing. Supported partially by the National Funds of Natural Sciences, 7860013.

10.
Spatial scan statistics are commonly used for geographic disease cluster detection and evaluation. We propose and implement a modified version of the simulated annealing spatial scan statistic that incorporates the concept of “non-compactness” in order to penalize clusters that are very irregular in shape. We evaluate its power for the simulated annealing scan and compare it with the circular and elliptic spatial scan statistics. We observe that, with the non-compactness penalty, the simulated annealing method is competitive with the circular and elliptic scan statistic, and both have good power performance. The elliptic scan statistic is computationally faster and is well suited for mildly irregular clusters, but the simulated annealing method deals better with highly irregular cluster shapes. The new method is applied to breast cancer mortality data from northeastern United States.

11.
NP-completeness of two clustering (partition) problems is proved for a finite sequence of Euclidean vectors. In the optimization versions of both problems it is required to partition the elements of the sequence into a fixed number of clusters minimizing the sum of squares of the distances from the cluster elements to their centers. In the first problem the sizes of the clusters are part of the input, while in the second they are unknown (they are optimization variables). Except for the center of one (special) cluster, the center of each cluster is the mean value of all vectors contained in it. The center of the special cluster is zero. Also, the partition must satisfy the following condition: the difference between the indices of two consecutive vectors in every nonspecial cluster is bounded below and above by two given constants.
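
The objective and the index-gap constraint described in this abstract can be written down directly. The sketch below is only an evaluator for a given partition, not a solver, and every name in it (`sum_of_squares`, `gaps_ok`, `special`, `t_min`, `t_max`) is hypothetical, introduced for illustration.

```python
def sum_of_squares(vectors, labels, special=0):
    """Sum of squared distances from each vector to its cluster center.

    The cluster labelled `special` has its center fixed at the origin;
    every other cluster uses the mean of its members as the center.
    """
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(vec)
    total = 0.0
    for lab, members in groups.items():
        dim = len(members[0])
        if lab == special:
            center = [0.0] * dim
        else:
            center = [sum(v[k] for v in members) / len(members) for k in range(dim)]
        total += sum((v[k] - center[k]) ** 2 for v in members for k in range(dim))
    return total


def gaps_ok(labels, special, t_min, t_max):
    """Check that consecutive members of each nonspecial cluster have
    sequence indices differing by at least t_min and at most t_max."""
    last_index = {}
    for idx, lab in enumerate(labels):
        if lab == special:
            continue
        if lab in last_index and not t_min <= idx - last_index[lab] <= t_max:
            return False
        last_index[lab] = idx
    return True


if __name__ == "__main__":
    seq = [(0.0, 1.0), (2.0, 0.0), (4.0, 0.0), (0.0, -1.0)]
    assignment = [0, 1, 1, 0]
    print(sum_of_squares(seq, assignment), gaps_ok(assignment, 0, 1, 2))
```

In the example, the special cluster contributes 1 + 1 and the other cluster, with mean (3, 0), contributes 1 + 1, so the objective is 4.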

12.
Global warming and the associated climate changes are the subject of intensive research due to their major impact on social, economic and health aspects of human life. Surface temperature time-series characterise Earth as a slow dynamics spatiotemporal system, evidencing long memory behaviour, typical of fractional order systems. Such phenomena are difficult to model and analyse, demanding alternative approaches. This paper studies the complex correlations between global temperature time-series using the Multidimensional scaling (MDS) approach. MDS provides a graphical representation of the pattern of climatic similarities between regions around the globe. The similarities are quantified through two mathematical indices that correlate the monthly average temperatures observed in meteorological stations, over a given period of time. Furthermore, time dynamics is analysed by performing the MDS analysis over slices sampling the time series. MDS generates maps describing the stations’ locus in the perspective that, if they are perceived to be similar to each other, then they are placed on the map forming clusters. We show that MDS provides an intuitive and useful visual representation of the complex relationships that are present among temperature time-series, which are not perceived on traditional geographic maps. Moreover, MDS avoids sensitivity to the irregular distribution density of the meteorological stations.

13.
Summary: In hierarchical cluster analysis, dendrogram graphs are used to visualize how clusters are formed. Because each observation is displayed, dendrograms are impractical when the data set is large. For non-hierarchical cluster algorithms (e.g. Kmeans) a graph like the dendrogram does not exist. This paper discusses a graph named “clustergram” to examine how cluster members are assigned to clusters as the number of clusters increases. The clustergram can also give insight into algorithms. For example, it can easily be seen that the “single linkage” algorithm tends to form clusters that consist of just one observation. It is also useful in distinguishing between random and deterministic implementations of the Kmeans algorithm. A data set related to asbestos claims and the Thailand Landmine Data are used throughout to illustrate the clustergram.

14.
Local polynomial reproduction and moving least squares approximation
Local polynomial reproduction is a key ingredient in providing error estimates for several approximation methods. To bound the Lebesgue constants is a hard task especially in a multivariate setting. We provide a result which allows us to bound the Lebesgue constants uniformly and independently of the space dimension by oversampling. We get explicit and small bounds for the Lebesgue constants. Moreover, we use these results to establish error estimates for the moving least squares approximation scheme, also with special emphasis on the involved constants. We discuss the numerical treatment of the method and analyse its effort. Finally, we give large scale examples.

15.
The use of a finite mixture of normal distributions in model-based clustering allows us to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and in general either achieved by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior, where the hyperparameters are carefully selected such that they are reflective of the cluster structure aimed at. In addition, this prior allows us to estimate the model using standard MCMC sampling methods. In combination with a post-processing approach which resolves the label switching issue and results in an identified model, our approach allows us to simultaneously (1) determine the number of clusters, (2) flexibly approximate the cluster distributions in a semiparametric way using finite mixtures of normals and (3) identify cluster-specific parameters and classify observations. The proposed approach is illustrated in two simulation studies and on benchmark datasets. Supplementary materials for this article are available online.

16.
We propose a new technique to perform unsupervised data classification (clustering) based on density induced metric and non-smooth optimization. Our goal is to automatically recognize multidimensional clusters of non-convex shape. We present a modification of the fuzzy c-means algorithm, which uses the data induced metric, defined with the help of Delaunay triangulation. We detail computation of the distances in such a metric using graph algorithms. To find optimal positions of cluster prototypes we employ the discrete gradient method of non-smooth optimization. The new clustering method is capable of identifying non-convex, overlapping d-dimensional clusters.

17.
In spatial studies, use is commonly made of nested sampling plans. By applying such plans, one takes observations according to a hierarchical scheme, with decreasing distances between observations. As observed by Miesch (1975), the cumulative sum of variance components provided by the nested sampling plan may be used in some situations to obtain semivariogram values. In this article, proofs are given for both balanced and unbalanced designs. Different estimation procedures for obtaining semivariogram values are compared with each other. The paper is illustrated with two numerical examples, one on actual soil pH data and one on simulated random fields. Mean squared pair differences are shown to be inferior to expected mean squares and restricted maximum likelihood for variance component estimation, and several other spatial sampling plans may be superior to the nested sampling plan for estimating the spatial semivariogram.
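
For readers unfamiliar with the "mean squared pair differences" mentioned in this abstract, the sketch below computes the classical empirical semivariogram on a regularly spaced one-dimensional transect. This is a simplifying assumption: the paper's nested designs and its variance-component estimators are not reproduced here. The function name `empirical_semivariogram` is hypothetical.

```python
def empirical_semivariogram(values, max_lag):
    """Half the mean squared difference between pairs h steps apart.

    values: observations at equally spaced locations along a transect
            (an assumption made for this illustration).
    Returns a dict mapping lag h (in steps) to the semivariance estimate.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        squared_diffs = [
            (values[i + lag] - values[i]) ** 2
            for i in range(len(values) - lag)
        ]
        if squared_diffs:  # skip lags with no available pairs
            gamma[lag] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma


if __name__ == "__main__":
    transect = [1.0, 2.0, 3.0, 4.0]
    print(empirical_semivariogram(transect, max_lag=3))
```

For the linear transect in the example, pairs one step apart all differ by 1, so the lag-1 value is 1/2; at lag 2 the squared difference is 4, giving 2.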

18.
The goal of clustering is to detect the presence of distinct groups in a dataset and assign group labels to the observations. Nonparametric clustering is based on the premise that the observations may be regarded as a sample from some underlying density in feature space and that groups correspond to modes of this density. The goal then is to find the modes and assign each observation to the domain of attraction of a mode. The modal structure of a density is summarized by its cluster tree; modes of the density correspond to leaves of the cluster tree. Estimating the cluster tree is the primary goal of nonparametric cluster analysis. We adopt a plug-in approach to cluster tree estimation: estimate the cluster tree of the feature density by the cluster tree of a density estimate. For some density estimates the cluster tree can be computed exactly; for others we have to be content with an approximation. We present a graph-based method that can approximate the cluster tree of any density estimate. Density estimates tend to have spurious modes caused by sampling variability, leading to spurious branches in the graph cluster tree. We propose excess mass as a measure for the size of a branch, reflecting the height of the corresponding peak of the density above the surrounding valley floor as well as its spatial extent. Excess mass can be used as a guide for pruning the graph cluster tree. We point out mathematical and algorithmic connections to single linkage clustering and illustrate our approach on several examples. Supplemental materials for the article, including an R package implementing generalized single linkage clustering, all datasets used in the examples, and R code producing the figures and numerical results, are available online.

19.
In this work, we assess the suitability of cluster analysis for the gene grouping problem confronted with microarray data. Gene clustering is the exercise of grouping genes based on attributes, which are generally the expression levels over a number of conditions or subpopulations. The hope is that similarity with respect to expression is often indicative of similarity with respect to much more fundamental and elusive qualities, such as function. By formally defining the true gene-specific attributes as parameters, such as expected expression across the conditions, we obtain a well-defined gene clustering parameter of interest, which greatly facilitates the statistical treatment of gene clustering. We point out that genome-wide collections of expression trajectories often lack natural clustering structure, prior to ad hoc gene filtering. The gene filters in common use induce a certain circularity to most gene cluster analyses: genes are points in the attribute space, a filter is applied to depopulate certain areas of the space, and then clusters are sought (and often found!) in the “cleaned” attribute space. As a result, statistical investigations of cluster number and clustering strength are just as much a study of the stringency and nature of the filter as they are of any biological gene clusters. In the absence of natural clusters, gene clustering may still be a worthwhile exercise in data segmentation. In this context, partitions can be fruitfully encoded in adjacency matrices and the sampling distribution of such matrices can be studied with a variety of bootstrapping techniques.

20.
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of minimum sum-of-squares of distances from the elements of clusters to their centers. We assume that the cardinalities of the clusters are fixed. The center of one cluster has to be optimized and is defined as the average value over all vectors in this cluster. The center of the other cluster lies at the origin. The partition satisfies the condition: the difference of the indices of the next and previous vectors in the first cluster is bounded above and below by two given constants. We propose a 2-approximation polynomial algorithm to solve this problem.
