Similar Literature
20 similar records retrieved.
1.
Many clustering methods, such as K-means, kernel K-means, and MNCut clustering, follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering represents an interesting variation on this recipe. Blatt, Wiseman, and Domany defined a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sampled a large number of partitions from this distribution for a range of temperatures. They proposed a heuristic for choosing an appropriate temperature and, from the sample of partitions associated with this chosen temperature, they then derived what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an “average” of plausible configurations, and we would expect it to be more stable (over different samples) than the configuration optimizing the Hamiltonian.

The goal of this article is to contribute to the understanding of Potts model clustering and to propose extensions and improvements: (1) We show that the Hamiltonian used in Potts model clustering is closely related to the kernel K-means and MNCut criteria. (2) We propose a modification of the Hamiltonian penalizing unequal cluster sizes and show that it can be interpreted as a weighted version of the kernel K-means criterion. (3) We introduce a new version of the Wolff algorithm to simulate configurations from the distribution defined by the penalized Hamiltonian, leading to penalized Potts model clustering. (4) We note a link between kernel-based clustering methods and nonparametric density estimation and exploit it to automatically determine locally adaptive kernel bandwidths. (5) We propose a new simple rule for selecting a good temperature T.

As an illustration we apply Potts model clustering to gene expression data and compare our results to those obtained by model-based clustering and a nonparametric dendrogram sharpening method.
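Below is a minimal sketch (in Python, not the authors' code) of the consensus step described above: given a set of sampled partitions, two observations are merged whenever they share a cluster in a majority of the samples; the Potts/Wolff sampler that produces the partitions is assumed to exist elsewhere.

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    def consensus_clustering(partitions, threshold=0.5):
        """partitions: list of integer label arrays, one per sampled configuration.
        Observations are linked when they co-occur in more than `threshold`
        of the sampled partitions; consensus clusters are the connected components."""
        P = np.asarray(partitions)                 # shape (n_samples, n_obs)
        n_obs = P.shape[1]
        co = np.zeros((n_obs, n_obs))
        for labels in P:                           # accumulate co-occurrence counts
            co += (labels[:, None] == labels[None, :])
        co /= len(P)                               # co-occurrence frequencies
        adj = csr_matrix(co > threshold)           # majority-vote adjacency
        _, consensus_labels = connected_components(adj, directed=False)
        return consensus_labels

    # toy usage with three hypothetical sampled partitions of four observations
    samples = [np.array([0, 0, 1, 1]), np.array([0, 0, 0, 1]), np.array([1, 1, 0, 0])]
    print(consensus_clustering(samples))           # -> [0 0 1 1]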

2.
In evaluation practice, evaluators often have specific requirements for certain key indicators, or expectations about the evaluation result. Mathematical analysis shows that the constraints arising from these requirements or expectations imply a critical weight for the indicator concerned, that this critical weight can be obtained by optimization, and that it can then be used for constrained weighting. When a specific object is evaluated, the distribution of the indicator scores affects the result; we therefore propose a method that adjusts the weights according to the standard deviation of all evaluation indicators and the deviation of the lowest-scoring indicator.

3.
Summary. Methods are described for the determination of the number of ellipsoidal particles (rotatory ellipsoids which all have the same axial ratio), their mean axis, their standard deviation and their mean volume from measurements made on random plane sections. Confidence intervals are given for the estimation of the axial ratio of the particles.


4.
ABSTRACT

Combining a standard measure of concern about low relative wealth and a standard measure of relative risk aversion leads to a novel explanation of variation in risk-taking behavior identified and documented by social psychologists and economists. We obtain two results: (1) Holding individual i’s wealth and his rank in the wealth distribution constant, the individual’s relative risk aversion decreases when he becomes more relatively deprived as a result of an increase in the average wealth of the individuals who are wealthier than he is. (2) If relative deprivation enters the individual’s utility function approximately linearly then, holding constant individual i’s wealth and the average wealth of the individuals who are wealthier than he is, the individual’s relative risk aversion decreases when he becomes more relatively deprived as a result of a decline in his rank. Our findings provide theoretical support for evidence about the propensity of relatively deprived individuals to gamble and resort to other risky behaviors.
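For concreteness, one common formalization of relative deprivation (an illustration of the ingredients named above, not necessarily this paper's exact specification) is the truncated mean shortfall

\[
RD(x_i) \;=\; \frac{1}{n}\sum_{j:\,x_j > x_i}\bigl(x_j - x_i\bigr) \;=\; p_i\bigl(\bar{x}_i^{+} - x_i\bigr),
\]

where $p_i$ is the fraction of individuals wealthier than $i$ (determined by $i$'s rank) and $\bar{x}_i^{+}$ is their average wealth; these are exactly the two quantities varied in results (1) and (2).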

5.

The Nemhauser–Trotter theorem states that the standard linear programming (LP) formulation for the stable set problem has a remarkable property, also known as (weak) persistency: for every optimal LP solution that assigns integer values to some variables, there exists an optimal integer solution in which these variables retain the same values. While the standard LP is defined by only non-negativity and edge constraints, a variety of other LP formulations have been studied and one may wonder whether any of them has this property as well. We show that any other formulation that satisfies mild conditions cannot have the persistency property on all graphs, unless it is always equal to the stable set polytope.
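For reference, the standard LP referred to above is, in its weighted form with weights $w_v \ge 0$ (the unweighted case takes $w_v = 1$):

\[
\max\Bigl\{\sum_{v\in V} w_v x_v \;:\; x_u + x_v \le 1 \ \text{for every edge } uv\in E,\ \ x_v \ge 0 \ \text{for every } v\in V\Bigr\}.
\]

Persistency then says that if an optimal solution $x^\ast$ of this LP has $x^\ast_v \in \{0,1\}$ on some set of vertices, then some maximum-weight stable set contains every such $v$ with $x^\ast_v = 1$ and avoids every such $v$ with $x^\ast_v = 0$.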


6.
Infinite time register machines (ITRMs) are register machines which act on natural numbers and which are allowed to run for arbitrarily many ordinal steps. Successor steps are determined by standard register machine commands. At limit times register contents are defined by appropriate limit operations. In this paper, we examine the ITRMs introduced by the third and fourth author (Koepke and Miller in Logic and Theory of Algorithms, LNCS, pp. 306–315, 2008), where a register content at a limit time is set to the lim inf of previous register contents if that limit is finite; otherwise the register is reset to 0. The theory of these machines has several similarities to the infinite time Turing machines (ITTMs) of Hamkins and Lewis. The machines can decide all $\Pi^1_1$ sets, yet are strictly weaker than ITTMs. As in the ITTM situation, we introduce a notion of ITRM-clockable ordinals corresponding to the running times of computations. These form a transitive initial segment of the ordinals. Furthermore we prove a Lost Melody theorem: there is a real r such that there is a program P that halts on the empty input for all oracle contents and outputs 1 iff the oracle number is r, but no program can decide for every natural number n whether or not $n \in r$ with the empty oracle. In an earlier paper, the third author considered another type of machines where registers were not reset at infinite lim inf's and he called them infinite time register machines. Because the resetting machines correspond much better to ITTMs we hold that in future the resetting register machines should be called ITRMs.

7.
A Dual-Objective Evolutionary Algorithm for Rules Extraction in Data Mining
This paper presents a dual-objective evolutionary algorithm (DOEA) for extracting multiple decision rule lists in data mining, which aims at satisfying the classification criteria of high accuracy and ease of user comprehension. Unlike existing approaches, the algorithm incorporates the concept of Pareto dominance to evolve a set of non-dominated decision rule lists, each having a different classification accuracy and number of rules over a specified range. The classification results of the DOEA are analyzed and compared with existing rule-based and non-rule-based classifiers on 8 test problems obtained from the UCI Machine Learning Repository. It is shown that the DOEA produces comprehensible rules with classification accuracy competitive with many methods in the literature. Box plots and t-tests further examine its invariance to the random partitioning of datasets.
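A minimal sketch of the Pareto-dominance relation implied above, taking the two objectives to be classification error and rule count (both minimized); the function and variable names are illustrative, not the paper's:

    def dominates(a, b):
        """a, b: (error, n_rules) pairs; a dominates b if it is no worse in both
        objectives and strictly better in at least one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def non_dominated(rule_lists):
        """Keep only the rule lists not dominated by any other (the Pareto front)."""
        return [r for i, r in enumerate(rule_lists)
                if not any(dominates(other, r)
                           for j, other in enumerate(rule_lists) if j != i)]

    # toy usage: (error rate, number of rules) of four candidate rule lists
    candidates = [(0.10, 12), (0.12, 5), (0.10, 15), (0.20, 4)]
    print(non_dominated(candidates))   # -> [(0.1, 12), (0.12, 5), (0.2, 4)]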

8.
The aim of this study is to determine how the TIMSS mathematics success of 8th-grade students differs according to school type, gender, mathematics report mark, parents' education level, cognitive domains, and cognitive domains by gender. The relational survey method was used. Six hundred fifty-two 8th-grade students studying in the same city in Turkey participated. A 45-question test compiled from the TIMSS 2011 mathematics questionnaire was used as the data collection tool. Quantitative data analysis methods were applied: frequency, percentage, mean, standard deviation, independent samples tests, one-way analysis of variance and post-hoc tests were carried out using the SPSS software package. The study found that school type, mathematics school mark, parents' education level and cognitive domains influenced the students' TIMSS mathematics success, whereas gender did not. Moreover, schools that are highly successful in national exams were also more successful in the TIMSS exam, and students whose mathematics school mark is 5 and whose parents graduated from university were more successful in the TIMSS exam than others.

9.
We present a (non‐standard) probabilistic analysis of dynamic data structures whose sizes are considered dynamic random walks. The basic operations (insertion, deletion, positive and negative queries, batched insertion, lazy deletion, etc.) are time‐dependent random variables. This model is a (small) step toward the analysis of these structures when the distribution of the set of histories is not uniform. As an illustration, we focus on list structures (linear lists, priority queues, and dictionaries) but the technique is applicable as well to more advanced data structures.

10.
A sufficient condition is given so that in a ball B(x, R) there is no function, other than the zero function, whose averages over all Euclidean motions of an open bounded set Ω that are contained in B(x, R) vanish.

C. A. Berenstein's research was carried out under contracts NSF-DMS 8703072 and ARO-DAA L 0386K0115. He thanks these organizations for their support.

11.
12.
In games with costly signaling, some equilibria are vulnerable to deviations which could be “unambiguously” interpreted as coming from a unique set of Sender-types. This occurs when these types are precisely the ones who gain from deviating for any beliefs the Receiver could form over that set. We show that this idea characterizes a unique equilibrium outcome in two classes of games. First, in monotonic signaling games, only the Riley outcome is immune to this sort of deviation. Our result therefore provides a plausible story behind the selection made by Cho and Kreps’s (Q J Econ 102:179–221, 1987) D1 criterion on this class of games. Second, we examine a version of Crawford and Sobel’s (Econometrica 50:1431–1451, 1982) model with costly signaling, where standard refinements have no effect. We show that only a Riley-like separating equilibrium is immune to these deviations.

13.
We present a primitive recursive program inf_with_lists computing the minimum of two natural numbers n and p (written in unary notation) and using primitive recursion on lists. This program has at first sight the required property of visiting its inputs simultaneously, so it is a counterexample to a theorem showing that such a program cannot be written in the language of primitive recursion on natural numbers, in the more general framework of primitive recursion on term algebras. However, its complexity is at least inf(n,p)^2, so it does not implement the algorithm we have in mind to compute inf(n,p).
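For intuition, here is a sketch (in Python rather than the paper's primitive-recursion-on-lists formalism) of the intended algorithm: peel one unit from each input per step, visiting both simultaneously, which takes about inf(n,p) steps. The point of the abstract is that the list-based primitive recursive program only appears to do this and in fact needs at least inf(n,p)^2 steps.

    def to_unary(n):
        """Unary representation of n: a list of n markers."""
        return ['S'] * n

    def inf_unary(a, b):
        """Minimum of two unary numbers by simultaneous descent: stop as soon
        as either input is exhausted, so the running time is about min(n, p)."""
        count = 0
        while a and b:
            a, b = a[1:], b[1:]
            count += 1
        return count

    print(inf_unary(to_unary(7), to_unary(4)))   # -> 4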

14.
This paper reports on an experimental study of the distribution of the length of simplex paths for the Optimal Assignment Problem. We study the distribution of the pivot counts for a version of the simplex method that, with essentially equal probabilities, introduces any variable with negative reduced cost into the basis. In this situation the distribution of the pivot counts turns out to be normally distributed and independent of the actual cost coefficients, provided these are sufficiently spread out. Further, the mean and standard deviation grow only moderately with the size of the problem, namely as d^1.8 and d^1.5 respectively for a d×d problem, implying in particular that the pivot counts concentrate around the mean with growing d. The usual simplex method on the other hand gives a growth of d^1.6. Hence a large part of the favourable polynomial growth experienced on practical problems may be attributed to the fact that the simplex paths are rather short on the average, at least for assignment problems.

15.
Stochastic Analysis and Applications, 2013, 31(5): 863-892
This paper investigates control chart schemes for detecting drifts in the process mean μ and/or process standard deviation σ when individual observations are sampled. Drifts may be due to causes such as gradual deterioration of equipment, catalyst aging, waste accumulation, or human causes, such as operator fatigue or close supervision. The standard Shewhart X chart and moving range (MR) chart are evaluated, as well as several types of exponentially weighted moving average (EWMA) charts and combinations of charts involving these EWMA charts. We show that the combinations of the EWMA charts detect slow-rate and moderate-rate drifts much faster than the combined X and MR charts. We also show that varying the sampling interval adaptively as a function of the process data results in notable reductions in the detection delay of drifts in μ and/or σ.
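A minimal sketch of an EWMA chart for individual observations in the standard textbook form (not the specific schemes or adaptive sampling rules compared in the paper); the smoothing constant lam, the control-limit width L, and the in-control mean and standard deviation are assumed known.

    import numpy as np

    def ewma_chart(x, mu0, sigma0, lam=0.1, L=2.7):
        """Return the EWMA statistics z_t = lam*x_t + (1-lam)*z_{t-1} (z_0 = mu0)
        together with the time-varying lower/upper control limits."""
        x = np.asarray(x, dtype=float)
        z = np.empty_like(x)
        prev = mu0
        for t, xt in enumerate(x):
            prev = lam * xt + (1 - lam) * prev
            z[t] = prev
        steps = np.arange(1, len(x) + 1)
        var = sigma0**2 * lam / (2 - lam) * (1 - (1 - lam)**(2 * steps))
        return z, mu0 - L * np.sqrt(var), mu0 + L * np.sqrt(var)

    # toy usage: a slow drift in the mean starting at observation 20
    rng = np.random.default_rng(0)
    data = rng.normal(0.0, 1.0, 40) + np.concatenate([np.zeros(20), 0.05 * np.arange(20)])
    z, lcl, ucl = ewma_chart(data, mu0=0.0, sigma0=1.0)
    print(np.where((z < lcl) | (z > ucl))[0])    # indices where the chart signals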

16.
Stochastic Analysis and Applications, 2013, 31(2): 459-477
Abstract

We select the kth order statistic from each row of a sequence of independent and identically distributed random variables from a distribution that generalizes the Pareto distribution. We then examine weighted sums of these order statistics to see whether or not Laws of Large Numbers with nonzero limits exist.

17.
Abstract

Test-based variable selection algorithms in regression are often based on sequential comparison of test statistics to cutoff values. A predetermined α level typically is used to determine the cutoffs based on an assumed probability distribution for the test statistic. For example, backward elimination or forward stepwise selection involve comparisons of test statistics to prespecified t or F cutoffs in Gaussian linear regression, while a likelihood ratio, Wald, or score statistic is typically used with standard normal or chi-square cutoffs in nonlinear settings. Although such algorithms enjoy widespread use, their statistical properties are not well understood, either theoretically or empirically. Two inherent problems with these methods are that (1) as in classical hypothesis testing, the value of α is arbitrary, while (2) unlike hypothesis testing, there is no simple analog of type I error rate corresponding to application of the entire algorithm to a data set. In this article we propose a new method, backward elimination via cross-validation (BECV), for test-based variable selection in regression. It is implemented by first finding the empirical p value α*, which minimizes a cross-validation estimate of squared prediction error, and then selecting the model by running backward elimination on the entire data set using α* as the nominal p value for each test. We present results of an extensive computer simulation to evaluate BECV and compare its performance to standard backward elimination and forward stepwise selection.
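A rough sketch of the BECV procedure as described (illustrative only: the α grid, the helper names, and the use of statsmodels and scikit-learn are assumptions, not the authors' implementation):

    import numpy as np
    import statsmodels.api as sm
    from sklearn.model_selection import KFold

    def backward_eliminate(X, y, alpha):
        """Repeatedly drop the predictor with the largest p-value until every
        remaining p-value is <= alpha; returns the selected column indices."""
        keep = list(range(X.shape[1]))
        while keep:
            fit = sm.OLS(y, sm.add_constant(X[:, keep])).fit()
            pvals = fit.pvalues[1:]                 # skip the intercept
            worst = int(np.argmax(pvals))
            if pvals[worst] <= alpha:
                break
            keep.pop(worst)
        return keep

    def becv(X, y, alphas=np.linspace(0.01, 0.5, 25), n_splits=5, seed=0):
        """Pick alpha* minimizing the cross-validated squared prediction error,
        then run backward elimination once on the full data set at alpha*."""
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        cv_err = []
        for a in alphas:
            sse = 0.0
            for tr, te in kf.split(X):
                cols = backward_eliminate(X[tr], y[tr], a)
                Xtr = sm.add_constant(X[tr][:, cols]) if cols else np.ones((len(tr), 1))
                Xte = sm.add_constant(X[te][:, cols]) if cols else np.ones((len(te), 1))
                fit = sm.OLS(y[tr], Xtr).fit()
                sse += np.sum((y[te] - fit.predict(Xte)) ** 2)
            cv_err.append(sse)
        alpha_star = float(alphas[int(np.argmin(cv_err))])
        return alpha_star, backward_eliminate(X, y, alpha_star)

    # toy usage: 3 informative predictors among 10 (hypothetical data)
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=200)
    print(becv(X, y))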

18.
A gauge function f(·) is a nonnegative convex function that is positively homogeneous and satisfies f(0) = 0. Norms and pseudonorms are specific instances of gauge functions. This paper presents a gauge duality theory for a gauge program, which is the problem of minimizing the value of a gauge function f(·) over a convex set. The gauge dual program is also a gauge program, unlike the standard Lagrange dual. We present sufficient conditions on f(·) that ensure the existence of optimal solutions to the gauge program and its dual, with no duality gap. These sufficient conditions are relatively weak and are easy to verify, and are independent of any qualifications on the constraints. The theory is applied to a class of convex quadratic programs, and to the minimum l_p norm problem. The gauge dual program is shown to provide a smaller duality gap than the standard dual, in a certain sense discussed in the text.
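For context, a standard construction behind gauge duality (a general fact about gauges, not a claim about this paper's exact formulation) is the polar gauge

\[
f^{\circ}(y) \;=\; \sup\{\, x^{\top} y \;:\; f(x) \le 1 \,\},
\]

which is again a gauge and satisfies $x^{\top} y \le f(x)\, f^{\circ}(y)$ for all $x, y$; the gauge dual minimizes a polar gauge over a related constraint set, which is why it is itself a gauge program.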

19.
The effect of high school study of mathematics on numeracy performance of sports and exercise science (SES) students is not clear. To investigate this further, we tested the numeracy skills of 401 students enrolled in a Bachelor of Health Sciences degree in SES using a multiple-choice survey consisting of four background questions and 39 numeracy test questions. Background questions (5-point scale) focused on highest level of mathematics studied at high school, self-perception of mathematics proficiency, perceived importance of mathematics to SES and likelihood of seeking help with mathematics. Numeracy questions focused on rational number, ratios and rates, basic algebra and graph interpretation. Numeracy performance was based on answers to these questions (1 mark each) and represented by the total score (maximum = 39). Students from first (n = 212), second (n = 78) and third (n = 111) years of the SES degree completed the test. The distribution of numeracy test scores for the entire cohort was negatively skewed with a median (IQR) score of 27(11). We observed statistically significant associations between test scores and the highest level of mathematics studied (P < 0.05), being lowest in students who studied Year 10 Mathematics (20 (9)), intermediate in students who studied Year 12 General Mathematics (26 (8)) and highest in two groups of students who studied higher-level Year 12 Mathematics (31 (9), 31 (6)). There were statistically significant associations between test scores and level of self-perception of mathematics proficiency and also likelihood of seeking help with mathematics (P < 0.05) but not with perceived importance of mathematics to SES. These findings reveal that the level of mathematics studied in high school is a critical factor determining the level of numeracy performance in SES students.

20.
We investigate a one-parametric class of merit functions for the second-order cone complementarity problem (SOCCP) which is closely related to the popular Fischer–Burmeister (FB) merit function and the natural residual merit function. In fact, it reduces to the FB merit function if the involved parameter τ equals 2, whereas as τ tends to zero its limit becomes a multiple of the natural residual merit function. In this paper, we show that this class of merit functions enjoys several of the favorable properties that the FB merit function possesses, for example smoothness. These properties play an important role in the reformulation of the SOCCP as an unconstrained minimization or a nonsmooth system of equations. Numerical results are reported for some convex second-order cone programs (SOCPs) by solving the unconstrained minimization reformulation of the KKT optimality conditions, which indicate that the FB merit function is not the best. For the sparse linear SOCPs, the merit function corresponding to τ=2.5 or 3 works better than the FB merit function, whereas for the dense convex SOCPs, the merit function with τ=0.1, 0.5 or 1.0 seems to have better numerical performance.
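To make the two limiting cases concrete, the scalar (NCP) analogue of such a one-parametric family can be written as follows (shown for illustration; the SOCCP version replaces products and square roots by their Jordan-algebra counterparts):

\[
\phi_{\tau}(a,b) \;=\; \sqrt{(a-b)^2 + \tau\,ab} \;-\; a \;-\; b, \qquad \tau \in (0,4].
\]

At $\tau = 2$ this is exactly the Fischer–Burmeister function $\sqrt{a^2+b^2}-a-b$, and as $\tau \to 0$ it tends to $|a-b|-a-b = -2\min(a,b)$, a multiple of the natural residual $\min(a,b)$.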
