Similar articles
20 similar articles found.
1.
Previous research has resulted in a number of different algorithms for rule discovery. Two approaches discussed here, the ‘all-rules’ algorithm and multi-objective metaheuristics, both result in the production of a large number of partial classification rules, or ‘nuggets’, for describing different subsets of the records in the class of interest. This paper describes the application of a number of different clustering algorithms to these rules, in order to identify similar rules and to better understand the data.

2.
We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.

3.
For determining an optimal portfolio allocation, parameters representing the underlying market, characterized by expected asset returns and the covariance matrix, are needed. Traditionally, point estimates for these parameters are obtained from historical data samples, but as experts often have strong opinions about (some of) these values, approaches that combine sample information and experts’ views are sought. The focus of this paper is on the two most popular of these frameworks: the Black-Litterman model and the Bayes approach. We will prove that, from the point of view of traditional portfolio optimization, the Black-Litterman model is just a special case of the Bayes approach. In contrast, we will show that the extensions of both models to the robust portfolio framework yield two rather different robustified optimization problems.

4.
Fuzzy Rule-Based Systems have been successfully applied to pattern classification problems. In this type of classification system, the classical Fuzzy Reasoning Method (FRM) classifies a new example with the consequent of the rule with the greatest degree of association. By using this reasoning method, we lose the information provided by the other rules with different linguistic labels which also represent this value in the pattern attribute, although probably to a lesser degree. The aim of this paper is to present new FRMs which allow us to improve the system performance while maintaining its interpretability. The common aspect of the proposals is the participation, in the classification of the new pattern, of the rules that have been fired by that pattern. We formally describe the behaviour of a general reasoning method, analyze six proposals for this general model, and present a method to learn the parameters of these FRMs by means of Genetic Algorithms, adapting the inference mechanism to the set of rules. Finally, to show the increase in the system's generalization capability provided by the proposed FRMs, we point out some results obtained by their integration in a fuzzy rule generation process.
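The difference between the classical winning-rule FRM and a method in which every fired rule participates can be illustrated with a small sketch. The additive aggregation below is only one plausible aggregation, chosen for illustration; it is not claimed to be any of the six proposals analysed in the paper. The rule list and its association degrees are hypothetical.

```python
from collections import defaultdict

def classify_winning_rule(fired):
    """Classical FRM: return the class of the single rule with the highest association degree."""
    return max(fired, key=lambda rule: rule[1])[0]

def classify_additive(fired):
    """Illustrative alternative: sum the association degrees per class and pick the largest total."""
    totals = defaultdict(float)
    for label, degree in fired:
        totals[label] += degree
    return max(totals, key=totals.get)

# Hypothetical (class, association degree) pairs for the rules fired by one pattern.
fired_rules = [("A", 0.6), ("B", 0.5), ("B", 0.4)]

winner = classify_winning_rule(fired_rules)
additive = classify_additive(fired_rules)
print(winner, additive)
```

In this made-up case the single strongest rule votes for class A, while the combined evidence of the two weaker rules favours class B, which is the kind of information the winning-rule method discards.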

5.
Cluster analysis is a popular technique in statistics and computer science with the objective of grouping similar observations in relatively distinct groups generally known as clusters. Semi-supervised clustering assumes that some additional information about group memberships is available. Under the most frequently considered scenario, labels are known for some portion of the data and unavailable for the rest of the observations. In this paper, we discuss a general type of semi-supervised clustering defined by so-called positive and negative constraints. Under positive constraints, some data points are required to belong to the same cluster. In contrast, negative constraints specify that particular points must represent different data groups. We outline a general framework for semi-supervised clustering with constraints, naturally incorporating the additional information into the EM algorithm traditionally used in mixture modeling and model-based clustering. The developed methodology is illustrated on synthetic and classification datasets. A dendrochronology application is considered and thoroughly discussed.
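As a rough illustration of positive and negative constraints, the sketch below merges points linked by positive constraints into blocks (using a union-find structure) and then checks that no negative constraint falls inside a single block. This is only a feasibility check under assumed inputs; it does not implement the paper's EM-based framework, and the function names and example constraints are hypothetical.

```python
def find(parent, i):
    """Return the representative of the block containing point i (with path halving)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def constraint_blocks(n, positive, negative):
    """Group n points into blocks implied by positive constraints.

    Raises ValueError if a negative constraint joins two points that the
    positive constraints force into the same block.
    """
    parent = list(range(n))
    for a, b in positive:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    roots = [find(parent, i) for i in range(n)]
    for a, b in negative:
        if roots[a] == roots[b]:
            raise ValueError(f"points {a} and {b} must be both together and apart")
    return roots

# Hypothetical constraints on five points: 0, 1, 2 together; 2 and 3 apart.
roots = constraint_blocks(5, positive=[(0, 1), (1, 2)], negative=[(2, 3)])
print(roots)
```

A clustering procedure could then treat each block as a unit that must receive a single cluster label, while keeping blocks separated by a negative constraint in different clusters.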

6.
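Empirical Bayes testing of variance components in random effects models   Total citations: 4 (self-citations: 0, citations by others: 4)
Bayes decision functions are given for testing the variance components in a two-way classification random effects model. Using kernel estimation, the corresponding empirical Bayes (EB) decision functions are constructed. Under suitable conditions, the EB decision functions are proved to be asymptotically optimal, with a rate of convergence. Special cases and extensions of the model are given. Finally, an example satisfying the conditions of the theorems is presented.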

7.
This paper proposes a novel ant colony optimisation (ACO) algorithm tailored for the hierarchical multi-label classification problem of protein function prediction. This problem is a very active research field, given the large increase in the number of uncharacterised proteins available for analysis and the importance of determining their functions in order to improve the current biological knowledge. Since it is known that a protein can perform more than one function and many protein functional-definition schemes are organised in a hierarchical structure, the classification problem in this case is an instance of a hierarchical multi-label problem. In this type of problem, each example may belong to multiple class labels and class labels are organised in a hierarchical structure, either a tree or a directed acyclic graph. It is a more complex problem than conventional flat classification, given that the classification algorithm has to take into account hierarchical relationships between class labels and be able to predict multiple class labels for the same example. The proposed ACO algorithm discovers an ordered list of hierarchical multi-label classification rules. It is evaluated on sixteen challenging bioinformatics data sets involving hundreds or thousands of class labels to be predicted and compared against state-of-the-art decision tree induction algorithms for hierarchical multi-label classification.

8.
In this paper we study the asymptotic behavior of Bayes estimators for hidden Markov models as the number of observations goes to infinity. The theorem that we prove is similar to the Bernstein-von Mises theorem on the asymptotic behavior of the posterior distribution for the case of independent observations. We show that our theorem is applicable to a wide class of hidden Markov models. We also discuss the implications of the theorem’s assumptions for several models that are used in practical applications such as ion channel kinetics.

9.
Latent tree models were proposed as a class of models for unsupervised learning, and have been applied to various problems such as clustering and density estimation. In this paper, we study the usefulness of latent tree models in another paradigm, namely supervised learning. We propose a novel generative classifier called the latent tree classifier (LTC). An LTC represents each class-conditional distribution of attributes using a latent tree model, and uses Bayes' rule to make predictions. Latent tree models can capture complex relationships among attributes. Therefore, LTC is able to approximate the true distribution behind the data well and thus achieves good classification accuracy. We present an algorithm for learning LTC and empirically evaluate it on an extensive collection of UCI data. The results show that LTC compares favorably to the state of the art in terms of classification accuracy. We also demonstrate that LTC can reveal underlying concepts and discover interesting subgroups within each class.

10.
The common investment decision rules, Markowitz’s Mean-Variance (MV) rule and the non-parametric Stochastic Dominance (SD) rules, suffer from one severe drawback: there are pairs of prospects where experimentally 100% of the subjects choose one prospect, yet these rules are unable to rank the two prospects, which is a paradoxical result. Thus, the set of all preferences corresponding to these decision rules is too large, because it contains theoretical preferences that are not encountered in practice. Based on 400 subjects’ choices we define the economically relevant set of preferences and the corresponding new decision rules, which avoid the paradoxical results. The results are very robust and are almost unaffected by the magnitude of the outcomes and the structure of the prospects under consideration.
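To make the idea of an unrankable pair concrete, the sketch below applies the textbook MV dominance criterion (mean at least as high, variance no higher, with one strict inequality) to two hypothetical prospects with equally likely outcomes. The prospects are invented for illustration and are not taken from the paper's experiments; this particular pair fails only the MV rule, since the second prospect pays more in every outcome.

```python
from statistics import mean, pvariance

def mv_dominates(a, b):
    """True if prospect a dominates prospect b under the Mean-Variance rule.

    Each prospect is a list of equally likely outcomes.
    """
    mean_a, mean_b = mean(a), mean(b)
    var_a, var_b = pvariance(a), pvariance(b)
    return mean_a >= mean_b and var_a <= var_b and (mean_a > mean_b or var_a < var_b)

# Hypothetical prospects: x pays 1 for sure; y pays 2 or 4 with equal probability.
x = [1, 1]
y = [2, 4]

x_over_y = mv_dominates(x, y)
y_over_x = mv_dominates(y, x)
print(x_over_y, y_over_x)
```

Neither prospect dominates the other under MV, because y has both the higher mean and the higher variance, even though one would expect essentially every subject to choose y.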

11.
Classification is concerned with the development of rules for the allocation of observations to groups, and is a fundamental problem in machine learning. Much of the previous work on classification models investigates two-group discrimination. Multi-category classification is less often considered due to the tendency of generalizations of two-group models to produce misclassification rates that are higher than desirable. Indeed, producing “good” two-group classification rules is a challenging task for some applications, and producing good multi-category rules is generally more difficult. Additionally, even when the “optimal” classification rule is known, inter-group misclassification rates may be higher than tolerable for a given classification model. We investigate properties of a mixed-integer programming based multi-category classification model that allows for the pre-specification of limits on inter-group misclassification rates. The mechanism by which the limits are satisfied is the use of a reserved judgment region, an artificial category for observations whose attributes do not sufficiently indicate membership in any particular group. The method is shown to be a consistent estimator of a classification rule with misclassification limits, and performance on simulated and real-world data is demonstrated.

12.
We study a vendor selection problem in which the buyer allocates an order quantity for an item among a set of suppliers such that the aggregate quality, service, and lead time requirements are achieved at minimum cost. Some or all of these characteristics can be stochastic and hence, we treat the aggregate quality and service as uncertain. We develop a class of special chance-constrained programming models and design a genetic algorithm for the vendor selection problem. The solution procedure is tested on randomly generated problems and our computational experience is reported. The results demonstrate that the suggested approach could provide managers with a promising way of studying the stochastic vendor selection problem. The authors would like to thank the referees for providing constructive comments that led to an improved version of the paper. Also, this research was partially supported by grants from National Natural Science Foundation (60776825), China; 863 Programs (2007AA11Z208), China; Doctorate Foundation (20040004012), China; Villanova University Research Sabbatical Fall 2006; and the National Science Foundation (0332490), USA.

13.
In this paper the Bayes estimates are constructed for the distribution density of sufficient statistics in the case of a normal law. The asymptotic properties of these estimates are discussed. These estimates may be used for constructing classification rules which allow one to solve new classification problems. Translated from Statisticheskie Metody Otsenivaniya i Proverki Gipotez, pp. 5–13, Perm, 1991.

14.
We revisit the interactive model-based approach to global optimization proposed in Wang and Garcia (J Glob Optim 61(3):479–495, 2015) in which parallel threads independently execute a model-based search method and periodically interact through a simple acceptance-rejection rule aimed at preventing duplication of search efforts. In that paper it was assumed that each thread successfully identifies a locally optimal solution every time the acceptance-rejection rule is implemented. Under this stylized model of computational time, the rate of convergence to a globally optimal solution was shown to increase exponentially in the number of threads. In practice, however, the computational time required to identify a locally optimal solution varies greatly. Therefore, when the acceptance-rejection rule is implemented, several threads may fail to identify a locally optimal solution. This situation calls for reallocation of computational resources in order to speed up the identification of local optima when one or more threads repeatedly fail to do so. In this paper we consider an implementation of the interactive model-based approach that accounts for real time, that is, it takes into account the possibility that several threads may fail to identify a locally optimal solution whenever the acceptance-rejection rule is implemented. We propose a modified acceptance-rejection rule that alternates between enforcing diverse search (in order to prevent duplication) and reallocation of computational effort (in order to speed up the identification of local optima). We show that the rate of convergence in real time increases with the number of threads. This result formalizes the idea that in parallel computing, exploitation and exploration can be complementary provided relatively simple rules for interaction are implemented. We report the results from extensive numerical experiments which illustrate the theoretical analysis of performance.

15.
Summary: A heuristic method of reducing a class of admissible or Bayes decision rules is given. A new risk function is defined which is called the locally averaged risk. Bayes and admissible rules with respect to the new risk function are called G-Bayes and G-admissible, respectively. It is shown under general assumptions that the class of G-Bayes decision rules is a subset of the class of Bayes decision rules and the class of G-admissible decision rules is a subset of the class of admissible decision rules. Some examples are considered, showing that the usual estimates of the parameter of a distribution with squared error as loss function, which are known to be admissible, are also G-admissible. This work was supported in part by NASA Grant-NGR 15-003-064 and NSF Grant-GP 7496 at Indiana University.

16.
An optimal equivariant Bayes estimate of the density of a matrix normal distribution is obtained. This estimate is applied to the construction of the optimal Bayes group classification rule. Translated from Statisticheskie Metody Otsenivaniya i Proverki Gipotez, pp. 29–39, Perm, 1990.

17.
Bayes estimators are proposed for the likelihood functions of random matrices having Wishart's distribution. These estimators are used to construct an asymptotically optimal classification rule. The classification problem in the case of the chi-squared distribution is also considered. Translated from Statisticheskie Metody Otsenivaniya i Proverki Gipotez, pp. 11–18, Perm, 1990.

18.
We study the class of the Riesz subsets of abelian discrete groups, that is, the sets for which the F. and M. Riesz theorem extends. We show that the “classical” tools of the theory (Riesz projections, localization in the Bohr sense, products) lead to Riesz sets which satisfy nice additional properties, e.g., the Mooney-Havin result extends to this class. We give an alternative proof of a result of A. B. Alexandrov, and we improve a construction of H. P. Rosenthal. A connection is made between this class and the M-structure theory. We show a result of convergence at the boundary for holomorphic functions on the polydisc. The Bourgain-Davis result on convergence of analytic martingales is improved.

19.
In this paper, a statistical decision rule based on the Bayes estimators for the group classification of dependent Gaussian observations is constructed. Translated from Statisticheskie Metody Otsenivaniya i Proverki Gipotez, pp. 33–41, Perm, 1993.

20.
The L(2, 1)-labeling problem for a graph G is a variation of the standard graph coloring problem. Here, we seek to assign a label (color) to each node of G such that nodes a distance of two apart are assigned distinct labels and adjacent nodes receive labels which are at least two apart. In a previous paper, presented at the 23rd IASTED International Multi-Conference: Parallel and Distributed Computing and Networks, Innsbruck, Austria, we presented, to the best of our knowledge, the first self-stabilizing algorithm which {Δ + 2}-L(2, 1)-labels rooted trees. That algorithm was shown to require an exponential number of moves to stabilize on a global solution (which is not uncommon in self-stabilizing systems). In this paper, we present two self-stabilizing algorithms which {Δ + 2}-L(2, 1)-label a given rooted tree T in only O(nh) moves (where h is the height and n is the number of nodes in the tree T) under a central scheduler. We also show how the algorithms may be adapted to unrooted trees and dynamic topology changes, and consider the correctness of the protocols under the distributed scheduler model.
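The labeling conditions themselves can be sketched with a short centralized example. The code below is a plain top-down greedy labeler plus a validity checker for a rooted tree; it is not self-stabilizing and is not the algorithm from the paper. The tree, the `children` mapping, and the function names are hypothetical. A simple count suggests the greedy choice stays within labels 0..Δ + 2 on a tree: a node is blocked by at most three labels near its parent's label, one grandparent label, and at most Δ − 2 sibling labels.

```python
from collections import deque

def greedy_l21_labeling(children, root):
    """Label a rooted tree top-down so that adjacent nodes differ by at least 2
    and nodes at distance two (grandparent, siblings) receive different labels."""
    parent = {root: None}
    label = {root: 0}
    queue = deque([root])
    while queue:
        p = queue.popleft()
        used = set()  # labels already taken at distance two through p
        if parent[p] is not None:
            used.add(label[parent[p]])
        for child in children.get(p, []):
            parent[child] = p
            k = 0
            while abs(k - label[p]) < 2 or k in used:
                k += 1
            label[child] = k
            used.add(k)
            queue.append(child)
    return label

def is_valid_l21(children, label):
    """Check both L(2, 1) conditions on a rooted tree given as a children mapping."""
    for p, kids in children.items():
        if len({label[c] for c in kids}) != len(kids):
            return False  # two siblings (distance two) share a label
        for c in kids:
            if abs(label[p] - label[c]) < 2:
                return False  # adjacent labels too close
            if any(label[g] == label[p] for g in children.get(c, [])):
                return False  # grandchild (distance two) shares a label
    return True

# Hypothetical rooted tree: node 0 is the root.
children = {0: [1, 2, 3], 1: [4, 5], 3: [6]}
labels = greedy_l21_labeling(children, 0)
max_degree = max(len(children.get(v, [])) + (v != 0) for v in labels)
print(labels, max_degree)
```

The checker is independent of how the labels were produced, so it could equally be used to test the output of any other labeling procedure on a rooted tree.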
