Similar Documents
 20 similar documents found (search time: 350 ms)
1.
There are various methods in knowledge space theory for building knowledge structures or surmise relations from data. Few of them have been thoroughly analyzed, making it difficult to decide which of these methods provides good results and when to apply each of them. In this paper, we investigate the method known as inductive item tree analysis and discuss the advantages and disadvantages of this algorithm. In particular, we introduce some corrections and improvements to it, resulting in two newly proposed algorithms. These algorithms and the original inductive item tree analysis procedure are compared in a simulation study and with empirical data.
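As a concrete illustration of the kind of object these algorithms produce, the sketch below derives a candidate precedence relation from binary response data and checks that it is a quasi-order (reflexive and transitive). This follows the general spirit of item tree analysis, not the paper's specific corrected algorithms; all function names and the tolerance threshold are our own.

```python
def counterexamples(data, i, j):
    """Respondents who solved item j but failed item i, i.e. evidence
    against the precedence 'mastery of j implies mastery of i'."""
    return sum(1 for row in data if row[j] == 1 and row[i] == 0)

def derive_relation(data, threshold):
    """Include the pair (i, j) whenever the counterexample count is
    within the tolerance threshold."""
    m = len(data[0])
    return {(i, j) for i in range(m) for j in range(m)
            if counterexamples(data, i, j) <= threshold}

def is_quasi_order(rel, m):
    reflexive = all((i, i) in rel for i in range(m))
    transitive = all((i, k) in rel
                     for (i, j) in rel for (j2, k) in rel if j == j2)
    return reflexive and transitive

# Toy data: 4 respondents, 3 items; item 0 appears to precede items 1 and 2.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
rel = derive_relation(data, threshold=0)
print(sorted(rel), is_quasi_order(rel, 3))
```

With a zero threshold the derived relation here is the transitive, reflexive chain 0 → 1 → 2; larger thresholds tolerate response noise at the cost of admitting spurious pairs.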

2.
Inductive item tree analysis (IITA) comprises three data analysis algorithms for deriving quasi-orders to represent reflexive and transitive precedence relations among binary variables. In previous studies, when comparing the IITA algorithms in simulations, the representativeness of the sampled quasi-orders was not considered or implemented only unsatisfactorily. In the present study, we show that this issue yields non-representative samples of quasi-orders, and thus biased or incorrect conclusions about the performance of the IITA algorithms used to reconstruct underlying relational dependencies. We report the results of a new, truly representative simulation study, which corrects for these problems and that allows the algorithms to be compared in a reliable manner.

3.
In practical data mining tasks, high-dimensional data must be analyzed. In most cases it is very informative to map and visualize the hidden structure of a complex data set in a low-dimensional space. In this paper a new class of mapping algorithms is defined. These algorithms combine topology representing networks and different nonlinear mapping algorithms. While the former methods aim to quantify the data and disclose the real structure of the objects, the nonlinear mapping algorithms are able to visualize the quantized data in the low-dimensional vector space. In this paper, techniques based on these methods are gathered and the results of a detailed analysis performed on them are shown. The primary aim of this analysis is to examine the preservation of distances and neighborhood relations of the objects. Preservation of neighborhood relations was analyzed in both local and global environments. To evaluate the main properties of the examined methods we show the outcome of the analysis based on both synthetic and real benchmark examples.
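One common way to quantify the preservation of distances that such an analysis examines is Sammon stress: the normalized squared error between pairwise distances before and after mapping. A minimal sketch, with illustrative function names (this is only one of several possible preservation criteria):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sammon_stress(X_high, X_low):
    """0 means pairwise distances are perfectly preserved; larger is worse."""
    n = len(X_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    d_high = {p: euclidean(X_high[p[0]], X_high[p[1]]) for p in pairs}
    d_low = {p: euclidean(X_low[p[0]], X_low[p[1]]) for p in pairs}
    total = sum(d_high.values())
    return sum((d_high[p] - d_low[p]) ** 2 / d_high[p] for p in pairs) / total

# A 2D mapping of 3D points that happens to preserve all distances exactly.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
Y = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
print(sammon_stress(X, Y))  # 0.0
```

Neighborhood preservation is typically measured separately, e.g. by comparing k-nearest-neighbor sets in the two spaces.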

4.
A scenario tree is an efficient way to represent a stochastic data process in decision problems under uncertainty. This paper addresses how to efficiently generate appropriate scenario trees. A knowledge‐based scenario tree generation method is proposed; the new method is further improved by accounting for subjective judgements or expectations about the random future. Compared with existing approaches, complicated mathematical models and time‐consuming estimation, simulation and optimization problem solution are avoided in our knowledge‐based algorithms, and large‐scale scenario trees can be quickly generated. To show the advantages of the new algorithms, a multiperiod portfolio selection problem is considered, and a dynamic risk measure is adopted to control the intermediate risk, which is superior to the single‐period risk measure used in the existing literature. A series of numerical experiments are carried out using real trading data from the Shanghai stock market. The results show that the scenarios generated by our algorithms can properly represent the underlying distribution; our algorithms are highly efficient: for example, a scenario tree with up to 10,000 scenarios can be generated in less than half a minute. The applications in the multiperiod portfolio management problem demonstrate that our scenario tree generation methods are stable, and the optimal trading strategies obtained with the generated scenario tree are reasonable, efficient and robust. Copyright © 2013 John Wiley & Sons, Ltd.
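For orientation, the sketch below builds a small multiperiod scenario tree by plain sampling: each node branches into equally probable children, and each leaf is one scenario. This only illustrates the data structure; the paper's knowledge-based method specifically avoids simulation-based construction and instead incorporates subjective judgements.

```python
import random

def build_tree(depth, branching, sample_return, prob=1.0, value=1.0):
    """Nested-dict tree: each node carries a value, its path probability,
    and its children one period ahead."""
    node = {"value": value, "prob": prob, "children": []}
    if depth > 0:
        p = prob / branching  # equally likely branches at every node
        for _ in range(branching):
            child = build_tree(depth - 1, branching, sample_return,
                               p, value * (1.0 + sample_return()))
            node["children"].append(child)
    return node

def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for c in node["children"] for leaf in leaves(c)]

random.seed(0)
tree = build_tree(depth=3, branching=4,
                  sample_return=lambda: random.gauss(0.05, 0.1))
scen = leaves(tree)
print(len(scen), sum(l["prob"] for l in scen))  # 64 scenarios, probs sum to 1.0
```

A depth-3, branching-4 tree already has 64 scenarios; realistic horizons and branching factors make the exponential growth, and hence fast generation, the central practical concern.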

5.
Regression trees are a popular alternative to classical regression methods. A number of approaches exist for constructing regression trees. Most of these techniques, including CART, are sequential in nature and locally optimal at each node split, so the final tree solution found may not be the best tree overall. In addition, small changes in the training data often lead to large changes in the final result due to the relative instability of these greedy tree-growing algorithms. Ensemble techniques, such as random forests, attempt to take advantage of this instability by growing a forest of trees from the data and averaging their predictions. The predictive performance is improved, but the simplicity of a single-tree solution is lost.

In earlier work, we introduced the Tree Analysis with Randomly Generated and Evolved Trees (TARGET) method for constructing classification trees via genetic algorithms. In this article, we extend the TARGET approach to regression trees. Simulated data and real world data are used to illustrate the TARGET process and compare its performance to CART, Bayesian CART, and random forests. The empirical results indicate that TARGET regression trees have better predictive performance than recursive partitioning methods, such as CART, and single-tree stochastic search methods, such as Bayesian CART. The predictive performance of TARGET is slightly worse than that of ensemble methods, such as random forests, but the TARGET solutions are far more interpretable.
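The instability the first paragraph attributes to greedy tree growing comes from the locally optimal split step, which can be sketched as follows: at each node, choose the single threshold that minimizes the post-split sum of squared errors. This is a toy, one-predictor version; CART additionally handles multiple predictors, categorical splits, and pruning.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, total_sse) minimizing the combined SSE of the
    two halves; threshold is None if no split improves on the parent."""
    best = (None, sse(ys))  # 'no split' baseline
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
print(best_split(xs, ys))  # splits at x <= 3, separating the two level groups
```

Because each split is chosen without regard to later splits, a small perturbation of the training data can flip an early split and change the entire subtree below it, which is exactly the instability that ensembles and global searches like TARGET exploit or avoid.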

6.
In many domains, data now arrive faster than we are able to mine them. To avoid wasting these data, we must switch from the traditional “one-shot” data mining approach to systems that are able to mine continuous, high-volume, open-ended data streams as they arrive. In this article we identify some desiderata for such systems, and outline our framework for realizing them. A key property of our approach is that it minimizes the time required to build a model on a stream while guaranteeing (as long as the data are iid) that the model learned is effectively indistinguishable from the one that would be obtained using infinite data. Using this framework, we have successfully adapted several learning algorithms to massive data streams, including decision tree induction, Bayesian network learning, k-means clustering, and the EM algorithm for mixtures of Gaussians. These algorithms are able to process on the order of billions of examples per day using off-the-shelf hardware. Building on this, we are currently developing software primitives for scaling arbitrary learning algorithms to massive data streams with minimal effort.
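One standard way to obtain the kind of guarantee described above, used by Hoeffding-tree-style stream learners, is the Hoeffding bound: after n iid observations of a quantity with range R, the observed mean is within ε of the true mean with probability at least 1 − δ. A short sketch (the article's framework may use a different or more general bound):

```python
import math

def hoeffding_epsilon(R, delta, n):
    """Half-width of the (1 - delta)-confidence interval after n examples."""
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def examples_needed(R, delta, epsilon):
    """Smallest n for which the bound shrinks below epsilon; a stream
    learner can stop reading once this many examples have been seen."""
    return math.ceil(R * R * math.log(1.0 / delta) / (2.0 * epsilon ** 2))

print(hoeffding_epsilon(R=1.0, delta=1e-7, n=1000))
print(examples_needed(R=1.0, delta=1e-7, epsilon=0.01))
```

The bound is distribution-free, which is what lets a streaming decision-tree learner commit to a split after a finite prefix of the stream while remaining, with high probability, indistinguishable from the infinite-data tree.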

7.
《Journal of Complexity》1999,15(1):72-127
We say that a data structure is built on-line if, at any instant, we have the data structure corresponding to the input we have seen up to that instant. For instance, consider the suffix tree of a string x[1, n]. An algorithm building it on-line is such that, when we have read the first i symbols of x[1, n], we have the suffix tree for x[1, i]. We present a new technique, which we refer to as implicit updates, based on which we obtain: (a) an algorithm for the on-line construction of the Lsuffix tree of an n×n matrix A; this data structure is the two-dimensional analog of the suffix tree of a string; (b) simple algorithms implementing primitive operations for LZ1-type on-line lossless image compression methods. Those methods, recently introduced by Storer, are generalizations of LZ1-type compression methods for strings. For the problem in (a), we get nearly an order of magnitude improvement over algorithms that can be derived from known techniques. For the problem in (b), we do not get an asymptotic speed-up with respect to what can be done with known techniques; rather, we show that our algorithms are a natural support for the primitive operations. This may lead to faster implementations of those primitive operations. To the best of our knowledge, our technique is the first one that effectively addresses problems related to the on-line construction of two-dimensional suffix trees.
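To make the on-line property concrete in the one-dimensional case, the deliberately naive sketch below maintains, after each symbol is read, a trie of all suffixes of the prefix seen so far. Real on-line suffix-tree constructions achieve the same invariant far more efficiently than this quadratic rebuild; the article's contribution concerns the much harder two-dimensional case.

```python
def add_suffixes(trie, prefix):
    """Insert every suffix of `prefix` into a nested-dict trie."""
    for start in range(len(prefix)):
        node = trie
        for ch in prefix[start:]:
            node = node.setdefault(ch, {})

def online_suffix_tries(x):
    """After each symbol, yield a trie holding all suffixes of x[:i+1],
    which is exactly the on-line invariant described above."""
    for i in range(len(x)):
        trie = {}
        add_suffixes(trie, x[: i + 1])
        yield trie

def contains(trie, s):
    """True iff s is a substring of the indexed prefix."""
    node = trie
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return True

tries = list(online_suffix_tries("abab"))
print(contains(tries[2], "ba"), contains(tries[1], "ba"))  # True False
```

After reading "aba" the substring "ba" is indexed; after only "ab" it is not, showing that each intermediate structure answers queries for exactly the prefix read so far.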

8.
Gradient-based simulation optimization under probability constraints
We study optimization problems subject to possible fatal failures. The probability of failure should not exceed a given confidence level. The distribution of the failure event is assumed unknown, but it can be generated via simulation or observation of historical data. Gradient-based simulation-optimization methods pose the difficulty of estimating the gradient of the probability constraint with no knowledge of the distribution. In this work we provide two biased single-path estimators, a convolution method and a finite-difference method, and we give a full analysis of convergence of the Arrow-Hurwicz algorithm, which we use as our optimization solver. Convergence results are used to tune the parameters of the numerical algorithms in order to achieve the best convergence rates, and numerical results are included via an example application in finance.
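The finite-difference idea for the gradient of a probability constraint can be sketched as follows: estimate the failure probability by simulation at two perturbed parameter values, reusing the same draws at both (a common-random-numbers variant; the estimator is biased for any finite step size). Function names and the toy failure model are our own.

```python
import random

def failure_prob(theta, g, draws):
    """Monte Carlo estimate of P[g(theta, xi) >= 0] over simulated draws."""
    return sum(1 for xi in draws if g(theta, xi) >= 0) / len(draws)

def fd_gradient(theta, g, draws, h=0.05):
    """Single-path central finite difference of the probability constraint.
    Reusing the SAME draws at theta +/- h reduces variance; the finite
    step h introduces bias."""
    return (failure_prob(theta + h, g, draws)
            - failure_prob(theta - h, g, draws)) / (2 * h)

random.seed(1)
draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]
g = lambda theta, xi: xi - theta   # failure when the random shock exceeds theta
print(failure_prob(2.0, g, draws))  # roughly P[N(0,1) >= 2], about 0.023
print(fd_gradient(2.0, g, draws))   # roughly minus the density at 2, about -0.054
```

Raising theta lowers the failure probability, so the estimated gradient is negative; the step size h trades bias (large h) against variance (small h), which is the tuning question the convergence analysis addresses.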

9.
In the generalized tree alignment problem, we are given a set S of k biologically related sequences and seek a minimum cost evolutionary tree for S. In many instances of this problem, a partial phylogenetic tree for S is known. In such instances, we would like to use this knowledge to restrict the tree topologies considered and construct a biologically relevant minimum cost evolutionary tree. We therefore propose the following natural generalization of the generalized tree alignment problem, a problem known to be MAX-SNP hard, stated as follows:
Constrained Generalized Tree Alignment Problem [S. Divakaran, Algorithms and heuristics for constrained generalized alignment problem, DIMACS Technical Report 2007-21, 2007]: Given a set S of k related sequences and a phylogenetic forest comprising node-disjoint phylogenetic trees that specify the topological constraints an evolutionary tree of S must satisfy, construct a minimum cost evolutionary tree for S.
In this paper, we present constant-factor approximation algorithms for the constrained generalized tree alignment problem. For the generalized tree alignment problem, a special case of this problem, our algorithms provide a guaranteed error bound of 2−2/k.

10.
Many numerical optimization methods use scenario trees as a discrete approximation for the true (multi-dimensional) probability distributions of the problem’s random variables. Realistic specifications in financial optimization models can lead to tree sizes that quickly become computationally intractable. In this paper we focus on the two main approaches proposed in the literature to deal with this problem: scenario reduction and state aggregation. We first state necessary conditions for the node structure of a tree to rule out arbitrage. However, currently available scenario reduction algorithms do not take these conditions explicitly into account. State aggregation excludes arbitrage opportunities by relying on the risk-neutral measure. This is, however, only appropriate for pricing purposes but not for optimization. Both limitations are illustrated by numerical examples. We conclude that neither of these methods is suitable to solve financial optimization models in asset–liability or portfolio management.

11.
Based on the cut and cycle properties of minimum spanning trees, this paper first derives the basic formulas for sensitivity analysis. It then presents a sensitivity analysis algorithm for tree edges based on the classical Split-findmin data structure from modern graph algorithms. Finally, the sensitivity analysis of non-tree edges is reduced to the Set-maxima problem, for which mature algorithms already exist.

12.
The attribute-oriented induction (AOI for short) method is one of the most important data mining methods. The input of the AOI method contains a relational table and a concept tree (concept hierarchy) for each attribute, and the output is a small relation summarizing the general characteristics of the task-relevant data. Although AOI is very useful for inducing general characteristics, it has the limitation that it can only be applied to relational data, where there is no order among the data items. If the data are ordered, the existing AOI methods are unable to find the generalized knowledge. In view of this weakness, this paper proposes a dynamic programming algorithm, based on AOI techniques, to find generalized knowledge from an ordered list of data. By using the algorithm, we can discover a sequence of K generalized tuples describing the general characteristics of different segments of data along the list, where K is a parameter specified by users.
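The flavor of such a dynamic program can be sketched on plain numeric data: partition an ordered list into K contiguous segments so that each segment is as homogeneous as possible, then summarize each segment (here by its boundaries). This illustrates the problem structure only, not the paper's exact recurrence over concept hierarchies.

```python
def segment(values, K):
    """Optimal split of `values` into K contiguous segments minimizing
    total within-segment squared error; returns (start, end) pairs."""
    n = len(values)

    def cost(i, j):  # SSE of values[i:j] around its mean
        seg = values[i:j]
        mean = sum(seg) / len(seg)
        return sum((v - mean) ** 2 for v in seg)

    # dp[k][j]: best cost of splitting the first j values into k segments.
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(K + 1)]
    back = [[0] * (n + 1) for _ in range(K + 1)]
    dp[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = dp[k - 1][i] + cost(i, j)
                if c < dp[k][j]:
                    dp[k][j], back[k][j] = c, i
    # Recover segment boundaries by walking the backpointers.
    bounds, j = [], n
    for k in range(K, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    return list(reversed(bounds))

data = [1, 1, 2, 9, 9, 8, 20, 21]
print(segment(data, 3))  # [(0, 3), (3, 6), (6, 8)]
```

Each recovered segment would then be generalized into one tuple, giving the K generalized tuples along the ordered list that the paper describes.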

13.
Cache placement in sensor networks under an update cost constraint
In this paper, we address an optimization problem that arises in the context of cache placement in sensor networks. In particular, we consider the cache placement problem where the goal is to determine a set of nodes in the network to cache/store the given data item, such that the overall communication cost incurred in accessing the item is minimized, under the constraint that the total communication cost in updating the selected caches is less than a given constant. In our network model, there is a single server (containing the original copy of the data item) and multiple client nodes (that wish to access the data item). For various settings of the problem, we design optimal, near-optimal, heuristic-based, and distributed algorithms, and evaluate their performance through simulations on randomly generated sensor networks.

14.
Replacing suffix trees with enhanced suffix arrays
The suffix tree is one of the most important data structures in string processing and comparative genomics. However, the space consumption of the suffix tree is a bottleneck in large scale applications such as genome analysis. In this article, we will overcome this obstacle. We will show how every algorithm that uses a suffix tree as data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem in the same time complexity. The generic name enhanced suffix array stands for data structures consisting of the suffix array and additional tables. Our new algorithms are not only more space efficient than previous ones, but they are also faster and easier to implement.
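For reference, two of the core tables are easy to state: the suffix array lists suffix start positions in lexicographic order, and the LCP table gives the longest common prefix of lexicographically adjacent suffixes. The construction below is naive and quadratic; practical enhanced-suffix-array implementations use linear-time algorithms.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_table(s, sa):
    """lcp[k] = length of the longest common prefix of the suffixes
    starting at sa[k-1] and sa[k]; lcp[0] is 0 by convention."""
    def lcp(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n
    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]

s = "mississippi"
sa = suffix_array(s)
print(sa)                # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
print(lcp_table(s, sa))  # [0, 1, 1, 4, 0, 0, 1, 0, 2, 1, 3]
```

Together these tables let an algorithm simulate a top-down or bottom-up traversal of the (virtual) suffix tree, which is the key to the systematic replacement the article describes.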

15.
The theory of Gaussian graphical models is a powerful tool for independence analysis between continuous variables. In this framework, various methods have been conceived to infer independence relations from data samples. However, most of them result in stepwise, deterministic descent algorithms that are inadequate for solving this issue. More recent developments have focused on stochastic procedures, yet they all base their research on strong a priori knowledge and are unable to perform model selection among the set of all possible models. Moreover, convergence of the corresponding algorithms is slow, precluding applications on a large scale. In this paper, we propose a novel Bayesian strategy to deal with structure learning. Relating graphs to their supports, we convert the problem of model selection into that of parameter estimation. Use of non-informative priors and asymptotic results yield a posterior probability for independence graph supports in closed form. Gibbs sampling is then applied to approximate the full joint posterior density. We finally give three examples of structure learning: one from synthetic data and the other two from real data.

16.
Methods for analyzing or learning from “fuzzy data” have attracted increasing attention in recent years. In many cases, however, existing methods (for precise, non-fuzzy data) are extended to the fuzzy case in an ad-hoc manner, and without carefully considering the interpretation of a fuzzy set when being used for modeling data. Distinguishing between an ontic and an epistemic interpretation of fuzzy set-valued data, and focusing on the latter, we argue that a “fuzzification” of learning algorithms based on an application of the generic extension principle is not appropriate. In fact, the extension principle fails to properly exploit the inductive bias underlying statistical and machine learning methods, although this bias, at least in principle, offers a means for “disambiguating” the fuzzy data. Alternatively, we therefore propose a method which is based on the generalization of loss functions in empirical risk minimization, and which performs model identification and data disambiguation simultaneously. Elaborating on the fuzzification of specific types of losses, we establish connections to well-known loss functions in regression and classification. We compare our approach with related methods and illustrate its use in logistic regression for binary classification.

17.
We present a toolbox for extracting asymptotic information on the coefficients of combinatorial generating functions. This toolbox notably includes a treatment of the effect of Hadamard products on singularities in the context of the complex Tauberian technique known as singularity analysis. As a consequence, it becomes possible to unify the analysis of a number of divide-and-conquer algorithms, or equivalently random tree models, including several classical methods for sorting, searching, and dynamically managing equivalence relations.
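The basic transfer principle of singularity analysis, which such a toolbox extends to Hadamard products, can be stated as follows (a standard textbook formulation; the article's version is more general):

```latex
% Coefficient asymptotics for the standard scale of singular functions,
% for \alpha \notin \{0, -1, -2, \dots\}:
\[
  [z^n]\,(1-z)^{-\alpha}
  \;=\; \binom{n+\alpha-1}{n}
  \;\sim\; \frac{n^{\alpha-1}}{\Gamma(\alpha)}.
\]
% Transfer: if f(z) is analytic in a Delta-domain at 1 and
% f(z) \sim (1-z)^{-\alpha} as z \to 1 in that domain, then
\[
  [z^n]\,f(z) \;\sim\; \frac{n^{\alpha-1}}{\Gamma(\alpha)}.
\]
```

The point of the transfer step is that local behavior of the generating function at its dominant singularity dictates the asymptotics of its coefficients, which is what makes a unified analysis of divide-and-conquer recurrences possible.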

18.
Grouping objects into different categories is a basic means of cognition. In the fields of machine learning and statistics, this subject is addressed by cluster analysis. Yet how to assess the reliability and quality of clusterings remains a matter of debate. In particular, it is hard to determine the optimal number of clusters inherent in the underlying data. Running different cluster algorithms and cluster validation methods usually yields different optimal clusterings. In fact, several clusterings with different numbers of clusters are plausible in many situations, as different methods are specialized on diverse structural properties. To account for the possibility of multiple plausible clusterings, we employ a multi-objective approach for collecting cluster alternatives (MOCCA) from a combination of cluster algorithms and validation measures. In an application to artificial data as well as microarray data sets, we demonstrate that exploring a Pareto set of optimal partitions rather than a single solution can identify alternative solutions that are overlooked by conventional clustering strategies. Competitive solutions are hereby ranked following an impartial criterion, while the ultimate judgement is left to the investigator.
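The Pareto step can be sketched directly: given candidate partitions scored by several validation measures (larger is better here), keep exactly those not dominated by any other candidate. Candidate names and scores below are invented for illustration.

```python
def pareto_front(candidates):
    """candidates: list of (label, scores); return the non-dominated ones."""
    def dominated(a, b):  # True iff b is at least as good everywhere, better somewhere
        return (all(y >= x for x, y in zip(a, b))
                and any(y > x for x, y in zip(a, b)))
    return [(name, sc) for name, sc in candidates
            if not any(dominated(sc, other)
                       for _, other in candidates if other is not sc)]

# Hypothetical clusterings scored by two validation measures.
candidates = [
    ("kmeans,k=2", (0.70, 0.40)),
    ("kmeans,k=3", (0.60, 0.65)),
    ("single-link,k=3", (0.55, 0.60)),  # dominated by kmeans,k=3
    ("dbscan", (0.30, 0.80)),
]
print([name for name, _ in pareto_front(candidates)])
# ['kmeans,k=2', 'kmeans,k=3', 'dbscan']
```

The surviving set typically contains clusterings with different numbers of clusters, which is precisely the set of alternatives handed to the investigator for the final judgement.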

19.
In this article we give an analogue, in the profinite case, of the following theorem: an abstract group is free if and only if it acts freely on a tree. We first define a combinatorial object, the protree, a particular inductive system extracted from a projective system of graphs. We then define a notion of profinite action. These objects allow us to state the following analogue: a profinite group contains a dense abstract free subgroup if and only if it acts profreely on a protree.

20.
We model the evolution of biological and linguistic sequences by comparing their statistical properties. This comparison is performed by means of efficiently computable kernel functions, which take two sequences as input and return a measure of statistical similarity between them. We show how the use of such kernels allows us to reconstruct the phylogenetic trees of primates based on the mitochondrial DNA (mtDNA) of existing animals, and the phylogenetic tree of Indo-European and other languages based on sample documents from existing languages. Kernel methods provide a convenient framework for many pattern analysis tasks, and recent advances have been focused on efficient methods for sequence comparison and analysis. While a large toolbox of algorithms has been developed to analyze data by using kernels, in this paper we demonstrate their use in combination with standard phylogenetic reconstruction algorithms and visualization methods.
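A simple member of this family of kernels is the k-spectrum kernel, the inner product of k-mer count vectors; its cosine-normalized value can feed standard distance-based phylogenetic reconstruction. A sketch (the paper's kernels may differ; the sequences below are toy data):

```python
from collections import Counter

def spectrum(seq, k):
    """Multiset of the k-mers occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of the two k-mer count vectors."""
    ca, cb = spectrum(a, k), spectrum(b, k)
    return sum(ca[kmer] * cb[kmer] for kmer in ca)

def normalized(a, b, k=3):
    """Cosine-normalized similarity in [0, 1]; 1 - normalized(a, b) can
    serve as a distance for reconstruction methods such as neighbor joining."""
    return spectrum_kernel(a, b, k) / (
        spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k)) ** 0.5

x, y, z = "ACGTACGTAC", "ACGTACGAAC", "TTTTGGGGCC"
print(normalized(x, y) > normalized(x, z))  # True: x is closer to y than to z
```

Because the kernel only needs k-mer counts, it is computable in time linear in the sequence lengths, which is what makes whole-genome or whole-document comparisons feasible.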


Copyright©北京勤云科技发展有限公司  京ICP备09084417号