Similar Documents
20 similar documents found (search time: 31 ms)
1.
Rather than inducing classification rules with sophisticated algorithms, we introduce a fully interactive approach to building classifiers from large multivariate datasets, based on the table lens, a multidimensional visualization technique, together with appropriate interaction capabilities. Constructing a classifier becomes an interactive process with a feedback loop, into which domain knowledge and human perception can be profitably incorporated. In our approach, continuous and categorical attributes are processed uniformly, and continuous attributes are partitioned on the fly. Our performance evaluation with datasets from the UCI repository demonstrates that this interactive approach makes it easy to build understandable classifiers with high prediction accuracy, requiring no a priori knowledge about the datasets.

2.
In the knowledge discovery process, classification algorithms are often used to build models from training data that can then predict the classes of untested data instances. Several factors influence classification results, such as the node-splitting measures used in building decision trees. With large data sets, feature selection is often applied as a pre-classification step to eliminate irrelevant or redundant attributes, increasing computational efficiency and possibly classification accuracy. One factor common to both feature selection and classification with decision trees is attribute discretization, the process of dividing attribute values into a smaller number of discrete values. In this paper, we present and explore a new hybrid approach, ChiBlur, which combines concepts from the blurring and χ²-based approaches to feature selection with concepts from multi-objective optimization. We compare this new algorithm with algorithms based on the blurring and χ²-based approaches.
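As a point of reference for the χ²-based ingredient (the ChiBlur procedure itself is not specified in the abstract), a minimal sketch of scoring a discretized attribute against the class via the χ² statistic; names are illustrative:

```python
from collections import Counter

def chi2_score(attr_values, labels):
    """Chi-square statistic of a (discretized) attribute against the class.
    Higher scores indicate stronger attribute/class dependence, the usual
    basis of chi2-based feature selection and discretization."""
    n = len(labels)
    cells = Counter(zip(attr_values, labels))  # observed joint counts
    row = Counter(attr_values)                 # attribute-value marginals
    col = Counter(labels)                      # class marginals
    score = 0.0
    for v in row:
        for c in col:
            expected = row[v] * col[c] / n
            observed = cells.get((v, c), 0)
            score += (observed - expected) ** 2 / expected
    return score

# An attribute whose values track the class scores high; an independent one scores 0.
print(chi2_score(['a', 'a', 'b', 'b'], [0, 0, 1, 1]))  # 4.0
print(chi2_score(['a', 'b', 'a', 'b'], [0, 0, 1, 1]))  # 0.0
```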

3.
We revisit the well-known problem of sorting under partial information: sort a finite set given the outcomes of comparisons between some pairs of elements. The input is a partially ordered set P, and solving the problem amounts to discovering an unknown linear extension of P, using pairwise comparisons. The information-theoretic lower bound on the number of comparisons needed in the worst case is log e(P), the binary logarithm of the number of linear extensions of P. In a breakthrough paper, Jeff Kahn and Jeong Han Kim (J. Comput. System Sci. 51 (3), 390–399, 1995) showed that there exists a polynomial-time algorithm for the problem achieving this bound up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration for determining the next comparison, making it impractical. We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. Rather than resorting to convex programming for computing the entropy, we approximate the entropy, or make sure it is computed only once, in a restricted class of graphs, permitting the use of a simpler algorithm. Specifically, we present:
  1. an O(n^2) algorithm performing O(log n · log e(P)) comparisons;
  2. an O(n^2.5) algorithm performing at most (1+ε) log e(P) + O_ε(n) comparisons;
  3. an O(n^2.5) algorithm performing O(log e(P)) comparisons.
All our algorithms can be implemented in such a way that their computational bottleneck is confined to a preprocessing phase, while the sorting phase is completed in O(q + n) time, where q denotes the number of comparisons performed.
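A brute-force illustration of the quantity in the lower bound: counting the linear extensions e(P) of a tiny poset and taking the binary logarithm (exponential-time code, for intuition only):

```python
from itertools import permutations
from math import log2, ceil

def linear_extensions(n, relations):
    """Count linear extensions of a poset on 0..n-1 given as pairs (a, b)
    meaning a < b. Brute force over permutations: tiny n only."""
    count = 0
    for perm in permutations(range(n)):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            count += 1
    return count

# A 4-element poset: 0 < 1, 0 < 2, with element 3 incomparable to the rest.
e = linear_extensions(4, [(0, 1), (0, 2)])
print(e, ceil(log2(e)))  # e(P) = 8, so at least 3 comparisons in the worst case
```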

4.
This study proposes an improved solution algorithm using ant colony optimization (ACO) for finding the global optimum of a given test function. ACO algorithms simulate the decision-making processes of ant colonies as they forage for food, and are similar in spirit to other artificial intelligence techniques such as Tabu Search, Simulated Annealing, and Genetic Algorithms; they can be used to optimize both continuous and discrete mathematical functions. In the proposed algorithm, called ACORSES (ACO with Reduced SEarch Space), each ant searches only around the best solution of the previous iteration, within a range controlled by the parameter β; the reduction is introduced to improve solution quality and reach the global optimum fairly quickly. A sketch of this idea appears below. ACORSES is tested on fourteen mathematical test functions taken from the literature, with encouraging results: compared with other optimization methods available in the literature, it performs better in terms of both the minimum objective function values obtained and the number of iterations.
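A hedged sketch of the mechanism as described (sample only around the incumbent best, shrinking the search radius by β each iteration); the update rule and parameter values below are illustrative, not the paper's exact ACORSES procedure:

```python
import random

def reduced_search(f, lo, hi, n_ants=20, iters=100, beta=0.5):
    """Minimize a 1-D function by sampling around the best solution of the
    previous iteration, reducing the search radius by beta each time
    (illustrative sketch of a reduced-search-space strategy)."""
    best = random.uniform(lo, hi)
    radius = (hi - lo) / 2
    for _ in range(iters):
        candidates = [min(hi, max(lo, best + random.uniform(-radius, radius)))
                      for _ in range(n_ants)]
        best = min(candidates + [best], key=f)  # keep the incumbent if better
        radius *= beta                          # shrink the search space
    return best, f(best)

print(reduced_search(lambda x: (x - 3.14) ** 2, -10, 10))  # converges near 3.14
```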

5.
Many data clustering algorithms exist, but they cannot adequately handle unknown numbers of clusters or arbitrary cluster shapes, and their performance depends mainly on the choice of algorithm parameters. Our approach to data clustering requires no such parameter choices; it can be viewed as a natural adaptation to the existing structure of distances between data points. The outlier factor introduced by the author specifies the degree to which each data point is an outlier. The notion is based on the difference between the frequency distribution of interpoint distances in a given dataset and the corresponding distribution for uniformly distributed points. Data clusters are then determined by maximizing the outlier factor function: the data points are divided into clusters according to the attractor regions of local optima. An experimental evaluation shows that the proposed method can identify complex cluster shapes. Key advantages of the approach are good clustering behavior on datasets with comparatively large amounts of noise (additional data points), and the absence of critical parameters whose choice would determine the quality of the results.
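The key ingredient, comparing the frequency distribution of interpoint distances in the dataset with that of uniformly distributed points, can be illustrated directly; the author's exact outlier-factor formula is not given in the abstract, so this sketch stops at the two histograms:

```python
import math, random

def pairwise_distances(points):
    return [math.dist(p, q) for i, p in enumerate(points)
            for q in points[i + 1:]]

def distance_histograms(data, bins=10, seed=0):
    """Histogram the interpoint distances of `data` and of an equal number
    of points drawn uniformly from its bounding box (illustrative)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[k] for p in data) for k in range(dim)]
    hi = [max(p[k] for p in data) for k in range(dim)]
    uniform = [tuple(rng.uniform(lo[k], hi[k]) for k in range(dim))
               for _ in data]
    d_data, d_unif = pairwise_distances(data), pairwise_distances(uniform)
    top = max(d_data + d_unif)

    def hist(ds):
        counts = [0] * bins
        for d in ds:
            counts[min(int(d / top * bins), bins - 1)] += 1
        return counts

    # Clustered data shows an excess of small distances relative to uniform.
    return hist(d_data), hist(d_unif)

data = [(0, 0), (0.1, 0.1), (0.2, 0), (5, 5), (5.1, 4.9), (4.9, 5.1)]
print(distance_histograms(data, bins=4))
```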

6.
In rough set theory, the notion of a bireduct makes it possible to simultaneously reduce the sets of objects and attributes contained in a dataset. In addition, value reducts are used to remove unnecessary values of certain attributes for a specific object; combining both notions therefore yields a greater reduction of unnecessary data. This paper studies bireducts and value reducts of information and decision tables. We present theoretical results capturing different aspects of the relationship between bireducts and reducts, offering new insights at the conceptual level, and we analyze the relationship between bireducts and value reducts. The connections established among these notions provide important benefits for efficient information analysis, as well as for the detection of unnecessary or redundant information.

7.
Attribute reduction is one of the key issues in rough set theory. Many heuristic attribute reduction algorithms have been proposed, based on positive regions, information entropy, or discernibility matrices, but these methods are usually too time-consuming for large data. Moreover, a single attribute significance measure cannot discriminate among several attributes that attain the same greatest value. To overcome these shortcomings, we first introduce a counting-sort algorithm with time complexity O(|C||U|) for dealing with redundant and inconsistent data in a decision table and for computing positive regions and core attributes (|C| and |U| denote the cardinalities of the condition attribute set and the object set, respectively). We then construct hybrid attribute measures that reflect the significance of an attribute in both positive regions and boundary regions. Finally, hybrid approaches to attribute reduction based on the indiscernibility and discernibility relations are proposed, with time complexity at most max(O(|C|²|U/C|), O(|C||U|)), where |U/C| denotes the cardinality of the set of equivalence classes U/C. Experimental results show that the proposed hybrid algorithms are effective and feasible for large data.
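A sketch of the counting-sort idea for computing the equivalence classes U/C in O(|C||U|) time, assuming attribute values are small non-negative integers (the paper's exact bookkeeping for positive regions and cores is not reproduced here):

```python
def equivalence_classes(table):
    """Group the rows of a decision table by their condition attributes
    using a radix of counting sorts: one stable pass per attribute, so
    O(|C||U|) when attribute values are small non-negative integers."""
    order = list(range(len(table)))
    n_attrs = len(table[0])
    for a in reversed(range(n_attrs)):       # least-significant attribute last
        m = max(row[a] for row in table) + 1
        buckets = [[] for _ in range(m)]
        for i in order:                      # stable counting-sort pass
            buckets[table[i][a]].append(i)
        order = [i for b in buckets for i in b]
    classes, prev = [], None
    for i in order:                          # identical rows are now adjacent
        if table[i] != prev:
            classes.append([])
            prev = table[i]
        classes[-1].append(i)
    return classes

print(equivalence_classes([(1, 0), (0, 1), (1, 0), (0, 0)]))
# [[3], [1], [0, 2]] -- rows 0 and 2 are indiscernible
```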

8.
This study analyzes multiobjective d-dimensional knapsack problems (MOd-KP) within a comparative analysis of three multiobjective evolutionary algorithms (MOEAs): the ε-nondominated sorted genetic algorithm II (ε-NSGAII), the strength Pareto evolutionary algorithm 2 (SPEA2), and the ε-nondominated hierarchical Bayesian optimization algorithm (ε-hBOA). The study contributes new insights into the challenges posed by correlated instances of the MOd-KP, which better capture the decision interdependencies often present in real-world applications. A statistical performance analysis using the unary ε-indicator, the hypervolume indicator, and success rate plots demonstrates the algorithms' relative effectiveness, efficiency, and reliability on the MOd-KP instances analyzed. Our results indicate that ε-hBOA achieves superior performance relative to ε-NSGAII and SPEA2 as the number of objectives, the number of decisions, and the correlative linkages between the two increase. The performance of ε-hBOA suggests that probabilistic model-building evolutionary algorithms hold significant promise for expanding the size and scope of challenging multiobjective problems that can be explored.
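For reference, the unary ε-indicator measures the smallest shift needed for an approximation set to weakly dominate a reference set; a minimal sketch of the additive variant for minimization problems (the abstract does not state whether the additive or multiplicative variant was used):

```python
def eps_indicator(approx, reference):
    """Unary additive epsilon-indicator for minimization: the smallest eps
    such that every reference point is weakly dominated by some point of
    `approx` after subtracting eps from each of that point's objectives."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference)

reference_front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(eps_indicator([(1.5, 4.5), (2.5, 2.5)], reference_front))  # 1.5
```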

9.
This paper presents a modified Variable Neighborhood Search (VNS) heuristic for solving the Discrete Ordered Median Problem (DOMP). The heuristic is based on new neighborhood structures that allow an efficient encoding of DOMP solutions, avoiding sorting when evaluating the objective function at each considered solution. The algorithm relies on a data structure, computed in preprocessing, that organizes the minimal information necessary to update and evaluate solutions in linear time without sorting. To assess its performance, the new algorithm is compared with heuristics previously available in the literature for the DOMP. We report computational experiments based on the well-known N-median instances of the ORLIB with up to 900 nodes. The results obtained are comparable or superior to existing algorithms in the literature, both in running times and in the number of best solutions found.
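For context, the ordered median objective is what makes sorting the crux of the evaluation: the vector of clients' allocation costs is sorted and weighted by a λ-vector. A naive evaluation, with the per-solution sort the paper's data structure is designed to avoid, looks like this (names illustrative):

```python
def ordered_median(dist, facilities, lam):
    """Naive DOMP objective: allocate each client to its cheapest open
    facility, sort the resulting cost vector ascending, and weight it by
    lam. The O(n log n) per-solution sort is what the paper's
    preprocessing-based data structure avoids."""
    costs = sorted(min(row[j] for j in facilities) for row in dist)
    return sum(l * c for l, c in zip(lam, costs))

dist = [[0, 3, 5],
        [3, 0, 2],
        [5, 2, 0]]
# lam = (0, ..., 0, 1) recovers the center (max-cost) objective;
# lam = (1, ..., 1) recovers the median (total-cost) objective.
print(ordered_median(dist, facilities={0, 2}, lam=(0, 0, 1)))  # 2
```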

10.
We give a fast algorithm to evaluate a class of d-dimensional integrals. Direct numerical evaluation of these integrals costs N^d operations, where d is the number of variables and N is the number of discrete points per variable. The algorithm presented in this Note reduces this cost from N^d to the order of O(N). The recursive algorithm takes its inspiration from the well-known Fast Multipole Method. At the end of the paper we give some physical applications of the algorithm.
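The abstract does not specify the class of integrals, but one simple setting where an N^d cost collapses is a separable integrand, where the d-dimensional sum factors into d one-dimensional sums; a toy illustration of that collapse (not the Note's fast-multipole-style recursion):

```python
import math

# Summing f1(x[i1]) * ... * fd(x[id]) over all index tuples costs N^d terms;
# for a separable integrand it factors into a product of d one-dimensional
# sums, i.e. O(d * N) work instead.
N, d = 100, 4
x = [i / N for i in range(N)]
w = 1.0 / N                                   # rectangle-rule weight
f = [math.exp, math.sin, math.cos, lambda t: t * t]

separable = math.prod(sum(w * fk(t) for t in x) for fk in f[:d])
print(separable)   # approximates the product of the d one-dimensional integrals
```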

11.
The time and space complexity of division-free arithmetic algorithms, measured by the number of binary bits processed in the computations, are estimated for the discrete Fourier transform (DFT) and polynomial multiplication (PM, i.e., convolution of vectors). It is proved that Ω(N²) bit operations are required by an evaluation-interpolation algorithm for PM over the field of real constants, while O(N log₂N) arithmetic operations suffice.

12.
Attribute reduction is a key step in discovering interesting patterns in decision systems with large numbers of attributes. In recent years, with the rapid development of data processing tools, information systems may grow quickly in attributes over time. Because the result of attribute reduction can change as attributes are added, updating attribute reducts efficiently under attribute generalization has become an important task in knowledge discovery. This paper investigates incremental attribute reduction algorithms based on knowledge granularity in decision systems under the variation of attributes. Incremental mechanisms for calculating the new knowledge granularity are first introduced. Corresponding incremental algorithms for attribute reduction based on the calculated knowledge granularity are then presented for the case where multiple attributes are added to the decision system. Finally, experiments on UCI data sets, together with a complexity analysis, show that the proposed incremental methods are effective and efficient at updating attribute reducts as attributes increase.
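One widely used definition of knowledge granularity for an attribute subset B is GK(B) = Σ|X_i|²/|U|² over the equivalence classes X_i of U/B; a minimal sketch under that assumed definition (the abstract does not spell it out), showing that adding attributes refines the partition and lowers the granularity:

```python
from collections import defaultdict

def knowledge_granularity(table, attrs):
    """GK(B) = sum(|X_i|^2) / |U|^2 over the equivalence classes of U/B
    (a common definition, assumed here rather than quoted from the paper)."""
    classes = defaultdict(int)
    for row in table:
        classes[tuple(row[a] for a in attrs)] += 1
    n = len(table)
    return sum(c * c for c in classes.values()) / (n * n)

U = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(knowledge_granularity(U, [0]))     # 0.5  -- coarse partition
print(knowledge_granularity(U, [0, 1]))  # 0.25 -- adding an attribute refines it
```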

13.
Although support vector regression models are used successfully in various applications, business datasets with millions of observations and thousands of variables make training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA), which selects a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The first step processes data chunks in parallel and selects a fraction of the original observations that are either representative of the pattern identified in the chunk or represent observations that do not fit that pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. Comparison with the ε-SSVR method using the reduced kernel technique shows a similar performance improvement. Training a standard SVM tool with the ROCSA-selected observations improves predictive accuracy on average by 21% compared with the practical approach of random sampling.

14.
Analysis of Static Simulated Annealing Algorithms
Generalized hill climbing (GHC) algorithms provide a framework for modeling local search algorithms for intractable discrete optimization problems. This paper introduces a measure of the expected number of iterations needed to visit a predetermined objective function level, given that an inferior objective function level has been reached in a finite number of iterations. A variation of simulated annealing (SA), termed static simulated annealing (S2A), is analyzed using this measure. S2A uses a fixed cooling schedule throughout the algorithm's execution. Though S2A is provably nonconvergent, its finite-time performance can be assessed using the performance measure defined in this paper.
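A minimal sketch of the "static" variant: standard Metropolis acceptance with the temperature held fixed for the whole run (parameter values illustrative):

```python
import math, random

def static_sa(f, neighbor, x0, temperature=1.0, iters=10000, seed=0):
    """Simulated annealing with a fixed (static) cooling schedule: the
    temperature never decreases, so uphill moves remain possible forever
    and the chain need not converge -- matching the S2A setting. The best
    solution seen is tracked separately for finite-time performance."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(iters):
        y = neighbor(x, rng)
        delta = f(y) - f(x)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = y                      # Metropolis acceptance at fixed T
        if f(x) < f(best):
            best = x
    return best, f(best)

print(static_sa(lambda v: (v - 7) ** 2,
                lambda v, rng: v + rng.choice((-1, 1)), x0=0))
```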

15.
Nonnegative matrix factorization (NMF) is a powerful technique for dimension reduction, extracting latent factors, and learning part-based representations. For large datasets, NMF performance depends on major issues such as fast algorithms, fully parallel distributed feasibility, and limited internal memory. This research designs a fast, fully parallel and distributed algorithm using limited internal memory to achieve high NMF performance on large datasets. Specifically, we propose a flexible accelerated algorithm for NMF and all its L1/L2-regularized variants based on full decomposition, combining exact line search, greedy coordinate descent, and accelerated search. The proposed algorithm converges linearly at an over-bounded rate (1 - μ/L)(1 - μ/(rL))^(2r) when optimizing each factor matrix with the other held fixed, in the subspace of passive variables, where r is the number of latent components and μ and L are bounded as 1/2 ≤ μ ≤ L ≤ r. In addition, the algorithm can exploit data sparseness to run on large datasets within the limited internal memory of machines, an advantage over fast block coordinate descent and accelerated methods. Our experimental results are highly competitive with seven state-of-the-art methods in three significant respects: convergence, optimality, and average number of iterations.
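Not the accelerated algorithm described above, but for orientation, the classical Lee-Seung multiplicative-update baseline that such methods compete with; a minimal NumPy sketch:

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, eps=1e-9, seed=0):
    """Baseline Lee-Seung multiplicative updates for V ~= W @ H with
    W, H >= 0 under the Frobenius norm (a reference baseline only, not
    the accelerated algorithm from the paper)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 12)))
W, H = nmf_multiplicative(V, r=3)
print(np.linalg.norm(V - W @ H))  # reconstruction error after 200 updates
```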

16.
Many dynamic programming algorithms for discrete optimization problems are pure in that they use only min/max and addition operations in their recursions. Some of them, in particular those for various shortest path problems, are even incremental in that one of the inputs to each addition operation is a variable. We present an explicit optimization problem that can be solved by a pure DP algorithm using a polynomial number of operations, whereas any incremental DP algorithm for this problem requires a super-polynomial number of operations.
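An example of the kind of pure, incremental DP the abstract refers to: Bellman-Ford's recursion uses only min and addition, and one input to each addition is a variable table entry. A minimal sketch:

```python
def bellman_ford(n, edges, source):
    """Pure (min, +) dynamic program for single-source shortest paths:
    the recursion uses only min and addition, and each addition adds a
    constant edge weight to a variable table entry -- an 'incremental'
    pure DP in the abstract's sense."""
    INF = float('inf')
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):
        for u, v, w in edges:
            dist[v] = min(dist[v], dist[u] + w)   # min and + only
    return dist

print(bellman_ford(4, [(0, 1, 2), (1, 2, 2), (0, 2, 5), (2, 3, 1)], 0))
# [0, 2, 4, 5]
```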

17.
In a finite dataset consisting of positive and negative observations represented as real-valued n-vectors, a positive (negative) pattern is an interval in R^n that contains sufficiently many positive (negative) observations and sufficiently few negative (positive) ones. A pattern is spanned if it does not properly include any other interval containing the same set of observations. Although large collections of spanned patterns can provide highly accurate classification models within the framework of the Logical Analysis of Data, no efficient method for their generation was previously known. We propose an incrementally polynomial-time algorithm for generating all spanned patterns in a dataset, running in linear time in the output; the algorithm closely resembles the Blake and Quine consensus method for finding the prime implicants of Boolean functions. The efficiency of the proposed algorithm is tested on various publicly available datasets. In the last part of the paper, we present a series of computational experiments showing the high degree of robustness of spanned patterns.
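The definition suggests a simple "spanning" operation: shrinking an interval to the coordinate-wise bounding box of the observations it covers yields the unique spanned pattern with the same coverage (the generation algorithm itself is not reproduced here). A minimal sketch:

```python
def covered(points, box):
    """Indices of points inside the axis-parallel interval `box`,
    given as a list of (lo, hi) bounds per coordinate."""
    return [i for i, p in enumerate(points)
            if all(lo <= v <= hi for v, (lo, hi) in zip(p, box))]

def span(points, box):
    """Shrink `box` to the bounding box of the observations it covers:
    the spanned pattern with the same set of covered observations.
    Assumes the box covers at least one observation."""
    idx = covered(points, box)
    return [(min(points[i][k] for i in idx), max(points[i][k] for i in idx))
            for k in range(len(box))]

pts = [(0.2, 0.9), (0.4, 0.5), (0.8, 0.1)]
print(span(pts, [(0.0, 0.5), (0.0, 1.0)]))
# [(0.2, 0.4), (0.5, 0.9)] -- same coverage, and no proper sub-interval has it
```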

18.
We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable’s usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.

19.
In power production problems, maximum power and minimum entropy production are inherently connected by the Gouy–Stodola law. In this paper, various mathematical tools are applied to the dynamic optimization of power-maximizing paths, with special attention paid to nonlinear systems. Maximum power and/or minimum entropy production are governed by Hamilton–Jacobi–Bellman (HJB) equations, which describe the value function of the problem and the associated controls. Yet in many cases the optimal relaxation curve is non-exponential, the governing HJB equations do not admit classical solutions, and one has to work with viscosity solutions. Systems with nonlinear kinetics (e.g., radiation engines) are particularly difficult, so discrete counterparts of the continuous HJB equations and numerical approaches are recommended. Discrete dynamic programming (DP) algorithms, which lead to power limits and associated availabilities, are effective. We consider the convergence of discrete algorithms to viscosity solutions of the HJB equations, discrete approximations, and the role of the Lagrange multiplier λ associated with the duration constraint. In analytical discrete schemes, the Legendre transformation is a significant tool, leading to the original work function. We also describe numerical algorithms of dynamic programming and consider dimensionality reduction in these algorithms. Indications of the method's potential for other systems, in particular chemical energy systems, are given.

20.
This paper deals with supervised classification methods based on Galois lattices and decision trees. Such ordered structures require attribute discretization, and it is known that, for decision trees, local discretization improves classification performance compared with global discretization. While most of the literature on discretization for Galois lattices relies on global discretization, the present work introduces a new local discretization algorithm for Galois lattices, which hinges on a property of specific lattices that we introduce as dichotomic lattices. Their properties, co-atomicity and ∨-complementarity, are proved, along with their links to decision trees. Finally, some quantitative and qualitative evaluations of the local discretization are proposed.
