Similar articles
 Found 20 similar articles; search took 31 ms
1.
Finding interesting patterns in data is one of the most important problems in data mining, and it has been studied actively for more than a decade. However, which patterns are interesting and which are not is still a largely open problem. The usual approach to detecting the interesting patterns (in a predefined class of patterns) is to determine a quality value for each potentially interesting pattern and to declare a pattern interesting if its quality value (i.e., the interestingness of the pattern) is higher than a given threshold value. Again, it is very difficult to find a threshold value and a way to determine the quality values such that the collection of patterns with quality values greater than the threshold contains almost all truly interesting patterns and only a few uninteresting ones. To enable more accurate characterization of interesting patterns, the use of constraints to further prune the pattern collection has been proposed. However, most constrained pattern discovery research has focused on structural constraints on the pattern collections and the patterns. We take a complementary approach and focus on constraining the quality values of the patterns: we propose quality value simplifications. As a special case of the quality value simplifications, we consider discretizing the quality values. We analyze the worst-case error of certain discretization functions and give efficient discretization algorithms minimizing several loss functions. In addition, we show that discretizations of the quality values can be used to obtain small approximate condensed representations for collections of interesting patterns. We evaluate the proposed condensation approach experimentally using frequent itemsets.

2.
The fast pace of development of indoor sensors and communication technologies is allowing a great amount of sensor data to be utilized in various areas of indoor air applications, such as estimating indoor airflow patterns. The development of such an inverse model and the design of a sensor system to collect appropriate data are discussed in this study. Algebraic approaches, including singular value decomposition (SVD), are evaluated as methods to inversely estimate airflow patterns given limited sensor measurements. In lieu of actual sensor data, computational fluid dynamics data are used to evaluate the accuracy of the airflow patterns estimated by the inverse models developed in this study. It was found that the airflow patterns estimated by the linear inverse SVD model were as accurate as those estimated by the nonlinear inverse-multizone model. For the zones tested, sensor measurements along the walls and near the inlet and outlet provided the greatest improvement in the accuracy of the estimated airflow patterns when compared with the results using measurements from other locations.

3.
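Classification of two-population gene expression data based on Bayesian statistical methods   Total citations: 1; self-citations: 0; citations by others: 1
In the diagnosis of disease, accurate classification of the disease is a crucial step in improving diagnostic accuracy and cure rates. The advent of DNA microarray technology lets us obtain, at the microscopic level, gene function information closely related to disease classification and diagnosis. However, gene expression pattern data obtained from DNA microarrays have many variables and few samples, which makes the classification process highly unstable. We therefore first select the genes whose expression patterns change significantly as the feature gene set, reducing the number of variables, and then build a classifier on this feature gene set to classify the samples. In this paper we use the likelihood ratio test to select the feature genes, build a statistical classification model based on Bayesian methods, and apply Markov chain Monte Carlo (MCMC) sampling to compute the posterior probabilities of class membership for the samples. Finally, we apply the model to two real DNA microarray data sets and classify the samples successfully.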

4.
5.
The allowed patterns of a map on a one-dimensional interval are those permutations that are realized by the relative order of the elements in its orbits. The set of allowed patterns is completely determined by the minimal patterns that are not allowed. These are called basic forbidden patterns. In this paper, we study basic forbidden patterns of several functions. We show that the logistic map L_r(x) = rx(1 − x) and some generalizations have infinitely many of them for 1 < r ≤ 4, and we give a lower bound on the number of basic forbidden patterns of L_4 of each length. Next, we give an upper bound on the length of the shortest forbidden pattern of a piecewise monotone map. Finally, we provide some necessary conditions for a set of permutations to be the set of basic forbidden patterns of such a map.
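The notion of an allowed pattern can be explored numerically with a short sketch. It samples initial points on a uniform grid, iterates the logistic map with r = 4, and records the rank pattern of each length-3 orbit segment. The function names and the grid size are assumptions made for illustration; a pattern that is missing from a finite sample is only a candidate forbidden pattern, not a proof.

```python
from itertools import permutations

def logistic(x: float, r: float = 4.0) -> float:
    """The logistic map L_r(x) = r x (1 - x)."""
    return r * x * (1.0 - x)

def order_pattern(values):
    """Return the rank of each entry (0 = smallest), or None if two entries tie."""
    if len(set(values)) < len(values):
        return None
    ranked = sorted(values)
    return tuple(ranked.index(v) for v in values)

def realized_patterns(length: int, samples: int = 1000):
    """Collect the order patterns of orbit segments started on a uniform grid."""
    found = set()
    for i in range(1, samples):
        orbit = [i / samples]
        for _ in range(length - 1):
            orbit.append(logistic(orbit[-1]))
        pattern = order_pattern(orbit)
        if pattern is not None:
            found.add(pattern)
    return found

# Permutations of length 3 that never appear in the sampled orbits.
missing = set(permutations(range(3))) - realized_patterns(3)
print(missing)
```

For length 3 the sample leaves out only the decreasing pattern, which agrees with a direct argument: x > L_4(x) forces x > 3/4, hence L_4(x) < 3/4, and every point in (0, 3/4) moves up under L_4.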

6.
We extend the classification of pattern types by Grünbaum and Shephard to the 2-sided plane to include patterns having layered or interlaced motifs. Such patterns may have symmetries that turn the plane over. There are 17 infinite families of pattern types for 2-sided rosettes (twelve 1-parameter families and five 2-parameter families), 68 types of 2-sided frieze pattern, and 264 types of 2-sided periodic pattern. The definition of ‘henomeric’ is clarified to ensure that two of the periodic patterns are distinguished.

7.
Based on the AdaBoost algorithm, a modified boosting method is proposed in this paper for solving classification problems. This method predicts the class label of an example by weighted majority voting over an ensemble of classifiers. Each classifier is obtained by applying a given weak learner to a subsample (smaller than the original training set) drawn from the original training set according to the probability distribution maintained over the training set. A parameter is introduced into the reweighting scheme of AdaBoost, which updates the probabilities assigned to training examples, so that the algorithm can be more accurate than AdaBoost. Experimental results on synthetic data and on several real-world data sets from the UCI repository show that the proposed method improves on AdaBoost in prediction accuracy, execution speed, and robustness to classification noise. Furthermore, the diversity–accuracy patterns of the ensemble classifiers are investigated using kappa–error diagrams.

8.
Classification of high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features, or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study of the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed features can achieve promising classification performance.

9.
A tool to study the inertias of reducible nonzero (resp. sign) patterns is presented. Sumsets are used to obtain a list of inertias attainable by the pattern 𝒜 ⊕ ℬ dependent upon inertias attainable by patterns 𝒜 and ℬ. It is shown that if ℬ is a pattern of order n, and 𝒜 is an inertially arbitrary pattern of order at least 2(n − 1), then 𝒜 ⊕ ℬ is inertially arbitrary if and only if ℬ allows the inertias (0, 0, n), (0, n, 0) and (n, 0, 0). We illustrate how to construct other reducible inertially (resp. spectrally) arbitrary patterns from an inertially (resp. spectrally) arbitrary pattern 𝒜 ⊕ ℬ, by replacing 𝒜 with an inertially (resp. spectrally) arbitrary pattern 𝒮. We identify reducible inertially (resp. spectrally) arbitrary patterns of the smallest orders that contain some irreducible components that are not inertially (resp. spectrally) arbitrary. It is shown that there exist nonzero (resp. sign) patterns 𝒜 and ℬ of orders 4 and 5 (resp. 4 and 4) such that both 𝒜 and ℬ are non-inertially-arbitrary, and 𝒜 ⊕ ℬ is inertially arbitrary.

10.
The paper deals with multiclass learning from the perspective of analytically interpreting the results of the analysis, as well as navigating them with interactive visualization tools. It is shown that by combining the Sequential Automatic Search of Subset of Classifiers (SASSC) algorithm with the interactive visualization of classification trees provided by the Klassification - Interactive Methods for Trees (KLIMT) software, it is possible to highlight important information deriving from the knowledge extraction process without neglecting the prediction accuracy of the classification method. Empirical evidence from two benchmark datasets demonstrates the advantages deriving from the joint use of SASSC and KLIMT.

11.
A fundamental problem in classification is how to combine collections of trees having overlapping sets of leaves. The requirement that such a collection of trees is realized by at least one parent tree uniquely determines some additional subtrees not in the original collection. We analyze the "rules" that arise in this way by defining a closure operator for sets of trees. In particular, we show that there exist rules of arbitrarily high order which cannot be reduced to repeated application of lower-order rules.

12.
In a finite dataset consisting of positive and negative observations represented as real-valued n-vectors, a positive (negative) pattern is an interval in R^n with the property that it contains sufficiently many positive (negative) observations, and sufficiently few negative (positive) ones. A pattern is spanned if it does not properly include any other interval containing the same set of observations. Although large collections of spanned patterns can provide highly accurate classification models within the framework of the Logical Analysis of Data, no efficient method for their generation is currently known. In this paper we propose an incrementally polynomial time algorithm for the generation of all spanned patterns in a dataset, which runs in linear time in the output; the algorithm closely resembles the Blake and Quine consensus method for finding the prime implicants of Boolean functions. The efficiency of the proposed algorithm is tested on various publicly available datasets. In the last part of the paper, we present the results of a series of computational experiments which show the high degree of robustness of spanned patterns.
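The definition of a spanned interval can be made concrete with a small sketch: an interval (an axis-parallel box) is spanned exactly when it coincides with the smallest box around the observations it contains. The sketch below checks only this geometric condition; it ignores the positive/negative coverage requirements and is not the generation algorithm of the paper. All names and the sample observations are hypothetical.

```python
def span(points):
    """Smallest axis-parallel box (lower, upper) containing all the given points."""
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper

def covered(box, points):
    """Return the points that lie inside the closed box."""
    lower, upper = box
    return [p for p in points
            if all(lo <= x <= hi for lo, x, hi in zip(lower, p, upper))]

def is_spanned(box, points):
    """A box is spanned if no smaller box contains the same set of observations."""
    inside = covered(box, points)
    return bool(inside) and span(inside) == box

# Hypothetical observations in R^2, for illustration only.
observations = [(1.0, 2.0), (2.0, 5.0), (4.0, 3.0), (6.0, 6.0)]
loose_box = ((0.0, 0.0), (5.0, 5.0))
tight_box = span(covered(loose_box, observations))
print(is_spanned(loose_box, observations), is_spanned(tight_box, observations))
```

Shrinking `loose_box` to `tight_box` keeps the same three observations inside, so only the tight box counts as spanned.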

13.
The focus of this article is on various approaches to discerning patterns in nonempty sets endowed with a proximity (nearness) relation. Patterns arise in repetitions of some form in the arrangement of the parts of a set. To simplify the steps leading to pattern discovery, an approach inspired by M. Katětov is used, where one proximitises certain parts of a nonempty set, rather than proximitising the whole set. In effect, this is a divide-and-conquer approach to pattern discovery. This leads to a study of patterns that are collections of near sets. An important practical outcome of this approach is the discovery of patterns in proximity spaces.

14.
The amalgamation of leaf-labeled trees into a single (super)tree that “displays” each of the input trees is an important problem in classification. We discuss various approaches to this problem and show that a simple and well-known polynomial-time algorithm can be used to solve this problem whenever the input set of trees contains a minimum size subset that uniquely determines the supertree. Our results exploit a recently established combinatorial property concerning the structure of such collections of trees.

15.
The use of Gaussian quadrature formulae is explored for the computation of the Macdonald function (modified Bessel function) of complex orders and positive arguments. It is shown that for arguments larger than one, Gaussian quadrature applied to the integral representation of this function is a viable approach, provided the (nonclassical) weight function is suitably chosen. In combination with Gauss–Legendre quadrature the approach works also for arguments smaller than one. For very small arguments, power series can be used. A Matlab routine is provided that implements this approach. AMS subject classification (2000) 33-04, 33C10, 65D15, 65D32
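As a rough illustration of quadrature applied to an integral representation of the Macdonald function, the sketch below uses the standard formula K_ν(x) = ∫_0^∞ exp(−x cosh t) cosh(νt) dt for x > 0. It deliberately swaps in the composite trapezoidal rule on a truncated interval instead of the Gaussian quadrature with a nonclassical weight described in the abstract, and it is not the paper's Matlab routine. The function name, step size, and truncation point are assumptions, chosen with arguments x ≥ 1 in mind.

```python
import cmath
import math

def macdonald(nu: complex, x: float, step: float = 0.01, t_max: float = 8.0) -> complex:
    """Approximate K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt for x >= 1.

    Uses the composite trapezoidal rule on [0, t_max]; the integrand decays
    double-exponentially, so the truncated tail is negligible for x >= 1.
    """
    def integrand(t: float) -> complex:
        return math.exp(-x * math.cosh(t)) * cmath.cosh(nu * t)

    n = int(round(t_max / step))
    total = 0.5 * (integrand(0.0) + integrand(n * step))  # endpoint terms
    for k in range(1, n):
        total += integrand(k * step)                       # interior nodes
    return step * total

# Check against the closed form K_{1/2}(x) = sqrt(pi / (2 x)) * exp(-x).
x = 2.0
approx = macdonald(0.5, x)
exact = math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)
print(abs(approx - exact))
```

The same function accepts a complex order, for example `macdonald(2j, 1.5)`; for purely imaginary orders the integrand is real, so the result has zero imaginary part.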

16.
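This paper studies a class of unactivated higher-order autocatalytic reaction-diffusion systems in a closed vessel. Under suitable conditions, the range of stability of the equilibrium state of the system is discussed using asymptotic approximation. Using the method of multiple scales, it is shown that when the diffusion coefficient λ is sufficiently small, the system exhibits two types of patterns: standing-wave patterns arising from a Hopf bifurcation, and stationary-wave patterns arising from a pitchfork bifurcation. It is further shown that, near the bifurcation point, the patterns are locally stable under small perturbations whose wavenumber is greater than or equal to the patterns' own spatial wavenumber, and unstable under small perturbations whose wavenumber is smaller than their own spatial wavenumber.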

17.
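Zhang Lei, Li Huimin. 《经济数学》, 2011, 28(2): 107-110
Based on an analysis of the factors that affect the comprehensive strength of the construction industry, 12 evaluation indicators are selected covering three aspects: scale and benefit level, technical equipment level, and efficiency level. A projection pursuit classification (PPC) model for evaluating the comprehensive strength of the construction industry is established, and a real-coded accelerating genetic algorithm (RAGA) is used to find the optimal projection direction. The comprehensive strength of regional construction industries is then evaluated and analysed according to the size of the projection values. The PPC model is robust, its projection values are accurate, and its evaluation results...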

18.
We investigate a migration-selection system arising from CRISPR-Cas9 genetic engineering, which describes the evolution of the frequencies of a wild allele O, a drive allele D, and a brake allele B. The purpose is to see whether the drive allele D can persist in the population and whether its spread can be limited or stopped by the brake allele B when necessary. We give a complete classification of the dynamics of this system when there is no migration. We further show that migration may cause complex spatiotemporal patterns by demonstrating the existence of spatially inhomogeneous periodic solutions and steady state solutions.

19.
Spare parts are known to be associated with intermittent demand patterns, and such patterns cause considerable problems with regard to forecasting and stock control because their compound nature renders the normality assumption invalid. Compound distributions have been used to model intermittent demand patterns; however, there is a lack of theoretical analysis and little relevant empirical evidence in support of these distributions. In this paper, we conduct a detailed empirical investigation of the goodness of fit of various compound Poisson distributions, and we develop a distribution-based demand classification scheme, the validity of which is also assessed empirically. Our empirical investigation provides evidence in support of certain demand distributions, and the work described in this paper should facilitate the task of selecting such distributions in a real-world spare parts inventory context. An extensive discussion of the difficulties of parameter estimation in this area is also provided.

20.
Numerical computations often show that the Gierer-Meinhardt system has stable solutions which display patterns of multiple interior peaks (often also called spots). These patterns are also frequently observed in natural biological systems. It is assumed that the diffusion rate of the activator is very small and the diffusion rate of the inhibitor is finite (this is the so-called strong-coupling case). In this paper, we rigorously establish the existence and stability of such solutions of the full Gierer-Meinhardt system in two dimensions far from homogeneity. Green's function together with its derivatives plays a major role.

