Similar literature
 Found 20 similar articles; search time: 562 ms
1.
As an extension of the Pawlak rough set model, the decision-theoretic rough set model (DTRS) adopts Bayesian decision theory to compute the thresholds required in probabilistic rough set models. It gives a new semantic interpretation of the positive, boundary and negative regions by using three-way decisions. DTRS has been widely discussed and applied in data mining and decision making. However, one limitation of DTRS is that it cannot deal with numerical data directly. To overcome this disadvantage and extend the theory of DTRS, this paper proposes a neighborhood-based decision-theoretic rough set model (NDTRS) under the framework of DTRS. Basic concepts of NDTRS are introduced. A positive-region-related attribute reduct and a minimum-cost attribute reduct in the proposed model are defined and analyzed. Experimental results show that the proposed methods can obtain a short reduct. Furthermore, a new neighborhood classifier based on three-way decisions is constructed and compared with other classifiers. Comparison experiments show that the proposed classifier achieves high accuracy and a low misclassification cost.
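
As an illustration of how Bayesian decision theory yields the thresholds mentioned above, the following minimal sketch uses the standard DTRS threshold formulas rather than anything specific to NDTRS. The function names and the loss values in the usage note are hypothetical. Each loss `l_xy` is the cost of taking action `x` (p = accept, b = defer, n = reject) when the object's true state is `y` (p = in the concept, n = not in it). The sketch assumes the usual ordering of losses (`l_pp <= l_bp < l_np` and `l_nn <= l_bn < l_pn`) and that the resulting thresholds satisfy alpha > beta.

```python
def dtrs_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta) from the six losses of a three-way decision.

    Assumes l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn, so both
    denominators are positive.
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_way(prob, alpha, beta):
    """Assign an object to a region from its conditional probability.

    prob is Pr(concept | description of the object); the caller is
    assumed to have checked that alpha > beta.
    """
    if prob >= alpha:
        return "positive"   # accept
    if prob <= beta:
        return "negative"   # reject
    return "boundary"       # defer the decision
```

With the hypothetical losses `dtrs_thresholds(0, 2, 6, 8, 2, 0)`, alpha is 0.75 and beta is 1/3, so a conditional probability of 0.5 falls in the boundary region.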

2.
Feature selection is a challenging problem in many areas such as pattern recognition, machine learning and data mining. Rough set theory, as a valid soft computing tool for analyzing various types of data, has been widely applied to select helpful features (also called attribute reduction). Many rough-set-based feature selection algorithms have been developed in the literature; however, they are very time-consuming on large-scale data sets. To overcome this limitation, we propose an efficient rough feature selection algorithm for large-scale data sets, inspired by multi-granulation. A sub-table of a data set can be considered a small granularity. Given a large-scale data set, the algorithm first selects different small granularities and then estimates, on each small granularity, the reduct of the original data set. By fusing the estimates from all small granularities, the algorithm obtains an approximate reduct. Because the total time spent computing reducts for the sub-tables is much less than that for the original large-scale table, the algorithm yields a feature subset (the approximate reduct) in much less time. According to several decision performance measures, experimental results show that the proposed algorithm is feasible and efficient for large-scale data sets.

3.
Attribute reduction is very important in rough set-based data analysis (RSDA) because it can be used to simplify the induced decision rules without reducing the classification accuracy. The notion of reduct plays a key role in rough set-based attribute reduction. In rough set theory, a reduct is generally defined as a minimal subset of attributes that can classify the same domain of objects as unambiguously as the original set of attributes. Nevertheless, from a relational perspective, RSDA relies on a kind of dependency principle. That is, the relationship between the class labels of a pair of objects depends on component-wise comparison of their condition attributes. The larger the number of condition attributes compared, the greater the probability that the dependency will hold. Thus, elimination of condition attributes may cause more object pairs to violate the dependency principle. Based on this observation, a reduct can be defined alternatively as a minimal subset of attributes that does not increase the number of objects violating the dependency principle. While the alternative definition coincides with the original one in ordinary RSDA, it is more easily generalized to cases of fuzzy RSDA and relational data analysis.
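
The classical definition of a reduct quoted above can be made concrete with a brute-force sketch. It is only meant for tiny tables, and it reads "classify as unambiguously" as "preserve the positive region", which is an assumption; the function names and the toy table in the usage note are hypothetical and not taken from the paper.

```python
from itertools import combinations


def positive_region(rows, attrs, decision):
    """Indices of objects whose equivalence class under attrs is consistent."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    pos = set()
    for members in blocks.values():
        # A block is consistent if all its objects share one decision value.
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def reducts(rows, attrs, decision):
    """All minimal attribute subsets that keep the full positive region."""
    full = positive_region(rows, attrs, decision)
    found = []
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            # Supersets of a reduct found earlier cannot be minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if positive_region(rows, subset, decision) == full:
                found.append(subset)
    return found
```

For a four-row table in which the decision `d` equals attribute `c`, and `c` is in turn determined by `a` and `b` together, the sketch finds two reducts, `("c",)` and `("a", "b")`. The search is exponential in the number of attributes, which is why practical algorithms rely on heuristics.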

4.
In this paper, we present two classification approaches based on Rough Sets (RS) that are able to learn decision rules from uncertain data. We assume that the uncertainty exists only in the decision attribute values of the Decision Table (DT) and is represented by belief functions. The first technique, named Belief Rough Set Classifier (BRSC), is based only on the basic concepts of Rough Sets. The second, called Belief Rough Set Classifier based on Generalization Distribution Table (BRSC-GDT), is more sophisticated. It is based on a hybridization of the Generalization Distribution Table and Rough Sets (GDT-RS). The two classifiers aim at simplifying the Uncertain Decision Table (UDT) in order to generate significant decision rules for the classification process. Furthermore, to improve the time complexity of constructing the two classifiers, we apply a heuristic method of attribute selection based on rough sets. To evaluate the performance of each classification approach, we carry out experiments on a number of standard real-world databases by artificially introducing uncertainty into the decision attribute values. In addition, we test our classifiers on a naturally uncertain web usage database. For the certain case only, we compare our belief rough set classifiers with traditional classification methods. For the uncertain case, we compare the results with those given by a similar classifier, the Belief Decision Tree (BDT), which also deals with uncertain decision attribute values.

5.
Rough set theory is a useful mathematical tool for dealing with vagueness and uncertainty in available information. The results of a rough set approach are usually presented as a set of decision rules derived from a decision table. Because using the original decision table is not the only way to implement a rough set approach, it is of interest to investigate whether classification performance can be improved by replacing the original table with an alternative table obtained by pairwise comparisons among patterns. In this paper, a decision table based on pairwise comparisons is generated using the preference relation of the Preference Ranking Organization Methods for Enrichment Evaluations (PROMETHEE) methods, to gauge the intensity of preference for one pattern over another on each criterion before classification. The rough-set-based rule classifier (RSRC) provided by the well-known library for the Rough Set Exploration System (RSES) running under Windows has been successfully used to generate decision rules from the pairwise-comparisons-based tables. Specifically, parameters related to the preference function on each criterion have been determined using a genetic-algorithm-based approach. Computer simulations involving several real-world data sets have revealed that the proposed classification method performs well compared to other well-known classification methods and to RSRC using the original tables.

6.
In rough set theory, attribute reduction is a challenging problem in applications where data with large numbers of attributes are available. Moreover, because data collection in decision systems is dynamic, the attribute reduct changes as the attribute set of the decision system varies over time. Updating the attribute reduct by utilizing previous information is an important task that can help to improve the efficiency of knowledge discovery. Because attribute reduction algorithms for incomplete decision systems with a varying attribute set have not yet been discussed, this paper focuses on a positive-region-based attribute reduction algorithm that solves the attribute reduction problem efficiently in incomplete decision systems with a dynamically varying attribute set. We first introduce an incremental way to calculate the new positive region and tolerance classes. Then, based on the calculated positive region and tolerance classes, the corresponding algorithms for computing the new attribute reduct are put forward for the cases in which an attribute set is added to or deleted from the incomplete decision system. Finally, numerical experiments conducted on different data sets from UCI validate the effectiveness and efficiency of the proposed algorithms in incomplete decision systems with a varying attribute set.

7.
8.
Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a given diagnostic test. In this regard, a new feature selection algorithm based on rough set theory is presented. It selects a set of genes from microarray data by maximizing the relevance and significance of the selected genes. A theoretical analysis is presented to justify the use of both relevance and significance criteria for selecting a reduced gene set with high predictive accuracy. The importance of rough set theory for computing both the relevance and the significance of the genes is also established. The performance of the proposed algorithm, along with a comparison with other related methods, is studied using the predictive accuracy of the K-nearest neighbor rule and the support vector machine on five cancer and two arthritis microarray data sets. Among the seven data sets, the proposed algorithm attains 100% predictive accuracy on three cancer and two arthritis data sets, while the two existing rough-set-based algorithms attain this accuracy on only one cancer data set.

9.
A classification method that combines the Fuzzy C-Means method, a modified form of the Huang-index function and Variable Precision Rough Set (VPRS) theory is proposed in this study for classifying labeled/unlabeled data sets. The proposed method, designated the MVPRS-index method, is used to partition the values of each conditional attribute within the data set and to achieve both the optimal number of clusters and the optimal accuracy of VPRS classification. The validity of the proposed approach is confirmed by comparing the classification results obtained from the MVPRS-index method on UCI data sets and a typical stock market data set with those obtained from a supervised neural network classification method. Overall, the results show that the MVPRS-index method can be applied to data sets with labeled information as well as to those with unlabeled information, and therefore provides a more reliable basis for extracting decision-making rules from labeled/unlabeled data sets.

10.
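Based on interval-valued intuitionistic fuzzy tolerance relations, an interval-valued intuitionistic fuzzy rough set model over two universes is given and its properties are discussed, providing a new theoretical basis and operational means for applications of rough sets. Finally, an example illustrates the application of the proposed interval-valued intuitionistic fuzzy rough set model in a clinical diagnosis system.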

11.
陶朝杰  杨进 《经济数学》2020,37(3):214-220
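Fake reviews are an unavoidable problem in the development of e-commerce. To address class imbalance in online review data, a fake review identification method based on the BalanceCascade-GBDT algorithm is proposed. The BalanceCascade algorithm progressively shrinks the majority-class sample space by setting the false positive rate of each classifier, and then combines all base classifiers into the final classifier. GBDT is widely used in classification problems for its high accuracy and interpretability and, as an algorithm that is unstable under sample perturbation, is a well-suited base classification model. The model is evaluated on the Yelp review data set with AUC as the evaluation metric and compared with logistic regression, random forest and neural network algorithms; the experiments demonstrate the effectiveness of the method.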
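
How BalanceCascade shrinks the majority class can be sketched with a little arithmetic. The sketch assumes the schedule used in the commonly cited formulation of BalanceCascade, in which every round targets the same false positive rate f = (|P|/|N|)^(1/(T-1)) for a minority class of size |P|, a majority class of size |N| and T rounds, and in which majority samples that a round rejects correctly are discarded. The abstract does not state which schedule the authors used, and the function name is hypothetical. No classifier is trained here; only the expected majority-class sizes are computed.

```python
def cascade_schedule(n_minority, n_majority, rounds):
    """Per-round false positive rate and expected majority sizes.

    Assumes rounds >= 2 and n_minority < n_majority. After each round
    only the false positives (a fraction f of the remaining majority
    samples) are kept, so the majority pool shrinks geometrically.
    """
    f = (n_minority / n_majority) ** (1.0 / (rounds - 1))
    sizes = [n_majority * f ** i for i in range(rounds)]
    return f, sizes
```

With 10 minority samples, 100 majority samples and 4 rounds, f is about 0.464 and the expected majority sizes are roughly 100, 46.4, 21.5 and 10, so the final round trains on a balanced set.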

12.
Soft set theory, originally proposed by Molodtsov, can be used as a general mathematical tool for dealing with uncertainty. Since its appearance, there has been some progress concerning practical applications of soft set theory, especially the use of soft sets in decision making. The intuitionistic fuzzy soft set is a combination of an intuitionistic fuzzy set and a soft set. Rough set theory is a powerful tool for dealing with uncertainty, granularity and incompleteness of knowledge in information systems. Using rough set theory, this paper proposes a novel approach to intuitionistic fuzzy soft set based decision making problems. Firstly, by employing an intuitionistic fuzzy relation and a threshold value pair, we define a new rough set model and examine some of its fundamental properties. Then the concepts of approximate precision and rough degree are given and some basic properties are discussed. Furthermore, we investigate the relationship between intuitionistic fuzzy soft sets and intuitionistic fuzzy relations and present a rough set approach to intuitionistic fuzzy soft set based decision making. Finally, an illustrative example is employed to show the validity of this rough set approach in intuitionistic fuzzy soft set based decision making problems.

13.
Classical rough set theory is based on the conventional indiscernibility relation, which is not suitable for analyzing incomplete information. Some successful extended rough set models based on different non-equivalence relations have been proposed. The data-driven valued tolerance relation is one such non-equivalence relation. However, its method of calculating the tolerance degree has some limitations. In this paper, a known same probability dominant valued tolerance relation is proposed to solve this problem. On this basis, an extended rough set model based on the known same probability dominant valued tolerance relation is presented, and some properties of the new model are analyzed. To compare the classification performance of different generalized indiscernibility relations, an incomplete category utility function is proposed, based on the category utility function in cluster analysis; it can measure the classification performance of different generalized indiscernibility relations effectively. Experimental results show that the known same probability dominant valued tolerance relation obtains better classification results than other generalized indiscernibility relations.

14.
Rough set theory provides a powerful tool for dealing with uncertainty in data. The application of a variety of rough set models to mining data stored in a single table has been widely studied. However, the analysis of data stored in a relational structure using rough sets is still an extensive research area. This paper proposes compound approximation spaces and their constrained versions, which are intended for handling uncertainty in relational data. The proposed spaces are extensions of tolerance approximation spaces to the relational case. Compared with compound approximation spaces, the constrained version enables new knowledge to be derived from relational data. The proposed approach can improve the mining of relational data that is uncertain, incomplete, or inconsistent.

15.
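A method based on the discernibility matrix is proposed for computing the reducts and the core of covering rough sets. Building on the covering rough set model proposed by Zakowski and using some properties of the discernibility matrix, the reduct and core computation methods of rough set theory in reference [10] are applied to covering-based rough set theory. This both simplifies the computation of reducts and the core in covering rough set theory and generalizes the method of reference [10]. Finally, an example illustrates the effectiveness of the method.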

16.
In many classification applications and face recognition tasks, unlabelled data are available for training along with labelled samples. The use of unlabelled data can improve the performance of a classifier. In this paper, a semi-supervised growing neural gas is proposed for learning with such partly labelled datasets in face recognition applications. The classifier is first trained on the labelled data; unlabelled data are then gradually classified and added to the training data, the classifier is retrained, and so on. The proposed iterative algorithm conforms to the EM framework and is demonstrated, on both artificial and real datasets, to significantly boost the classification rate with the use of unlabelled data. The improvement is particularly great when the labelled dataset is small. A comparison with support vector machine classifiers is also given. The algorithm is computationally efficient and easy to implement.
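
The iterative scheme described above (train on labelled data, label some unlabelled data, retrain) can be sketched as a generic self-training loop. To keep the sketch short, a one-dimensional nearest-centroid classifier stands in for the growing neural gas, which is a deliberate simplification and not the paper's model. The function names, the `batch` parameter and the confidence measure (the margin between the two closest centroids) are all assumptions.

```python
def centroids(labelled):
    """Mean feature value per class for (value, label) pairs."""
    sums = {}
    for x, y in labelled:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def self_train(labelled, unlabelled, batch=1):
    """Self-training: label the most confident points, then retrain.

    Assumes at least two classes in the initial labelled data.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    while pool:
        cents = centroids(labelled)  # retrain on everything labelled so far

        def margin(x):
            # Confidence: gap between the two nearest class centroids.
            dists = sorted(abs(x - c) for c in cents.values())
            return dists[1] - dists[0]

        pool.sort(key=margin, reverse=True)
        for x in pool[:batch]:
            label = min(cents, key=lambda y: abs(x - cents[y]))
            labelled.append((x, label))
        pool = pool[batch:]
    return centroids(labelled)
```

Starting from `[(0.0, "a"), (10.0, "b")]` with the unlabelled values `[1.0, 2.0, 9.0, 8.0]`, the loop ends with centroids 1.0 for class `a` and 9.0 for class `b`. Adding the most confident points first limits the risk that early labelling errors are reinforced by later retraining.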

17.
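Fuzziness of covering generalized rough sets   Cited by: 5 (self-citations: 0, citations by others: 5)
Building on the study of covering generalized rough sets, and using two distance functions, the Hamming and Euclidean distances, together with the nearest ordinary set of a fuzzy set, the concept of the fuzziness degree of a covering generalized rough set is introduced, a method for computing the fuzziness degree is given, and some important properties of this fuzziness degree are proved. These results play a role in both the theoretical study and the application of covering generalized rough sets.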
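
The idea of measuring fuzziness as a distance to the nearest ordinary (crisp) set can be sketched with the classical linear and quadratic indices of fuzziness, which are based on the Hamming and Euclidean distances. The sketch takes a list of membership degrees as given; how the paper derives those degrees from a covering is not stated in the abstract, and its exact normalization may differ. The function names are hypothetical.

```python
import math


def nearest_ordinary_set(memberships):
    """Crisp set closest to the fuzzy set: 1 where membership exceeds 0.5."""
    return [1.0 if m > 0.5 else 0.0 for m in memberships]


def linear_fuzziness(memberships):
    """Index based on the Hamming distance, scaled to the range [0, 1]."""
    near = nearest_ordinary_set(memberships)
    hamming = sum(abs(m - c) for m, c in zip(memberships, near))
    return 2.0 * hamming / len(memberships)


def quadratic_fuzziness(memberships):
    """Index based on the Euclidean distance, scaled to the range [0, 1]."""
    near = nearest_ordinary_set(memberships)
    euclid = math.sqrt(sum((m - c) ** 2 for m, c in zip(memberships, near)))
    return 2.0 * euclid / math.sqrt(len(memberships))
```

Both indices are 0 for a crisp set and reach 1 when every membership degree equals 0.5; for the memberships `[0.0, 0.5, 1.0, 0.25]` the linear index is 0.375.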

18.
Rough set feature selection (RSFS) can be used to improve classifier performance. RSFS removes redundant attributes whilst retaining important ones that preserve the classification power of the original dataset. Reducts are feature subsets selected by RSFS. The core is the intersection of all the reducts of a dataset. RSFS can only handle discrete attributes; hence, continuous attributes need to be discretized before being input to RSFS. Discretization determines the core size of a discrete dataset. However, current discretization methods do not consider the core size during discretization. Earlier work has proposed the core-generating approximate minimum entropy discretization (C-GAME) algorithm, which selects the maximum number of minimum entropy cuts capable of generating a non-empty core within a discrete dataset. The contributions of this paper are as follows: (1) the C-GAME algorithm is improved by adding a new type of constraint to eliminate the possibility that only a single reduct is present in a C-GAME-discrete dataset; (2) performance evaluation of C-GAME in comparison to C4.5, multi-layer perceptrons, RBF networks and k-nearest neighbours classifiers on ten datasets chosen from the UCI Machine Learning Repository; (3) performance evaluation of C-GAME in comparison to Recursive Minimum Entropy Partition (RMEP), Chimerge, Boolean Reasoning and Equal Frequency discretization algorithms on the ten datasets; (4) evaluation of the effects of C-GAME and the other four discretization methods on the sizes of reducts; (5) an upper bound is defined on the total number of reducts within a dataset; (6) the effects of different discretization algorithms on the total number of reducts are analysed; (7) performance analysis of two RSFS algorithms (a genetic algorithm and Johnson’s algorithm).

19.
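Driven by the major economic benefits of recommender systems in e-commerce, malicious users mount shilling attacks for illegal profit, manipulating recommendation results and exposing recommender systems to serious information security threats. Identifying and detecting shilling attacks has therefore become key to securing recommender systems. The traditional support vector machine (SVM) approach is constrained by both the small-sample problem and data imbalance. To address this, a shilling attack detection method that combines a semi-supervised SVM with an asymmetric ensemble strategy is proposed. First, an initial SVM is trained; the K-nearest-neighbor method is then introduced to improve the label quality of samples near the decision surface, and a mixed sample set of labeled and unlabeled data is used to reduce the need for labeled data. Finally, an asymmetric weighted ensemble strategy is designed that focuses on the classification accuracy of attack samples and reduces the sensitivity of the ensemble classifier to data imbalance. Experimental results show that the proposed method effectively addresses the small-sample problem and the imbalanced data distribution, and achieves good detection performance.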

20.
Recently, the multigranulation rough set (MGRS), which is based on multiple binary relations on the universe, has become a new direction in rough set theory. However, it is worth noting that the original MGRS cannot be used to discover knowledge from information systems with various domains of attributes. To extend the theory of MGRS, the objective of this study is to develop a so-called neighborhood-based multigranulation rough set (NMGRS) in the framework of multigranulation rough sets. By using two different approximating strategies, i.e., seeking common reserving difference and seeking common rejecting difference, we first present optimistic and pessimistic 1-type neighborhood-based multigranulation rough sets and optimistic and pessimistic 2-type neighborhood-based multigranulation rough sets, respectively. Through analyzing several important properties of neighborhood-based multigranulation rough sets, we find that the new rough sets degenerate to the original MGRS when the size of the neighborhood equals zero. To obtain covering reducts under neighborhood-based multigranulation rough sets, we then propose a new definition of covering reduct to describe the smallest attribute subset that preserves the consistency of the neighborhood decision system, which can be calculated by Chen’s discernibility matrix approach. These results show that the proposed NMGRS largely extends the theory and application of classical MGRS in the context of multiple granulations.
