Similar literature
 20 similar documents found (search time: 656 ms)
1.
Mining association rules is a popular and well-researched method for discovering interesting relations between variables in large databases. A practical problem is that, at medium to low support values, a large number of frequent itemsets and an even larger number of association rules are often found in a database. A widely used approach is to gradually increase minimum support and minimum confidence, or to filter the found rules using increasingly strict constraints on additional measures of interestingness, until the set of rules is reduced to a manageable size. In this paper we describe a different approach, which is based on first defining a set of “interesting” itemsets (e.g., by a mixture of mining and expert knowledge) and then, in a second step, selectively generating rules for only these itemsets. The main advantage of this approach over increasing thresholds or filtering rules is that the number of rules found is significantly reduced while it is not necessary to increase the support and confidence thresholds, which might lead to missing important information in the database.
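
The two-step idea in this abstract (fix a set of interesting itemsets first, then generate rules only from them) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `support` and `rules_for_itemsets`, the `min_confidence` parameter, and the toy transactions are all assumptions made for illustration, and support and confidence follow their standard definitions.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_for_itemsets(interesting, transactions, min_confidence=0.0):
    """Generate rules lhs -> rhs only from the given itemsets.

    Each rule splits one interesting itemset into a non-empty left-hand
    side and a non-empty right-hand side (hypothetical helper).
    """
    rules = []
    for itemset in map(frozenset, interesting):
        supp = support(itemset, transactions)
        if supp == 0:
            continue  # itemset never occurs, so no rule can be formed
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # confidence = support(lhs and rhs together) / support(lhs)
                conf = supp / support(lhs, transactions)
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

# Toy data (assumed): four market-basket transactions.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Only one itemset is declared interesting, so only its rules are produced.
rules = rules_for_itemsets([{"bread", "milk"}], transactions)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

Because the candidate itemsets are fixed in advance, the number of rules is bounded by the chosen itemsets rather than by the support threshold, which is the trade-off the abstract describes.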

2.
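To mine the types and development patterns of domain knowledge in the general aviation industry, clarify industry planning, and promote the industry's healthy development, 3,869 web documents on general aviation industry topics were collected. The domain knowledge was classified according to the keywords of the topic information, and knowledge mining was carried out with a multi-level fuzzy association algorithm. The results show that, within the domain knowledge of the general aviation industry, industry entities and industry products are closely linked by association rules, and ...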

3.
Progress in bioinformatics and biotechnology has generated a huge number of sequences that require detailed analysis. Several data mining techniques can be used to discover patterns in large databases. This paper describes the development of a tool and methodology to extract hydrophobicity patterns/profiles that achieve a specific secondary structure in proteins. The results indicate that association rules can be an efficient method for investigating this kind of problem. This work contributes to two areas: prediction of protein structure and protein folding.

4.
Utility itemsets typically consist of items with different values such as utilities, and the aim of utility mining is to identify the itemsets with the highest utilities. In past studies on utility mining, the values of utility itemsets were considered to be positive. In some applications, however, an itemset may be associated with negative item values. Hence, the discovery of high utility itemsets with negative item values is important for mining interesting patterns such as association rules. In this paper, we propose a novel method, namely HUINIV (High Utility Itemsets with Negative Item Values)-Mine, for efficiently and effectively mining high utility itemsets from large databases with consideration of negative item values. To the best of our knowledge, this is the first work that considers the concept of negative item values in utility mining. The novel contribution of HUINIV-Mine is that it can effectively identify high utility itemsets by generating fewer high transaction-weighted utilization itemsets, so that the execution time of mining the high utility itemsets can be reduced substantially. In this way, the process of discovering all high utility itemsets with consideration of negative item values can be accomplished effectively with lower requirements on memory space and CPU I/O. This meets the critical requirements of temporal and spatial efficiency for mining high utility itemsets with negative item values. Experimental evaluation shows that HUINIV-Mine outperforms other methods substantially by generating far fewer candidate itemsets under different experimental conditions.
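
To make the notion of itemset utility with negative item values concrete, here is a small Python sketch. It is not HUINIV-Mine, whose details the abstract does not give; it only computes utility in the usual way for utility mining (quantity times unit value, summed over the transactions that contain the whole itemset). The function name `itemset_utility` and the toy data are assumptions.

```python
def itemset_utility(itemset, transactions, unit_value):
    """Total utility of `itemset` over all transactions that contain it.

    Each transaction maps item -> purchased quantity; `unit_value` maps
    item -> value per unit, which may be negative (hypothetical helper).
    """
    itemset = set(itemset)
    total = 0
    for quantities in transactions:
        # Only transactions containing every item of the itemset count.
        if itemset.issubset(quantities):
            total += sum(quantities[i] * unit_value[i] for i in itemset)
    return total

# Toy data (assumed): item B has a negative unit value, e.g. a loss leader.
unit_value = {"A": 5, "B": -2, "C": 1}
transactions = [
    {"A": 1, "B": 2},
    {"A": 2, "C": 3},
    {"A": 1, "B": 1, "C": 4},
]

print(itemset_utility({"A"}, transactions, unit_value))
print(itemset_utility({"A", "B"}, transactions, unit_value))
```

With negative values, extending an itemset can lower its utility, so pruning rules that assume utilities only grow need extra care; this is the kind of difficulty the abstract says HUINIV-Mine addresses.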

5.
Classification and rule induction are two important tasks for extracting knowledge from data. In rule induction, knowledge is represented as IF-THEN rules, which are easily understood and applied by problem-domain experts. In this paper, a new chromosome representation and solution technique for rule induction based on Multi-Expression Programming (MEP), named MEPAR-miner (Multi-Expression Programming for Association Rule Mining), is proposed. MEP is a relatively new technique in evolutionary programming that was first introduced in 2002 by Oltean and Dumitrescu. MEP uses a linear chromosome structure. In MEP, multiple logical expressions of different sizes are used to represent different logical rules. MEP expressions can be encoded and implemented in a flexible and efficient manner. MEP is generally applied to prediction problems; in this paper a new algorithm is presented that enables MEP to discover classification rules. The performance of the developed algorithm is tested on nine publicly available binary and n-ary classification data sets. Extensive experiments are performed to demonstrate that MEPAR-miner can discover effective classification rules that are as good as (or better than) those obtained by traditional rule induction methods. It is also shown that an effective gene encoding structure directly improves the predictive accuracy of logical IF-THEN rules.

6.
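Mining Boolean and generalized fuzzy weighted association rules in databases   Cited by: 3 (self-citations: 0, citations by others: 3)
The concepts of Boolean weighted association rules and generalized fuzzy weighted association rules are introduced, and computational methods for mining each kind of rule are given.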

7.
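Through an analysis of uncertain reasoning, this paper proposes the concept of fuzzy association, which uses fuzzy concepts to express the association relationships among transaction data. It studies the properties of fuzzy associations and gives an algorithm for discovering fuzzy association production rules, together with an application example.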

8.
Storing XML documents in relational databases has drawn much attention in recent years because it can leverage existing investments in relational database technologies. Different algorithms have been proposed to map XML DTD/Schema to relational schema in order to store XML data in relational databases. However, most work defines mapping rules based on heuristics without considering application characteristics, and hence fails to produce efficient relational schemas for various applications. In this paper, we propose a workload-aware approach to generate a relational schema from XML data and a user-specified workload. Our approach adopts a genetic algorithm to find optimal mappings. An elegant encoding method and related operations are proposed to manipulate mappings using bit strings. Various optimization techniques can be applied to the XML-to-relational mapping problem based on this representation. We implemented the proposed algorithm, and our experimental results showed that it was more robust and produced better mappings than existing work.

9.
This paper proposes a novel Informed Evolutionary algorithm (InEA) which implements the idea of learning with a generation. An association rule miner is used to identify the norm of a population. Subsequently, a knowledge-based mutation operator is used to help guide the search of the evolutionary optimizer. The approach breaks away from the current practice of treating optimization and analysis as two independent processes. It shows how a rule mining module can be used to mine knowledge and can be hybridized into an EA to improve the performance of the optimizer. The proposed memetic algorithm is examined on various benchmark problems, and the simulation results show that InEA is competitive with existing approaches in the literature.

10.
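Extension data mining of the influence of CPI changes on product sales   Cited by: 2 (self-citations: 0, citations by others: 2)
Current research on data mining concentrates on mining static data. In practice, however, one often has to handle contradictory problems that must be solved through extension transformations and operations on those transformations, which calls for knowledge about transformations and for dynamic data mining, or extension data mining. Using the theory of extension logic and extension data mining, this paper studies the mining of conductive knowledge in extension data mining, based on the effect that changes in the national consumer price index have on product sales data. The results provide a basis for enterprise decision-makers to devise more reasonable sales strategies under current market conditions.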

11.
12.
13.
Applying the classical association rule extraction framework to fuzzy datasets leads to unmanageably large association rule sets. Moreover, the discretization operation leads to information loss and hampers efficient exploitation of the mined knowledge. To overcome these drawbacks, this paper proposes the extraction and exploitation of compact and informative generic bases of fuzzy association rules (FARs). The presented approach relies on extending the notions of closure and Galois connection to the fuzzy context, which we introduce in this paper. In order to select a generic subset of all fuzzy association rules without loss of information, we define three fuzzy generic bases from which the remaining (redundant) FARs are generated. These generic bases constitute a compact nucleus of fuzzy association rules from which all the remaining rules can be informatively derived. In order to ensure a sound and complete derivation process, we introduce an axiomatic system allowing the complete derivation of all the redundant rules. The results obtained from experiments carried out on benchmark datasets are very encouraging. They show a substantial reduction in the number of extracted fuzzy association rules without information loss.

14.
15.
This paper presents an application of knowledge discovery via rough sets to a real-life case study of global investing risk in 52 countries using 27 indicator variables. The aim is to explain the classification of the countries according to financial risks assessed by Wall Street Journal international experts and to discover knowledge from data via decision rule mining, rather than prediction; i.e., to capture the explicit or implicit knowledge or policy of international financial experts, rather than to predict the actual classifications. Suggestions are made about the most significant attributes for each risk class and country, as well as the minimal set of decision rules needed. Our results compared favorably with those from discriminant analysis and several variations of preference disaggregation MCDA procedures. The same approach could be adapted to other problems with missing data in data mining, knowledge extraction, and different multi-criteria decision problems, such as sorting, choice and ranking.

16.
In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest subject to certain constraints. The ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm as it can deliver the Pareto optimal front consisting of all partial classification rules that lie in the upper confidence/coverage border, for databases of limited size. We present the results of experiments with various well-known databases for both algorithms. We also discuss how the two methods can be used complementarily for large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases.

17.
Rough set theory is a new data mining approach for managing vagueness that can discover important facts hidden in data. The literature indicates that current rough-set-based approaches cannot guarantee that the classification of a decision table is credible and cannot generate robust decision rules when new attributes are incrementally added. In this study, an incremental attribute-oriented rule-extraction algorithm is proposed to address this deficiency, which is commonly observed in the literature on decision rule induction. The proposed approach handles incremental attributes on the basis of the alternative rule extraction algorithm (AREA), which was presented for discovering preference-based rules according to the reducts with the maximum strength index (SI), specifically for the case in which the desired reducts are not necessarily unique because several reducts may have the same SI value. Using the AREA, an alternative rule can be defined as a rule that holds identical preference to the original decision rule and may be more attractive to a decision-maker than the original one. The proposed approach can operate effectively when new attributes are added to the database or information system, and it does not need to recompute the updated data set from the first step of the initial stage. Because most rule induction approaches generate repetitive rules, the proposed algorithm also excludes these repetitive rules during the solution search stage. The proposed approach can efficiently and effectively generate complete, robust and non-repetitive decision rules. The rules derived from the data set provide an indication of how to effectively study this problem in further investigations.

18.
Incomplete decision contexts are a kind of decision formal contexts in which information about the relationship between some objects and attributes is not available or is lost. Knowledge discovery in incomplete decision contexts is of interest because such databases are frequently encountered in the real world. This paper mainly focuses on the issues of approximate concept construction, rule acquisition and knowledge reduction in incomplete decision contexts. We propose a novel method for building the approximate concept lattice of an incomplete context. Then, we present the notion of an approximate decision rule and an approach for extracting non-redundant approximate decision rules from an incomplete decision context. Furthermore, in order to make the rule acquisition easier and the extracted approximate decision rules more compact, a knowledge reduction framework with a reduction procedure for incomplete decision contexts is formulated by constructing a discernibility matrix and its associated Boolean function. Finally, some numerical experiments are conducted to assess the efficiency of the proposed method.

19.
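Data mining is the process of effectively discovering knowledge from the massive information held in large databases, and its effectiveness depends mainly on the algorithm underlying the search mechanism. In view of this, an efficient global optimization search algorithm that integrates individual immunity with population evolution is proposed, namely a generalized rule reasoning algorithm based on immune programming. Unlike existing algorithms, the generalized rule reasoning algorithm does not merely aim to discover classification-related information. Instead, it uses background theory and prior knowledge, balancing knowledge representation against running efficiency, to emphasize the discovery of new knowledge and the prediction of high-level rules. Theoretical analysis and simulation experiments show that the algorithm favors the relative stability of the evolving population and the improvement of overall performance, and that it maintains high accuracy during rule extraction.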

20.
In this paper, we propose a novel method for mining association rules for classification problems, namely AFSRC (AFS association rules for classification), realized in the framework of axiomatic fuzzy set (AFS) theory. This model provides a simple and efficient rule generation mechanism. It can also retain meaningful rules for imbalanced classes by fuzzifying the concept of the class support of a rule. In addition, AFSRC can handle different data types occurring simultaneously. Furthermore, the new model can produce membership functions automatically by processing available data. An extensive suite of experiments is reported, offering a comprehensive comparison of the method's performance with that of other methods available in the literature. The experimental results show that AFSRC outperforms most other methods in terms of accuracy and interpretability. AFSRC forms a classifier with high accuracy and a smaller, more interpretable rule base while retaining a sound balance between these two characteristics.

