Similar Documents
20 similar records found
1.
Sequential pattern mining from sequence databases has been recognized as an important data mining problem with various applications. Items in a sequence database can be organized into a concept hierarchy according to a taxonomy. Based on the hierarchy, sequential patterns can be found not only at the leaf nodes (individual items) of the hierarchy, but also at higher levels; this is called multiple-level sequential pattern mining. Previous research, however, assumed taxonomies with crisp relationships between any two distinct levels, which cannot handle the uncertainty and fuzziness of real life. For example, tomatoes could be classified into the Fruit category, but could also be regarded as belonging to the Vegetable category. To deal with the fuzzy nature of taxonomy, Chen and Huang developed a novel knowledge discovery model to mine fuzzy multi-level sequential patterns, where the relationship from one level to another is represented by a value between 0 and 1. In their work, a generalized sequential patterns (GSP)-like algorithm was developed to find fuzzy multi-level sequential patterns. This algorithm, however, faces a difficult problem: the mining process may have to generate and examine a huge set of combinatorial subsequences and requires multiple scans of the database. In this paper, we propose a new efficient algorithm to mine this type of pattern based on the divide-and-conquer strategy. In addition, another efficient algorithm is developed to discover fuzzy cross-level sequential patterns. Since the proposed algorithm greatly reduces the candidate subsequence generation effort, performance is improved significantly. Experiments show that the proposed algorithm is much more efficient and scalable than the previous one. In mining real-life databases, our work enhances the model's practicability and could promote more applications in business.
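To make the fuzzy-support idea concrete, here is a minimal sketch of how taxonomy memberships between 0 and 1 can enter support counting; the example taxonomy, the min/max operators, and the embedding search are illustrative assumptions, not the Chen-Huang model or the divide-and-conquer algorithm proposed here.

```python
# Minimal sketch: fuzzy support of a category-level sequence under a fuzzy taxonomy.
# The taxonomy memberships and min/max semantics below are illustrative assumptions,
# not the exact definitions used by the proposed algorithm.

# item -> {category: membership in [0, 1]}
FUZZY_TAXONOMY = {
    "tomato": {"Fruit": 0.6, "Vegetable": 0.7},
    "apple":  {"Fruit": 1.0},
    "carrot": {"Vegetable": 1.0},
}

def item_membership(item, category):
    return FUZZY_TAXONOMY.get(item, {}).get(category, 0.0)

def sequence_membership(customer_seq, category_pattern):
    """Degree to which a customer's transaction sequence contains the
    category-level pattern, scanning for an order-preserving embedding."""
    best = 0.0

    def search(t_idx, p_idx, degree):
        nonlocal best
        if p_idx == len(category_pattern):
            best = max(best, degree)
            return
        for t in range(t_idx, len(customer_seq)):
            # membership of a transaction in a category: max over its items
            m = max((item_membership(i, category_pattern[p_idx])
                     for i in customer_seq[t]), default=0.0)
            if m > 0.0:
                search(t + 1, p_idx + 1, min(degree, m))  # min-conjunction

    search(0, 0, 1.0)
    return best

def fuzzy_support(database, category_pattern):
    return sum(sequence_membership(s, category_pattern) for s in database) / len(database)

db = [
    [["apple"], ["tomato", "carrot"]],   # one customer: two transactions
    [["carrot"], ["apple"]],
]
print(fuzzy_support(db, ["Fruit", "Vegetable"]))
```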

2.
In the last decade, the problem of deriving a consensus group ranking from all users' ranking data has received increased attention due to its widespread applications. Previous research solved this problem by consolidating the opinions of all users, thereby obtaining an ordered list of all items that represents the achieved consensus. The weakness of this approach, however, is that it always produces a ranking of all items, regardless of how many conflicts exist among users. This work rejects the forced agreement on all items. Instead, we define a new concept, maximum consensus sequences: the longest ranking lists of items that agree with the majority and disagree only with the minority. Based on this concept, the algorithm MCS is developed to determine the maximum consensus sequences from users' ranking data and to identify conflict items that need further negotiation. Extensive experiments are carried out using synthetic data sets, and the results indicate that the proposed method is computationally efficient. Finally, we discuss how the identified consensus sequences and conflict-item information can be used in practice.
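The MCS algorithm itself is not reproduced here; the sketch below is a simplified illustration of the underlying idea: build a majority precedence relation from the users' rankings and greedily extract a consensus chain, flagging whatever remains as conflict items. The greedy extraction is an assumption for brevity and carries no maximality guarantee.

```python
# Simplified sketch: extract a consensus chain from users' rankings and flag
# conflict items. This greedy scheme illustrates the idea only; it does not
# implement the paper's MCS algorithm or its maximality guarantee.

def consensus_chain(rankings):
    items = set(rankings[0])
    pos = [{item: r.index(item) for item in r} for r in rankings]

    def majority_prefers(a, b):
        votes = sum(1 for p in pos if p[a] < p[b])
        return votes > len(rankings) / 2

    chain, remaining = [], set(items)
    while remaining:
        # look for an item the majority places before every other remaining item
        head = next((a for a in remaining
                     if all(majority_prefers(a, b) for b in remaining if b != a)),
                    None)
        if head is None:          # cyclic majority: remaining items conflict
            break
        chain.append(head)
        remaining.remove(head)
    return chain, remaining       # consensus sequence, items needing negotiation

rankings = [["A", "B", "C", "D"],
            ["A", "C", "B", "D"],
            ["A", "B", "D", "C"]]
print(consensus_chain(rankings))
```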

3.
Optimization, 2012, 61(5): 1177-1193
Numerous models have been proposed for ranking the efficient decision-making units (DMUs) in data envelopment analysis (DEA). However, the main shortcoming of these models is their two-stage orientation: first the efficient DMUs must be found, and only then can they be ranked. Another flaw of some of these models, such as the AP model (A procedure for ranking efficient units in data envelopment analysis, Management Science, 39(10) (1993) 1261-1264), is the presence of a non-Archimedean number in the objective function. Moreover, when there is more than one weakly efficient unit (or non-extreme efficient unit), these models cannot rank the DMUs. In this paper, we employ hyperplanes of the production possibility set (PPS) and propose a new method for the complete ranking of DMUs in DEA. The proposed approach is a one-stage method that ranks all DMUs (efficient and inefficient). In addition to ranking, the proposed method simultaneously determines the type of efficiency of each DMU. Numerical examples are given to show the applicability of the proposed method.
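The hyperplane-based ranking requires constructing the facets of the PPS and is not reproduced here; for orientation, the sketch below solves the standard input-oriented CCR multiplier model with scipy, the baseline efficiency score that such ranking methods refine. The toy data are invented.

```python
# Baseline sketch: input-oriented CCR multiplier model solved as an LP.
# This is the standard DEA efficiency score, not the paper's hyperplane-based
# one-stage ranking; the data below are illustrative.
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0, 3.0], [4.0, 1.0], [3.0, 2.0]])   # inputs,  one row per DMU
Y = np.array([[1.0],      [1.0],      [1.0]])        # outputs, one row per DMU

def ccr_efficiency(k):
    n, m = X.shape          # number of DMUs, number of inputs
    s = Y.shape[1]          # number of outputs
    # variables: [u_1..u_s, v_1..v_m]; maximize u . y_k  =>  minimize -u . y_k
    c = np.concatenate([-Y[k], np.zeros(m)])
    # for every DMU j: u . y_j - v . x_j <= 0
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(n)
    # normalization: v . x_k = 1
    A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m))
    return -res.fun

for k in range(len(X)):
    print(f"DMU {k}: efficiency = {ccr_efficiency(k):.3f}")
```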

4.
In this paper, a new method for comparing fuzzy numbers based on a fuzzy probabilistic preference relation is introduced. The ranking order of fuzzy numbers with a weighted confidence level is derived from the pairwise comparison matrix based on the 0.5-transitivity of the fuzzy probabilistic preference relation. The main difference between the proposed method and existing ones is that the comparison result between two fuzzy numbers is expressed as a fuzzy set instead of a crisp one. As such, the ranking order of n fuzzy numbers provides more information on the uncertainty level of the comparison. As illustrated by comparative examples, the proposed method overcomes the unreasonable results (due to violation of the inequality properties) and the lack of discrimination exhibited by some existing methods. More importantly, the proposed method is able to provide decision makers with the probability of making errors when a crisp ranking order is adopted. The proposed method can also provide a probability-based explanation for conflicts among the comparison results of some existing methods using a proper ranking order, which ensures that ties between alternatives can be broken.
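As a rough stand-in for the idea of probabilistic pairwise preference, the sketch below treats triangular fuzzy numbers as triangular densities and estimates P(A >= B) by Monte Carlo, then ranks by average preference; this is an illustrative simplification, not the paper's fuzzy probabilistic preference relation.

```python
# Crude sketch: estimate a pairwise preference degree P(A >= B) for triangular
# fuzzy numbers by treating them as triangular densities. A Monte Carlo
# stand-in for illustration only, not the paper's preference relation.
import numpy as np

rng = np.random.default_rng(0)

def preference(a, b, n=100_000):
    """a, b: (left, mode, right) triangular fuzzy numbers."""
    xa = rng.triangular(*a, size=n)
    xb = rng.triangular(*b, size=n)
    return np.mean(xa >= xb)

fuzzy_numbers = {"A": (1.0, 3.0, 5.0),
                 "B": (2.0, 3.0, 4.0),
                 "C": (0.0, 2.0, 6.0)}

# pairwise preference matrix; rank by average preference over the others
names = list(fuzzy_numbers)
P = {(i, j): preference(fuzzy_numbers[i], fuzzy_numbers[j])
     for i in names for j in names if i != j}
score = {i: np.mean([P[i, j] for j in names if j != i]) for i in names}
print(sorted(score.items(), key=lambda kv: -kv[1]))
```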

5.
Classification of items as good or bad can often be achieved more economically by examining the items in groups rather than individually. If the result of a group test is good, all items within the group can be classified as good; in the opposite case, one or more items within the group are bad. Whether it is necessary to identify the bad items, and if so how, is described by the screening policy. Over time, a spectrum of group screening models has been studied, each including some policy. However, the majority ignore the fact that, in real-life situations, items may arrive at the testing center at random time epochs. This dynamic aspect leads to two decision variables: the minimum and maximum group size. In this paper, we analyze a discrete-time batch-service queueing model with a general dependency between the service time of a batch and the number of items within it. We deduce several important quantities, by which the decision variables can be optimized. In addition, we highlight that every possible screening policy can, in principle, be studied by defining this dependency appropriately.
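For background on the economics of grouping, the classic Dorfman-style calculation below shows how an optimal group size arises when each item is independently bad with probability p; it is deliberately much simpler than the paper's batch-service queueing model.

```python
# Background sketch: classic Dorfman-style expected tests per item for group
# size k, when each item is independently bad with probability p. Motivates
# group screening economics; not the paper's batch-service queueing model.
def expected_tests_per_item(k, p):
    # 1 group test, plus k individual retests if the group test fails
    return 1.0 / k + 1.0 - (1.0 - p) ** k

p = 0.02
best = min(range(2, 50), key=lambda k: expected_tests_per_item(k, p))
print(best, round(expected_tests_per_item(best, p), 4))   # optimal group size
```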

6.
Group decision making is a type of decision problem in which multiple experts, acting collectively, analyze problems, evaluate alternatives, and select a solution from a collection of alternatives. As natural language is the standard representation of the concepts humans use for communication, it seems natural that experts use words (linguistic terms) instead of numerical values to provide their opinions. However, while linguistic information is readily available, it is not operational, and thus it has to be made usable by expressing it in terms of information granules. To do so, Granular Computing, which has emerged as a unified and coherent framework for the design, processing, and interpretation of information granules, can be used. The aim of this paper is to present an information granulation of the linguistic information used in group decision-making problems defined in heterogeneous contexts, i.e., where the experts have associated importance degrees reflecting their ability to handle the problem. The granulation of the linguistic terms is formulated as an optimization problem, solved using particle swarm optimization, in which a performance index is maximized by a suitable mapping of the linguistic terms onto information granules formalized as sets. This performance index is expressed as a weighted aggregation of the individual consistency achieved by each expert.
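A generic particle swarm loop is sketched below on a toy objective that stands in for the paper's weighted consistency index; the swarm parameters and the encoding of granules as sorted cutpoints in [0, 1] are illustrative assumptions.

```python
# Generic particle swarm optimization sketch. The objective is a toy stand-in
# for the paper's weighted consistency index; the swarm parameters and the
# interval-cutpoint encoding are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def performance(x):
    # toy index: prefer evenly spread, sorted cutpoints in [0, 1]
    x = np.sort(x)
    return -np.var(np.diff(np.concatenate([[0.0], x, [1.0]])))

def pso(fn, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    pos = rng.random((n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fn(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([fn(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest

print(np.sort(pso(performance, dim=4)))   # roughly evenly spaced cutpoints
```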

7.
Discretization techniques can be used to reduce the number of values of a given continuous attribute, and a concept hierarchy can be used to define such a discretization. Traditional methods of building a concept hierarchy from a continuous attribute are usually based on a level-wise approach. Unfortunately, this approach suffers from three weaknesses: (1) it only finds a locally optimal solution instead of a globally optimal one, (2) it is usually subject to the constraint that each interval can only be partitioned into a fixed number of subintervals, and (3) the constructed tree may be unbalanced. In view of these weaknesses, this paper develops a new algorithm based on a dynamic-programming strategy for constructing concept hierarchies from continuous attributes. The constructed trees have three merits: (1) they are globally optimal trees, (2) each interval is partitioned into the most appropriate number of subintervals, and (3) the trees are balanced. Finally, we carry out an experimental study using real data to show the algorithm's efficiency and effectiveness.
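The DP core of such a construction can be illustrated with Fisher-style optimal partitioning: split a sorted attribute into k intervals minimizing total within-interval squared error. The cost function is an assumption for illustration; the paper builds full balanced multi-level hierarchies on top of this kind of recurrence.

```python
# Minimal DP sketch: globally optimal partition of a sorted continuous attribute
# into k intervals, minimizing total within-interval squared error (Fisher-style).
# The paper builds full balanced hierarchies; this shows only the DP core.
import numpy as np

def optimal_partition(values, k):
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    pre = np.concatenate([[0.0], np.cumsum(v)])
    pre2 = np.concatenate([[0.0], np.cumsum(v ** 2)])

    def sse(i, j):   # squared error of the interval v[i:j]
        s, s2, m = pre[j] - pre[i], pre2[j] - pre2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    dp = [[INF] * (k + 1) for _ in range(n + 1)]
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for g in range(1, min(j, k) + 1):
            for i in range(g - 1, j):                 # last interval is v[i:j]
                cand = dp[i][g - 1] + sse(i, j)
                if cand < dp[j][g]:
                    dp[j][g], cut[j][g] = cand, i
    # recover the interval boundaries from the cut table
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = cut[j][g]
        bounds.append((v[i], v[j - 1]))
        j = i
    return bounds[::-1]

print(optimal_partition([1, 2, 3, 10, 11, 12, 30, 31], k=3))
```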

8.
With the broad development of the World Wide Web, various kinds of heterogeneous data (including multimedia data) are now available to decision support tasks. A data warehousing approach is often adopted to prepare data for relevant analysis. Data integration and dimensional modeling indeed allow the creation of appropriate analysis contexts. However, existing data warehousing tools are well suited to classical, numerical data; they cannot handle complex data. In our approach, we adapt the three main phases of the data warehousing process to complex data. In this paper, we particularly focus on two main steps in complex data warehousing. The first step is data integration. We define a generic UML model that helps represent a wide range of complex data, including their possible semantic properties. Complex data are then stored in XML documents generated by a piece of software we designed. The second important phase we address is the preparation of data for dimensional modeling. We propose an approach that exploits data mining techniques to assist users in building relevant dimensional models.
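As a small illustration of the integration step, the snippet below serializes one hypothetical complex object to XML with the Python standard library; the element and attribute names are invented and do not reproduce the paper's generic UML model.

```python
# Tiny sketch of the integration step: serializing one complex object to XML.
# The element and attribute names are hypothetical, not the paper's UML model.
import xml.etree.ElementTree as ET

doc = ET.Element("complex_object", id="42", type="multimedia")
ET.SubElement(doc, "source").text = "web"
ET.SubElement(doc, "semantic_property", name="topic").text = "customer review"
component = ET.SubElement(doc, "component", format="jpeg")
ET.SubElement(component, "location").text = "images/item42.jpg"

ET.indent(doc)                       # pretty-printing, Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```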

9.
We consider a group decision-making problem where the preferences given by the experts are articulated in the form of pairwise comparison matrices. In many cases, experts are unable to provide their preferences on some aspects of the problem because of a large number of alternatives, limited expertise in some problem domain, unavailable data, etc., resulting in incomplete pairwise comparison matrices. Our goal is to develop a computational method that retrieves a group priority vector for the considered alternatives while dealing with incomplete information. For that purpose, we establish an optimization problem in which a similarity function and a parametric compromise function are defined. Associated with this problem, a logarithmic goal programming formulation provides an effective procedure to compute the solution. Moreover, the parameters involved in the method have a clear meaning in the context of group problems.
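The paper's similarity and compromise functions are not reproduced here; the sketch below shows the related logarithmic fitting idea on which such formulations rest: choose log-priorities that fit the known entries of an incomplete pairwise comparison matrix in the least-squares sense. The matrix is invented and None marks missing entries.

```python
# Sketch of the logarithmic fitting idea for an incomplete pairwise comparison
# matrix: choose log-priorities w minimizing the sum over known (i, j) of
# (ln a_ij - (w_i - w_j))^2. Standard log least squares, shown for intuition;
# the paper's similarity/compromise model is richer.
import numpy as np

A = [[1.0,  2.0,  None],
     [0.5,  1.0,  4.0 ],
     [None, 0.25, 1.0 ]]
n = len(A)

rows, rhs = [], []
for i in range(n):
    for j in range(n):
        if i != j and A[i][j] is not None:
            r = np.zeros(n)
            r[i], r[j] = 1.0, -1.0
            rows.append(r)
            rhs.append(np.log(A[i][j]))
rows.append(np.ones(n))      # fix the gauge: sum of log-priorities = 0
rhs.append(0.0)

w_log, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
w = np.exp(w_log)
print(w / w.sum())           # completed priority vector
```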

10.
Supervised classification is an important part of corporate data mining to support decision making in customer-centric planning tasks. The paper proposes a hierarchical reference model for support vector machine-based classification within this discipline. The approach balances the conflicting goals of transparent yet accurate models and compares favourably with alternative classifiers in a large-scale empirical evaluation of real-world customer relationship management applications. Recent advances in support vector machine research are incorporated to approach feature, instance, and model selection in a unified framework.
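The reference model itself is organizational, but the unified treatment of feature and model selection it advocates can be approximated with standard tooling; the sketch below is a generic scikit-learn pipeline on synthetic data, not the paper's framework.

```python
# Generic sketch of unified feature and model selection for an SVM classifier,
# in the spirit of the paper's framework but built from standard scikit-learn
# parts on synthetic data; this is not the proposed reference model itself.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("select", SelectKBest(f_classif)),
                 ("svm", SVC(kernel="rbf"))])
grid = GridSearchCV(pipe, {"select__k": [5, 10, 20],
                           "svm__C": [0.1, 1, 10],
                           "svm__gamma": ["scale", 0.01]}, cv=5)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```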

11.
Denoising poses a new challenge for mining high-frequency financial data due to its irregularity and roughness. An inefficient decomposition of the systematic pattern (the trend) and the noise in high-frequency data will lead to erroneous conclusions, as the irregularity and roughness of the data make traditional methods difficult to apply. In this paper, we propose the local linear scaling approximation (LLSA) algorithm, a new nonlinear filtering algorithm based on the linear maximal overlap discrete wavelet transform (MODWT), to decompose the systematic pattern and the noise. We show several unique properties of this new algorithm, namely its local linearity, computational complexity, and consistency. We conduct a simulation study to confirm the properties shown analytically and compare the performance of LLSA with MODWT. We then apply the new algorithm to real high-frequency data from the German equity market to investigate its use in forecasting. We show the superior performance of LLSA and conclude that it can be applied with flexible settings and is suitable for high-frequency data mining.
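LLSA is not, to our knowledge, available in library form, so the sketch below shows only the MODWT-style baseline it is compared against: a stationary wavelet transform (PyWavelets' close relative of MODWT) with universal soft thresholding of the detail bands, on a synthetic signal.

```python
# Sketch of MODWT-style denoising via PyWavelets' stationary wavelet transform
# (a close relative of MODWT), with universal soft thresholding of the detail
# bands. This is the baseline the paper compares against, not the LLSA
# algorithm itself; the signal is synthetic.
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 512                                   # swt needs len divisible by 2**level
t = np.linspace(0, 1, n)
clean = np.sin(2 * np.pi * 5 * t)
signal = clean + 0.3 * rng.standard_normal(n)

level = 3
coeffs = pywt.swt(signal, "db4", level=level)
thr = 0.3 * np.sqrt(2 * np.log(n))        # universal threshold, sigma assumed known
denoised_coeffs = [(cA, pywt.threshold(cD, thr, mode="soft")) for cA, cD in coeffs]
denoised = pywt.iswt(denoised_coeffs, "db4")
print(np.mean((denoised - clean) ** 2))   # residual MSE against the clean trend
```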

12.
Data envelopment analysis (DEA) is a method to estimate the relative efficiency of decision-making units (DMUs) performing similar tasks in a production system that consumes multiple inputs to produce multiple outputs. A number of DEA models with interval data have been developed so far; the CCR, BCC, and FDH models with interval data are well known as the basic ones. In this study, we propose a model called the interval generalized DEA (IGDEA) model, which can treat these basic DEA models with interval data in a unified way. In addition, by establishing the theoretical relationships between the IGDEA model and those DEA models with interval data, we prove that the IGDEA model makes it possible to calculate the efficiency of DMUs while incorporating various preference structures of decision makers.

13.
The DEAHP method for weight derivation and aggregation in the analytic hierarchy process (AHP) has been found to be flawed: it sometimes produces counterintuitive priority vectors for inconsistent pairwise comparison matrices, which makes its application very restrictive. This paper proposes a new data envelopment analysis (DEA) method for priority determination in the AHP and extends it to the group AHP situation. In this new DEA methodology, two specially constructed DEA models that differ from the DEAHP model are used to derive the best local priorities from a pairwise comparison matrix or a group of pairwise comparison matrices, whether they are perfectly consistent or inconsistent. The new DEA method produces true weights for perfectly consistent pairwise comparison matrices and, for inconsistent matrices, best local priorities that are logical and consistent with the decision makers' (DMs') subjective judgments. In hierarchical structures, the new DEA method utilizes the simple additive weighting (SAW) method for the aggregation of the best local priorities, without the need for normalization. Numerical examples are examined throughout the paper to show the advantages of the new DEA methodology and its potential applications in both the AHP and group decision making.
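The two DEA models that derive the best local priorities are not reproduced here; the sketch below shows only the final SAW aggregation step named in the abstract, with invented local priorities and criterion weights.

```python
# Sketch of the simple additive weighting (SAW) aggregation step: the global
# priority of each alternative is the criterion-weighted sum of its local
# priorities. The DEA models producing these local priorities are not
# reproduced; the numbers are invented.
import numpy as np

criterion_weights = np.array([0.5, 0.3, 0.2])
local = np.array([[0.6, 0.2, 0.5],     # alternative 1 under criteria 1..3
                  [0.3, 0.5, 0.1],     # alternative 2
                  [0.1, 0.3, 0.4]])    # alternative 3
global_priority = local @ criterion_weights
print(global_priority)                 # no normalization needed, per the paper
```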

14.
In this paper, a new learning algorithm based on the group method of data handling (GMDH) is proposed for the identification of Takagi-Sugeno (TS) fuzzy models. Different from existing methods, the new approach, called TS-GMDH, starts from simple elementary TS fuzzy models and then uses the GMDH mechanism to produce candidate fuzzy models of growing complexity until the TS model of optimal complexity has been created. The main characteristic of the new approach is its ability to identify the structure of the TS model automatically. Experiments on the Box-Jenkins gas furnace data and UCI datasets have shown that the proposed method achieves satisfactory results and is more robust to noise than other TS modeling techniques such as ANFIS.
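TS-GMDH itself combines GMDH with TS fuzzy rule construction; the sketch below illustrates only the plain GMDH mechanism it builds on: each layer fits quadratic polynomials on pairs of features, survivors are chosen by an external (validation) criterion, and growth stops when the criterion stops improving. The data and layer sizes are invented.

```python
# Sketch of the plain GMDH mechanism (not the paper's TS-GMDH): each layer fits
# quadratic polynomials on pairs of features and keeps the candidates that do
# best on held-out data; complexity grows until validation error stops improving.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.05 * rng.standard_normal(200)
tr, va = slice(0, 150), slice(150, 200)           # train / validation split

def quad_features(a, b):
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

layer, best_err = X, np.inf
for _ in range(3):                                # grow up to three layers
    candidates = []
    for i, j in combinations(range(layer.shape[1]), 2):
        Phi = quad_features(layer[:, i], layer[:, j])
        coef, *_ = np.linalg.lstsq(Phi[tr], y[tr], rcond=None)
        pred = Phi @ coef
        err = np.mean((pred[va] - y[va]) ** 2)    # external (validation) criterion
        candidates.append((err, pred))
    candidates.sort(key=lambda c: c[0])
    if candidates[0][0] >= best_err:
        break                                     # optimal complexity reached
    best_err = candidates[0][0]
    layer = np.column_stack([p for _, p in candidates[:4]])  # survivors

print(best_err)
```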

15.
This paper considers a construction project problem under multiple criteria in a fuzzy environment and proposes a new two-phase group decision making (GDM) approach. This approach integrates a modified analytic network process (ANP) and an improved compromise ranking method, known as VIKOR. To take uncertainty and risk into account, a new decision making approach is presented in which a group of experts provides multiple fuzzy information, and a risk attitude, which can be expressed linguistically, is incorporated for each expert. First, a modified fuzzy ANP method is introduced to address dependence and feedback among conflicting criteria and to determine their relative importance. Then, a fuzzy VIKOR method is extended to rank potential projects on the basis of their overall performance. An illustrative example from the literature is provided for the construction project problem to demonstrate the effectiveness and feasibility of the proposed approach. The computational results show that the proposed two-phase GDM approach is suitable for coping with imprecision and subjectivity in this complicated decision making problem. Finally, the results of the proposed approach with and without risk attitudes are compared with the results reported by Cheng and Li [1], and the merits are highlighted.
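The fuzzy ANP and fuzzy VIKOR extensions are beyond a short snippet; for orientation, the sketch below implements crisp VIKOR, the compromise ranking method the paper extends, on an invented benefit-criteria decision matrix.

```python
# Crisp VIKOR sketch (the method the paper extends to a fuzzy group setting):
# compute S, R and the compromise index Q over an invented decision matrix in
# which all criteria are benefit-type.
import numpy as np

F = np.array([[7.0, 500.0, 0.8],       # alternatives x criteria (benefit)
              [8.0, 450.0, 0.6],
              [6.0, 600.0, 0.9]])
w = np.array([0.4, 0.35, 0.25])
v = 0.5                                 # weight of the "majority" strategy

f_best, f_worst = F.max(axis=0), F.min(axis=0)
D = w * (f_best - F) / (f_best - f_worst)   # normalized weighted regrets
S, R = D.sum(axis=1), D.max(axis=1)
Q = v * (S - S.min()) / (S.max() - S.min()) + \
    (1 - v) * (R - R.min()) / (R.max() - R.min())
print(np.argsort(Q))                    # ranking, best (smallest Q) first
```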

16.
In decision making problems, there may be cases where the decision makers express their judgements using preference relations with incomplete information. One of the key issues is then how to estimate the missing preference values. In this paper, we introduce the incomplete interval multiplicative preference relation and give definitions of consistent and acceptable incomplete ones, respectively. Based on the consistency property of interval multiplicative preference relations, a goal programming model is proposed to complete an acceptable incomplete relation. A new algorithm for obtaining the priority vector from incomplete interval multiplicative preference relations is given. The goal programming model is further applied to group decision making (GDM), where the experts express their preferences as acceptable incomplete interval multiplicative preference relations. An interval weighted geometric averaging (IWGA) operator is proposed to aggregate the individual preference relations into a social one. Furthermore, the social interval multiplicative preference relation possesses acceptable consistency when every individual one is acceptably consistent. Two numerical examples are carried out to show the efficiency of the proposed goal programming model and the algorithms.
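A minimal sketch of the IWGA aggregation named in the abstract: each entry of the group interval relation is the weighted geometric mean of the experts' lower bounds and of their upper bounds. The expert weights and interval matrices are invented.

```python
# Minimal sketch of interval weighted geometric averaging (IWGA): each entry of
# the group interval preference relation is the weighted geometric mean of the
# experts' lower bounds and of their upper bounds, taken separately. Expert
# weights and interval matrices below are invented.
import numpy as np

expert_weights = np.array([0.6, 0.4])
# per expert: lower-bound and upper-bound matrices of an interval relation
lowers = np.array([[[1.0, 2.0], [1/3, 1.0]],
                   [[1.0, 1.0], [1/2, 1.0]]])
uppers = np.array([[[1.0, 3.0], [1/2, 1.0]],
                   [[1.0, 2.0], [1.0, 1.0]]])

def wgm(mats, w):
    # elementwise weighted geometric mean: exp(sum_k w_k * ln m_k)
    return np.exp(np.tensordot(w, np.log(mats), axes=1))

group_lower = wgm(lowers, expert_weights)
group_upper = wgm(uppers, expert_weights)
print(group_lower)
print(group_upper)
```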

17.
Traditional interval data envelopment analysis (DEA) methods suffer from inconsistent evaluation scales and computational complexity when determining the upper and lower bounds of each decision-making unit's (DMU's) interval efficiency. To address these problems, this paper proposes a common-weight interval DEA model that simultaneously maximizes the efficiency upper and lower bounds of all DMUs, and presents a possibility-degree ranking method that incorporates decision makers' preference information to achieve a complete ranking of the interval efficiencies. Finally, a case study measuring the industrial production efficiency of 11 coastal provinces in mainland China illustrates the effectiveness and practicality of the proposed method.
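The common-weight interval DEA model is not reproduced here; the sketch below illustrates only the second ingredient, a standard possibility-degree comparison of interval efficiencies used to obtain a complete ranking. The intervals are invented, and the simple averaging of possibility degrees stands in for the paper's preference-aware procedure.

```python
# Sketch of the possibility-degree ranking step for interval efficiencies.
# The common-weight DEA model that would produce these intervals is not
# reproduced; the intervals below are invented, and plain averaging stands in
# for the paper's preference-aware ranking.
import numpy as np

def possibility(a, b):
    """Possibility degree that interval a = [al, au] >= interval b = [bl, bu]."""
    (al, au), (bl, bu) = a, b
    width = (au - al) + (bu - bl)
    return min(max((au - bl) / width, 0.0), 1.0)

effs = {"DMU1": (0.62, 0.80), "DMU2": (0.70, 0.75), "DMU3": (0.55, 0.90)}
names = list(effs)
score = {i: np.mean([possibility(effs[i], effs[j])
                     for j in names if j != i]) for i in names}
print(sorted(score.items(), key=lambda kv: -kv[1]))   # complete ranking
```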

18.
Decision makers' (DMs') preferences on decision alternatives are often characterized by multiplicative or fuzzy preference relations. This paper proposes a chi-square method (CSM) for obtaining a priority vector from multiplicative and fuzzy preference relations. The proposed CSM can be used to obtain a priority vector from a multiplicative preference relation (i.e., a pairwise comparison matrix), a fuzzy preference relation, a group of multiplicative preference relations, a group of fuzzy preference relations, or a mixture of these. Theorems and an algorithm for the CSM are developed. Three numerical examples are examined to illustrate the applications of the CSM and its advantages.
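The CSM's chi-square objective is specific to the paper; for comparison, the sketch below derives a priority vector from a crisp multiplicative preference relation with the classical eigenvector method (power iteration), one of the baselines such prioritization methods are judged against. The matrix is invented.

```python
# For orientation only: the classical eigenvector (power-iteration) priority
# method for a multiplicative pairwise comparison matrix. The paper's CSM
# optimizes a different, chi-square-style criterion and also covers fuzzy and
# group relations; the matrix below is invented.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

w = np.ones(len(A)) / len(A)
for _ in range(100):                    # power iteration
    w = A @ w
    w /= w.sum()
print(w)                                # priority vector (principal eigenvector)
```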

19.
The field of direct marketing is constantly searching for new data mining techniques to analyze the ever-increasing amount of available data. Self-organizing maps (SOMs) have been widely applied and discussed in the literature, since they make it possible to reduce the complexity of a high-dimensional attribute space while providing a powerful visual exploration facility. Combined with clustering techniques and the extraction of so-called salient dimensions, a direct marketer can gain high-level insight into a dataset of prospects. In this paper, a SOM-based profile generator is presented, consisting of a generic method that leads to value-adding and business-oriented profiles for targeting individuals with predefined characteristics. Moreover, the proposed method is applied in detail to a concrete case study from the concert industry. The performance of the method is then illustrated and discussed, and possible future research tracks are outlined.
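Library details vary, so the sketch below trains a tiny self-organizing map from scratch on synthetic prospect vectors: best-matching-unit search plus a Gaussian neighborhood update with decaying learning rate and radius. It shows the SOM mechanism only; the paper's profile generator adds clustering and salient-dimension extraction on top.

```python
# Tiny from-scratch SOM sketch on synthetic "prospect" vectors: find the best
# matching unit (BMU) and pull its grid neighborhood toward the sample, with a
# decaying learning rate and radius. Mechanism only; not the paper's generator.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((400, 5))             # 400 prospects, 5 attributes
gx, gy, d = 8, 8, data.shape[1]
weights = rng.random((gx, gy, d))

# grid coordinates, used by the neighborhood function
coords = np.stack(np.meshgrid(np.arange(gx), np.arange(gy), indexing="ij"),
                  axis=-1)

n_steps = 4000
for step in range(n_steps):
    frac = step / n_steps
    lr = 0.5 * (1 - frac)               # decaying learning rate
    radius = 4.0 * (1 - frac) + 0.5     # decaying neighborhood radius
    x = data[rng.integers(len(data))]
    # best matching unit: grid cell whose weight vector is closest to x
    dist = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dist), dist.shape)
    # Gaussian neighborhood pull toward the sample
    g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * radius**2))
    weights += lr * g[..., None] * (x - weights)

print(weights.shape)                    # trained 8x8 map of 5-D prototypes
```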

20.
Although support vector regression models are used successfully in various applications, business datasets with millions of observations and thousands of variables make training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA), which selects a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The first step involves parallel processing of data chunks and selects a fraction of the original observations that either are representative of the pattern identified in the chunk or do not fit that pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improve the predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. A comparison with the ε-SSVR method using the reduced kernel technique shows a similar performance improvement. Training a standard SVM tool on the ROCSA-selected observations improves the predictive accuracy on average by 21% compared with the practical approach of random sampling.
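ROCSA is not reproduced here; the sketch below only illustrates the end use it enables: training a standard ε-SVR tool on a small row sample instead of the full dataset, with naive random sampling standing in for ROCSA's informed row and column selection. The data are synthetic.

```python
# Illustration of the end use ROCSA enables: fit a standard epsilon-SVR tool on
# a reduced row sample rather than the full data. Random sampling stands in for
# ROCSA's informed row/column selection here; the data are synthetic.
import numpy as np
from sklearn.metrics import r2_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((20_000, 10))
y = X @ rng.random(10) + 0.1 * rng.standard_normal(len(X))

idx = rng.choice(len(X), size=1_000, replace=False)   # naive row selection
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X[idx], y[idx])
print(r2_score(y, model.predict(X)))
```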
