Similar Documents (20 results)
1.
Recently developed SAGE technology enables us to quantify simultaneously the expression levels of thousands of genes in a population of cells, and SAGE data are helpful in classifying different types of cancer. One main challenge in this task is the small number of samples available relative to the huge number of genes, many of which are irrelevant to classification. Another is the lack of appropriate statistical methods that account for the specific properties of SAGE data. We propose an efficient solution: select relevant genes by information gain and build a multinomial event model for the SAGE data. The proposed model obtained promising results in terms of accuracy.
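A minimal sketch of the selection-then-classification pipeline described above, assuming a count matrix `X` (samples × genes) and labels `y`. scikit-learn's mutual information estimator stands in for information gain (the two coincide for a single feature/class pair), and the cutoff `k` is an illustrative choice, not the paper's setting.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif  # info-gain analogue
from sklearn.naive_bayes import MultinomialNB               # multinomial event model

def select_and_classify(X, y, k=100):
    """X: (n_samples, n_genes) nonnegative SAGE tag counts; y: class labels."""
    scores = mutual_info_classif(X, y, discrete_features=True)
    top = np.argsort(scores)[-k:]               # indices of the k most informative genes
    model = MultinomialNB().fit(X[:, top], y)   # fit the event model on the reduced matrix
    return model, top
```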

2.
Data mining aims to find patterns in organizational databases, yet most mining techniques do not use knowledge of the quality of the database. In this work, we show how to bring into classification mining recent advances in the data quality field, which view a database as the product of an imprecise manufacturing process whose flaws/defects are captured in quality matrices. We develop a general-purpose method for incorporating data quality matrices into the data mining classification task. Our work differs from existing data preparation techniques: whereas other approaches detect and fix errors to ensure consistency with the entire data set, ours makes use of a priori knowledge of how the data is produced/manufactured.
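The paper's method is more general; the toy sketch below only illustrates the core idea of a quality matrix for a single categorical attribute, where `Q[i, j]` is the probability that true value `i` is recorded as `j`, and Bayes' rule recovers a posterior over true values that a classifier could then weight by. All numbers are illustrative.

```python
import numpy as np

Q = np.array([[0.9, 0.1],     # true value 0 is mis-recorded as 1 with prob 0.1
              [0.2, 0.8]])    # true value 1 is mis-recorded as 0 with prob 0.2
prior = np.array([0.5, 0.5])  # assumed prior over true values

def posterior_true_value(recorded):
    # P(true = i | recorded) by Bayes' rule, using the quality matrix
    unnorm = prior * Q[:, recorded]
    return unnorm / unnorm.sum()

print(posterior_true_value(0))  # -> [0.8181..., 0.1818...]
```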

3.
Data reduction is an important issue in the field of data mining. The goal of data reduction techniques is to extract a subset of data from a massive dataset while preserving the properties and characteristics of the original data in the reduced set, so that an otherwise difficult or impossible data mining task can be carried out efficiently and effectively. This paper describes a new method for selecting a subset of data that closely represents the original data in terms of its joint and univariate distributions. Two distance criteria, motivated by the χ²-statistic, are used for measuring the goodness of fit between the distributions of the reduced and full datasets. Under these criteria, the data reduction problem can be formulated as a bi-objective quadratic program, and a genetic algorithm is used in the search/optimization process. Experiments conducted on several real-world data sets demonstrate the effectiveness of the proposed method.
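A sketch, under illustrative binning choices, of a χ²-style distance between the univariate distribution of one attribute in the reduced set and in the full set; a genetic algorithm would then search for subsets minimizing such distances jointly. The paper's exact criteria and solver may differ.

```python
import numpy as np

def chi2_distance(full_col, reduced_col, bins=10):
    """χ²-style goodness-of-fit of the reduced column against the full column."""
    edges = np.histogram_bin_edges(full_col, bins=bins)
    f_obs, _ = np.histogram(reduced_col, bins=edges)
    f_exp, _ = np.histogram(full_col, bins=edges)
    f_exp = f_exp * (len(reduced_col) / len(full_col))  # rescale to the reduced size
    mask = f_exp > 0                                    # skip empty expected bins
    return np.sum((f_obs[mask] - f_exp[mask]) ** 2 / f_exp[mask])
```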

4.
Denoising poses a new challenge for mining high-frequency financial data because of the data's irregularity and roughness: inefficient decomposition into the systematic pattern (the trend) and noise leads to erroneous conclusions, since these irregularities make traditional methods difficult to apply. In this paper, we propose the local linear scaling approximation (LLSA) algorithm, a new nonlinear filtering algorithm based on the linear maximal overlap discrete wavelet transform (MODWT), to decompose the systematic pattern and the noise. We establish several unique properties of this algorithm, namely its local linearity, computational complexity, and consistency. We conduct a simulation study to confirm the properties shown analytically and compare the performance of LLSA with MODWT. We then apply the new algorithm to real high-frequency data from the German equity market to investigate its use in forecasting. LLSA shows superior performance, and we conclude that it can be applied with flexible settings and is suitable for high-frequency data mining.
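LLSA itself is the paper's own construction; the sketch below shows only a generic shift-invariant wavelet denoiser using PyWavelets' stationary wavelet transform, which matches the MODWT up to normalization. The wavelet, level, and universal threshold are illustrative assumptions.

```python
import numpy as np
import pywt

def swt_denoise(x, wavelet="haar", level=3):
    """Generic MODWT-style denoising by soft-thresholding detail coefficients."""
    n = len(x) - len(x) % (2 ** level)         # swt needs length divisible by 2^level
    coeffs = pywt.swt(np.asarray(x[:n], dtype=float), wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1][1])) / 0.6745  # noise scale from finest detail
    thr = sigma * np.sqrt(2 * np.log(n))               # universal threshold
    coeffs = [(cA, pywt.threshold(cD, thr, mode="soft")) for cA, cD in coeffs]
    return pywt.iswt(coeffs, wavelet)
```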

5.
Several researchers have demonstrated that spectral parameters in induced polarization (IP) can be used to discriminate different IP sources. In this paper, an inversion procedure based on the Gauss–Newton method is applied to recover the spectral parameters of a fractal model for complex resistivity, with the forward modeling carried out by the finite element method. The procedure was applied to synthetic data, with simulations at five different frequencies. The data were inverted at each frequency, and the inversion was also applied to each cell of the finite-element mesh to recover the fractal parameter, in order to assess whether the fractal model parameters can be used to interpret the induced polarization response of this geological geometry. The results show that the anomalies are well detected by the image of the fractal model parameters.
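A generic Gauss–Newton iteration of the kind used in the inversion above. Here `residual(m)` and `jacobian(m)` are placeholders for the forward model (finite elements in the paper) evaluated at the fractal-model parameters `m`; they are assumptions, not the paper's code.

```python
import numpy as np

def gauss_newton(residual, jacobian, m0, iters=20, tol=1e-8):
    """Minimize ||residual(m)||^2 by repeated linearization."""
    m = np.asarray(m0, dtype=float)
    for _ in range(iters):
        r = residual(m)                               # data misfit vector
        J = jacobian(m)                               # sensitivity matrix dr/dm
        dm, *_ = np.linalg.lstsq(J, -r, rcond=None)   # Gauss-Newton step
        m = m + dm
        if np.linalg.norm(dm) < tol:
            break
    return m
```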

6.
Sets of “positive” and “negative” points (observations) in n-dimensional discrete space, given along with their non-negative integer multiplicities, are analyzed from the perspective of the Logical Analysis of Data (LAD). A set of observations satisfying upper and/or lower bounds imposed on certain components is called a positive pattern if it contains some positive observations and no negative ones. The number of variables on which such restrictions are imposed is called the degree of the pattern. A total polynomial algorithm is proposed for the enumeration of all patterns of limited degree, and special efficient variants of it for enumerating all patterns with certain “sign” and “coverage” requirements are presented and evaluated on a publicly available collection of benchmark datasets.
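A sketch of the pattern notions defined above, assuming a dictionary-based representation: a pattern is a set of lower/upper bounds on selected components, it is positive if it covers at least one positive observation and no negative one, and its degree is the number of restricted components.

```python
def covers(pattern, x):
    # pattern: {component index: (lower, upper)}; either bound may be None
    return all((lo is None or x[i] >= lo) and (hi is None or x[i] <= hi)
               for i, (lo, hi) in pattern.items())

def is_positive_pattern(pattern, positives, negatives):
    # covers some positive observation and no negative one
    return (any(covers(pattern, p) for p in positives)
            and not any(covers(pattern, n) for n in negatives))

def degree(pattern):
    return len(pattern)   # number of components the pattern restricts
```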

7.
The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, and the social sciences. In contrast to cross-sectional or static data, such observations (also known as stream data, temporal data, longitudinal data, or repeated measures) are more challenging because data dependency must be incorporated into the clustering process. In this research we focus on clustering categorical sequences. The method proposed here combines model-based and heuristic clustering. In the first step, the categorical sequences are transformed by an extension of the hidden Markov model into a probabilistic space where a symmetric Kullback–Leibler distance can operate. In the second step, the sequences are clustered by hierarchical clustering on the matrix of distances. The paper illustrates the potential of this hybrid approach using a synthetic data set, the well-known Microsoft dataset of website users' search patterns, and a survey on job career dynamics.
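A sketch of the second step above, assuming the HMM stage has already produced one probability profile per sequence (rows of `P`): build a symmetric Kullback–Leibler distance matrix, then cluster it hierarchically with SciPy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def sym_kl(p, q, eps=1e-12):
    p, q = p + eps, q + eps   # avoid log(0)
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def cluster_profiles(P, n_clusters=3):
    n = len(P)
    D = np.array([[sym_kl(P[i], P[j]) for j in range(n)] for i in range(n)])
    iu = np.triu_indices(n, k=1)
    Z = linkage(D[iu], method="average")   # condensed upper-triangle distances
    return fcluster(Z, n_clusters, criterion="maxclust")
```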

8.
We use four orthogonal polynomial series (Legendre, Chebyshev, Hermite, and Laguerre) to approximate the non-homogeneous term in precise time integration, and combine them with the dimensional expanding technique. They are applied to various structures subjected to transient dynamic loading, together with the Fourier and Taylor approximations proposed in previous work. Numerical examples show that all six methods are efficient and reasonably precise. In particular, the Legendre approximation has much higher precision and better convergence; the Chebyshev approximation is also good, only slightly inferior to Legendre. The other four approximation methods usually produce errors hundreds of thousands of times larger. Hermite and Laguerre approximations may be useful for some special non-homogeneous terms, but do not work well in our numerical examples. Other contributions of this paper include a dynamic programming scheme for computing series coefficients and a general formula for finding the assistant matrix of any polynomial series.
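A minimal sketch of the kind of expansion compared above: approximating a non-homogeneous (forcing) term f(t) by a truncated Legendre series on one integration step, using NumPy's polynomial classes. The step, degree, and sample count are illustrative, not the paper's settings.

```python
import numpy as np
from numpy.polynomial import Legendre

def legendre_approx(f, t0, t1, degree=6, n_samples=50):
    """Least-squares Legendre fit of f on [t0, t1]."""
    t = np.linspace(t0, t1, n_samples)
    return Legendre.fit(t, f(t), degree, domain=[t0, t1])

approx = legendre_approx(np.sin, 0.0, 0.1)
t = np.linspace(0.0, 0.1, 200)
print(np.max(np.abs(approx(t) - np.sin(t))))  # max error on the step
```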

9.
Nonnegative matrix factorization for spectral data analysis
Data analysis is pervasive throughout business, engineering, and science. Very often the data to be analyzed are nonnegative, and it is often preferable to take this constraint into account in the analysis process. Here we are concerned with analyzing data obtained from astronomical spectrometers, whose spectral measurements are inherently nonnegative. The identification and classification of space objects that cannot be imaged in the normal way with telescopes is an important but difficult problem in tracking the thousands of objects, including satellites, rocket bodies, debris, and asteroids, in orbit around the Earth. In this paper, we develop an effective nonnegative matrix factorization algorithm with novel smoothness constraints for unmixing spectral reflectance data for space object identification and classification purposes. Promising numerical results are presented using laboratory and simulated datasets.
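The paper's smoothness constraints are its own; the sketch below shows only a generic multiplicative-update NMF with a simple L2 penalty on the spectral factor `H`, to illustrate the shape of such penalized algorithms. The penalty weight and iteration count are illustrative.

```python
import numpy as np

def nmf_penalized(V, r, lam=0.1, iters=200, eps=1e-9):
    """Minimize ||V - WH||_F^2 + lam*||H||_F^2 with W, H >= 0."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + lam * H + eps)  # penalized H update
        W *= (V @ H.T) / (W @ H @ H.T + eps)            # standard W update
    return W, H
```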

10.
In this survey paper, we present advances achieved in recent years in the development and use of OR, in particular optimization methods, in the new gene-environment and eco-finance networks, usually based on finite data series, with an emphasis on the uncertainty in them and in the interactions of the model items. Our networks represent models in the form of time-continuous and time-discrete dynamics, whose unknown parameters we estimate under constraints on complexity and with regularization, by optimization techniques ranging from linear, mixed-integer, spline, semi-infinite, and robust optimization to conic (e.g., semi-definite) programming. We present different kinds of uncertainty and a new time-discretization technique, address aspects of data preprocessing and of stability together with related aspects from game theory and financial mathematics, work out structural frontiers, and discuss opportunities for future research and OR applications in the real world.

11.
A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups.
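The Brandt-semigroup representation is the paper's own contribution; for orientation, the sketch below shows the standard error-correcting construction it builds on (error-correcting output codes, as in Witten and Frank): each class receives a binary codeword, and the binary classifiers' outputs are decoded to the nearest codeword in Hamming distance. The codewords are illustrative.

```python
import numpy as np

code = np.array([[0, 0, 1, 1, 0],   # codeword of class 0
                 [1, 0, 0, 1, 1],   # codeword of class 1
                 [0, 1, 1, 0, 1]])  # codeword of class 2

def decode(bits):
    # bits: outputs of the 5 binary classifiers; flips are corrected as long
    # as fewer than half the code's minimum Hamming distance of them occur
    return int(np.argmin((code != np.asarray(bits)).sum(axis=1)))

print(decode([1, 0, 0, 1, 0]))  # one bit flipped from class 1's codeword -> 1
```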

12.
Rough set theory provides a powerful tool for dealing with uncertainty in data. The application of a variety of rough set models to mining data stored in a single table has been widely studied; however, the rough-set analysis of data stored in a relational structure is still a broad research area. This paper proposes compound approximation spaces, and constrained versions of them, intended for handling uncertainty in relational data. The proposed spaces extend tolerance approximation spaces to the relational case. Compared with compound approximation spaces, the constrained version makes it possible to derive new knowledge from relational data. The proposed approach can improve the mining of relational data that is uncertain, incomplete, or inconsistent.
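A sketch of the classical single-table lower and upper approximations that the compound approximation spaces above generalize to relational data. Objects with identical values on the chosen attributes are treated as indiscernible; the data layout is an assumption for illustration.

```python
def approximations(objects, attrs, target):
    """objects: {id: {attr: value}}; attrs: attribute list; target: set of ids."""
    def indiscernibility_class(o):
        key = tuple(objects[o][a] for a in attrs)
        return {p for p in objects if tuple(objects[p][a] for a in attrs) == key}
    lower = {o for o in target if indiscernibility_class(o) <= target}   # certainly in
    upper = {o for o in objects if indiscernibility_class(o) & target}   # possibly in
    return lower, upper
```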

13.
The paper is concerned with the problem of binary classification of data records, given an already classified training set of records. Among the various approaches to the problem, the methodology of the logical analysis of data (LAD) is considered. This approach is based on discrete mathematics, with special emphasis on Boolean functions. Enhancements of the standard LAD procedure based on probability considerations are presented. In particular, the problem of selecting the optimal support set is formulated as a weighted set covering problem, and testable statistical hypotheses are used. The accuracy of the modified LAD procedure is compared to that of the standard procedure on datasets from the UCI repository, with encouraging results.
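A sketch of the weighted set covering step described above: choose a cheap collection of candidate attributes (a support set) whose combined coverage separates the training observations. The greedy cost/coverage rule is a standard heuristic, not necessarily the paper's exact solver, and it assumes the candidate sets jointly cover the universe.

```python
def greedy_set_cover(universe, sets, weights):
    """sets: {name: set of covered elements}; weights: {name: cost}."""
    uncovered, chosen = set(universe), []
    while uncovered:
        # pick the set with the best cost per newly covered element
        best = min((s for s in sets if sets[s] & uncovered),
                   key=lambda s: weights[s] / len(sets[s] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```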

14.
Data envelopment analysis (DEA) is a method for estimating the relative efficiency of decision-making units (DMUs) performing similar tasks in a production system that consumes multiple inputs to produce multiple outputs. A number of DEA models with interval data have been developed; the CCR, BCC, and FDH models with interval data are the well-known basic ones. In this study, we propose the interval generalized DEA (IGDEA) model, which treats these basic interval-data DEA models in a unified way. In addition, by establishing the theoretical relationships between the IGDEA model and those DEA models with interval data, we prove that the IGDEA model makes it possible to calculate the efficiency of DMUs while incorporating various preference structures of decision makers.
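For reference, a sketch of the exact-data CCR efficiency score (multiplier form) that the interval models above extend; interval data would replace the point matrices `X`, `Y` with bounds. The data layout is an assumption, and SciPy's LP solver does the work.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j):
    """X: (m inputs, n DMUs); Y: (s outputs, n DMUs); returns score of DMU j."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.concatenate([-Y[:, j], np.zeros(m)])             # maximize u.y_j
    A_ub = np.hstack([Y.T, -X.T])                           # u.Y_k - v.X_k <= 0 for all k
    A_eq = np.concatenate([np.zeros(s), X[:, j]])[None, :]  # v.x_j = 1 (normalization)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, None))
    return -res.fun                                         # efficiency in (0, 1]
```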

15.
A multiresolutional approach for measurement decomposition and system modeling is presented in this paper. The decomposition is performed in both the spatial and time domains and provides an excellent platform for developing computationally efficient algorithms. Using multiresolutional decomposition and modeling, a multiresolutional joint probabilistic data association (MR-JPDA) algorithm is developed for multiple target tracking. Monte Carlo simulations demonstrate that the MR-JPDA algorithm requires much less computation than the traditional joint probabilistic data association (JPDA) algorithm, with comparable performance.

16.
Dimensionality reduction is used to preserve significant properties of data in a low-dimensional space. In particular, data representation in a lower dimension is needed in applications where information comes from multiple high-dimensional sources. Data integration, however, is a challenge in itself. In this contribution, we consider a general framework for performing dimensionality reduction on heterogeneous data. We propose a novel approach, called Deep Kernel Dimensionality Reduction, designed to learn layers of new compact data representations simultaneously. The method can also be used to learn shared representations between modalities. We show by experiments on standard and on real large-scale biomedical data sets that the proposed method embeds data in a new, compact, meaningful representation and leads to lower classification error than state-of-the-art methods.
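The deep kernel architecture is the paper's own; as a single-kernel-layer point of reference, the sketch below applies plain kernel PCA from scikit-learn to one data source. The dataset and parameters are illustrative assumptions, not the paper's setup.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA

X, _ = load_digits(return_X_y=True)
# embed the 64-dimensional digit images into a 10-dimensional kernel space
Z = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3).fit_transform(X)
print(Z.shape)  # (1797, 10): a compact representation of the original data
```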

17.
18.
Cross efficiency evaluation has long been proposed as an alternative method for ranking decision making units (DMUs) in data envelopment analysis (DEA). This study proposes goal programming models for use in the second stage of the cross evaluation. The proposed models accommodate different efficiency concepts: classical DEA, minmax, and minsum efficiency criteria. Numerical examples illustrate the application of the proposed goal programming cross efficiency models.

19.
Computational social science in general, and social agent-based modeling (ABM) in particular, are challenged by complex adaptive social systems whose emergent properties are hard to understand in terms of components, even when the organization of the component agents is known. Evolutionary computation (EC) is a mature field that provides a bio-inspired approach and a suite of techniques applicable to complex adaptive social systems, yielding new insights into them. This paper demonstrates a combined EC-ABM approach, illustrated through the RebeLand model of a simple but complete polity system. Results highlight tax rates and the frequency of public issues that stress society as significant features in phase transitions between stable and unstable governance regimes. These initial results suggest further applications of EC to ABM in terms of multi-population models with heterogeneous agents, multi-objective optimization, dynamic environments, and evolving executable objects for modeling social change.

20.
We study a special case of the critical point (Morse) theory of distance functions, namely the gradient flow associated with the distance function to a finite point set in Euclidean space. The fixed points of this flow are exactly the critical points of the distance function. Our main result is a mathematical characterization of, and algorithms to compute, the stable manifolds (the inflow regions) of the fixed points. It turns out that the stable manifolds form a polyhedral complex that shares many properties with the Delaunay triangulation of the same point set. We call this complex the flow complex of the point set. The flow complex is suited to geometric modeling tasks such as surface reconstruction.
