Similar Documents
20 similar documents found
1.
The new generation of mass spectrometers produces an astonishing amount of high-quality data in a brief period of time, leading to inevitable data analysis bottlenecks. Automated data analysis algorithms are required for rapid and repeatable processing of mass spectra containing hundreds of peaks, the information-bearing part of the spectrum. New data processing algorithms must work with minimal user input, both to save operator time and to eliminate inevitable operator bias. Toward this end, an accurate mathematical algorithm is presented that automatically locates and calculates the area beneath peaks. The promising numerical performance of this algorithm applied to raw data is presented.
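The core steps can be sketched briefly. This is an illustrative outline, not the paper's algorithm: locate peaks as local maxima above a noise threshold, walk outward to the surrounding local minima, and integrate each peak with the trapezoidal rule. All names and the threshold value are hypothetical.

```python
def peak_areas(mz, intensity, threshold):
    """Return [(peak_index, area)] for local maxima above `threshold`."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        if intensity[i] > threshold and intensity[i - 1] < intensity[i] >= intensity[i + 1]:
            # walk outward to the surrounding local minima (peak bounds)
            lo = i
            while lo > 0 and intensity[lo - 1] < intensity[lo]:
                lo -= 1
            hi = i
            while hi < len(intensity) - 1 and intensity[hi + 1] < intensity[hi]:
                hi += 1
            # trapezoidal area under the peak between its bounds
            area = sum((intensity[j] + intensity[j + 1]) / 2 * (mz[j + 1] - mz[j])
                       for j in range(lo, hi))
            peaks.append((i, area))
    return peaks

mz = [float(k) for k in range(9)]
signal = [0.0, 0.1, 2.0, 5.0, 2.0, 0.1, 0.0, 0.1, 0.0]
print(peak_areas(mz, signal, threshold=1.0))
```

On this toy spectrum only the maximum at index 3 clears the threshold, so a single (index, area) pair is reported.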

2.
A prediction model for methane production in a wastewater processing facility is presented. The model is built by data-mining algorithms based on industrial data collected on a daily basis. Because of the many parameters available in this research, a subset of parameters is selected using importance analysis. Prediction results for methane production are presented in this paper. The performance of models built by different algorithms is measured with five metrics. Based on these metrics, a model built by the Adaptive Neuro-Fuzzy Inference System algorithm provided the most accurate predictions of methane production.
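Model comparison of this kind reduces to computing error metrics on held-out predictions. The abstract does not list the five metrics used, so the ones below (MAE, RMSE, MAPE) are common illustrative choices, and the data values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Common error metrics for comparing prediction models."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = sum(abs(e / a) for a, e in zip(actual, errors)) / n * 100
    return {"MAE": mae, "RMSE": rmse, "MAPE_%": mape}

actual = [100.0, 120.0, 90.0, 110.0]    # hypothetical daily methane volumes
predicted = [98.0, 125.0, 88.0, 111.0]  # hypothetical model output
print(regression_metrics(actual, predicted))
```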

3.
Gaussians are important tools for learning from data of large dimensions. The variance of a Gaussian kernel is a measurement of the frequency range of function components or features retrieved by learning algorithms induced by the Gaussian. The learning ability and approximation power increase when the variance of the Gaussian decreases. Thus, it is natural to use Gaussians with decreasing variances for online algorithms when samples are imposed one by one. In this paper, we consider fully online classification algorithms associated with a general loss function and varying Gaussians which are closely related to regularization schemes in reproducing kernel Hilbert spaces. Learning rates are derived in terms of the smoothness of a target function associated with the probability measure controlling sampling and the loss function. A critical estimate is given for the norm of the difference of regularized target functions as the variance of the Gaussian changes. Concrete learning rates are presented for the online learning algorithm with the least squares loss function.
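The flavour of such an algorithm can be sketched as follows. This is an assumption-laden toy, not the paper's exact scheme: an online regularized kernel update with least-squares loss, where the Gaussian width shrinks as t^(-alpha). The step size, shrinkage schedule, and all parameter values are hypothetical.

```python
import math

def gaussian(x, z, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))

def online_fit(samples, sigma0=1.0, alpha=0.1, eta=0.5, lam=0.01):
    """One pass over samples; each step uses a Gaussian of decreasing variance."""
    support, coeffs, sigmas = [], [], []
    for t, (x, y) in enumerate(samples, start=1):
        sigma_t = sigma0 * t ** (-alpha)  # decreasing variance
        f_x = sum(c * gaussian(s, x, sg)
                  for s, c, sg in zip(support, coeffs, sigmas))
        eta_t = eta / math.sqrt(t)
        coeffs = [(1 - eta_t * lam) * c for c in coeffs]  # regularization shrinkage
        support.append(x)
        sigmas.append(sigma_t)
        coeffs.append(-eta_t * (f_x - y))  # least-squares gradient step
    return support, coeffs, sigmas

samples = [((0.0,), 1.0), ((1.0,), -1.0), ((0.1,), 1.0), ((0.9,), -1.0)]
support, coeffs, sigmas = online_fit(samples)
predict = lambda x: sum(c * gaussian(s, x, sg)
                        for s, c, sg in zip(support, coeffs, sigmas))
print(predict((0.05,)), predict((0.95,)))
```

After one pass, points near the positive examples score positive and points near the negative examples score negative.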

4.
A single machine scheduling problem is studied. There is a partition of the set of n jobs into g groups on the basis of group technology. Jobs of the same group are processed contiguously. A sequence independent setup time precedes the processing of each group. Two external renewable resources can be used to linearly compress setup and job processing times. The setup times are jointly compressible by one resource, the job processing times are jointly compressible by another resource and the level of the resource is the same for all setups and all jobs. Polynomial time algorithms are presented to find an optimal job sequence and resource values such that the total weighted resource consumption is minimum, subject to meeting job deadlines. The algorithms are based on solving linear programming problems with two variables by geometric techniques.

5.
A general methodology for selecting predictors for Gaussian generative classification models is presented. The problem is regarded as a model selection problem. Three different roles for each possible predictor are considered: a variable can be a relevant classification predictor or not, and the irrelevant classification variables can be linearly dependent on a part of the relevant predictors or independent variables. This variable selection model was inspired by previous work on variable selection in model-based clustering. A BIC-like model selection criterion is proposed. It is optimized through two embedded forward stepwise variable selection algorithms for classification and linear regression. The model identifiability and the consistency of the variable selection criterion are proved. Numerical experiments on simulated and real data sets illustrate the benefits of this variable selection methodology. In particular, it is shown that this well-grounded variable selection model can be of great interest for improving the classification performance of quadratic discriminant analysis in a high-dimensional context.
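The forward stepwise ingredient can be sketched on its own. This shows only the greedy add-one-predictor loop driven by a BIC-like score on ordinary least squares; the paper's generative model and criterion are richer, and the data here are hypothetical.

```python
import math

def solve(A, b):
    """Gauss-Jordan elimination for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[col][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def bic(X, y, cols):
    """BIC of an OLS fit using the predictors indexed by `cols`."""
    n = len(y)
    Z = [[1.0] + [row[j] for j in cols] for row in X]  # intercept + predictors
    p = len(Z[0])
    A = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    rhs = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    rss = sum((y[i] - sum(beta[j] * Z[i][j] for j in range(p))) ** 2 for i in range(n))
    return n * math.log(max(rss / n, 1e-12)) + p * math.log(n)

def forward_stepwise(X, y):
    chosen, remaining = [], list(range(len(X[0])))
    best = bic(X, y, chosen)
    while remaining:
        cand = min(remaining, key=lambda j: bic(X, y, chosen + [j]))
        score = bic(X, y, chosen + [cand])
        if score >= best:
            break
        chosen.append(cand)
        remaining.remove(cand)
        best = score
    return chosen

# Hypothetical data: y depends on column 0 only; column 1 is uninformative.
X = [[x, (7 * x) % 5] for x in range(10)]
y = [2.0 * row[0] + 1.0 for row in X]
print(forward_stepwise(X, y))  # column 0 is selected; column 1 adds no BIC gain
```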

6.
A method to investigate systems showing Type-I intermittency phenomenon is presented. This method is an extension of the procedure we have recently established to study the Type-II and Type-III intermittencies. With this approach, new accurate analytical expressions for the reinjection and the laminar phase length probability densities are obtained. The new theoretical formulas are tested by numerical computation, showing an excellent agreement between analytical models and numerical results. In addition, our method fully generalizes the well-known classical characteristic relations, in such a way that it properly characterizes those systems showing Type-I intermittency.

7.
The purpose of this paper is a design oriented survey of heuristics. Since the main application fields of heuristics are problems of the combinatorial type, an introductory synopsis of combinatorial problems is first presented (Section 2). Heuristics are a specific kind of algorithm; therefore, the position of heuristics within the system of algorithms is described (Section 3). The design of heuristics requires decisions, and decisions are choices among alternatives which have to be explicitly available; a basis for this is presented in a morphological classification of heuristics (Section 4). Based on the classification, some aspects of the design process are considered (Section 5).

8.
An approach to dealing with missing data, both during the design and normal operation of a neuro-fuzzy classifier, is presented in this paper. Missing values are processed within a general fuzzy min–max neural network architecture utilising hyperbox fuzzy sets as input data cluster prototypes. An emphasis is put on ways of quantifying the uncertainty which missing data might have caused. This takes the form of a classification procedure whose primary objective is the reduction of the number of viable alternatives rather than attempting to produce one winning class without supporting evidence. If required, ways of selecting the most probable class among the viable alternatives found during the primary classification step, based on data frequency information, are also proposed. The reliability of the classification and the completeness of information are communicated by producing upper and lower classification membership values, similar in essence to the plausibility and belief measures found in the theory of evidence, or the possibility and necessity values found in fuzzy set theory. Similarities and differences between the proposed method and various fuzzy, neuro-fuzzy and probabilistic algorithms are also discussed. A number of simulation results for well-known data sets are provided in order to illustrate the properties and performance of the proposed approach.
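The upper/lower membership idea can be sketched for a single hyperbox. In this hedged toy, a missing feature (None) contributes its best case to the upper membership and its worst case over the feature's domain to the lower membership; the membership function and all numbers are simplified assumptions, not the paper's exact formulation.

```python
def dim_membership(x, lo, hi, gamma=1.0):
    """1 inside [lo, hi], decaying linearly with distance outside."""
    if x < lo:
        return max(0.0, 1.0 - gamma * (lo - x))
    if x > hi:
        return max(0.0, 1.0 - gamma * (x - hi))
    return 1.0

def membership_bounds(pattern, box_lo, box_hi, domain=(0.0, 1.0)):
    upper = lower = 1.0
    for x, lo, hi in zip(pattern, box_lo, box_hi):
        if x is None:
            # best case: the missing value falls inside the box (upper unchanged)
            worst = min(dim_membership(domain[0], lo, hi),
                        dim_membership(domain[1], lo, hi))
            lower = min(lower, worst)  # worst case over the feature's domain
        else:
            m = dim_membership(x, lo, hi)
            upper = min(upper, m)
            lower = min(lower, m)
    return lower, upper

# Hypothetical hyperbox for one class; the second feature is missing:
print(membership_bounds([0.5, None], box_lo=[0.4, 0.2], box_hi=[0.6, 0.4]))
```

The gap between the two bounds communicates how much uncertainty the missing feature has introduced.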

9.
Nowadays, the diffusion of smartphones, tablet computers, and other multipurpose equipment with high-speed Internet access makes new data types available for data analysis and classification in marketing. For example, it is now possible to collect images/snaps, music, or videos instead of ratings. With appropriate algorithms and software at hand, a marketing researcher could simply group or classify respondents according to the content of uploaded images/snaps, music, or videos. However, such algorithms and software have so far been little known in marketing research. The paper tries to close this gap. Algorithms and software from computer science are presented, adapted and applied to data analysis and classification in marketing. The new SPSS-like software package IMADAC is introduced.

10.
A knowledge-based linear Tikhonov regularization classification model for tornado discrimination is presented. Twenty-three attributes, based on the National Severe Storms Laboratory’s Mesoscale Detection Algorithm, are used as prior knowledge. Threshold values for these attributes are employed to discriminate the data into two classes (tornado, non-tornado). The Weather Surveillance Radar 1988 Doppler (WSR-88D) is used as a source of data streaming every 6 min. The combination of data and prior knowledge is used in the development of a least squares problem that can be solved using matrix or iterative methods. Advantages of this formulation include explicit expressions for the classification weights of the classifier and its ability to incorporate prior knowledge directly into the classifier. Comparison of the present approach to that of Fung et al. [in Proceedings neural information processing systems (NIPS 2002), Vancouver, BC, December 10–12, 2002], over a suite of forecast evaluation indices, demonstrates that the Tikhonov regularization model is superior for discriminating tornadic from non-tornadic storms.
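The closed-form character of the formulation can be sketched without the prior-knowledge constraints. This hedged toy shows only the Tikhonov (ridge) regularized data term, on two hypothetical attributes, so the 2x2 normal equations can be solved by Cramer's rule; the paper's model additionally injects the knowledge-based thresholds.

```python
def ridge_weights(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y for 2 features via Cramer's rule."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    p = sum(r[0] * t for r, t in zip(X, y))
    q = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return ((d * p - b * q) / det, (a * q - b * p) / det)

X = [[1.0, 0.2], [0.9, 0.1], [0.2, 1.0], [0.1, 0.8]]  # hypothetical attributes
y = [1.0, 1.0, -1.0, -1.0]                             # tornado vs non-tornado
w = ridge_weights(X, y, lam=0.1)
classify = lambda x: 1 if w[0] * x[0] + w[1] * x[1] > 0 else -1
print(classify([0.95, 0.15]), classify([0.15, 0.9]))
```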

11.
A Taxonomy of Evolutionary Algorithms in Combinatorial Optimization
This paper shows how evolutionary algorithms can be described in a concise, yet comprehensive and accurate way. A classification scheme is introduced and presented in a tabular form called TEA (Table of Evolutionary Algorithms). It distinguishes between different classes of evolutionary algorithms (e.g., genetic algorithms, ant systems) by enumerating the fundamental ingredients of each of these algorithms. At the end, possible uses of the TEA are illustrated on classical evolutionary algorithms.

12.
We propose the use of Möbius transformations, defined in the context of Clifford algebras, for geometrically manipulating point cloud data lying in a vector space of arbitrary dimension. We present this method as an application to signal classification in a dimensionality reduction framework. We first discuss a general situation where data analysis problems arise in signal processing. In this context, we introduce the construction of special Möbius transformations on vector spaces \({\mathbb{R}^n}\), customized for a classification setting. A computational experiment is presented indicating the potential and shortcomings of this framework.
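One elementary building block of Möbius transformations on R^n can be sketched directly: inversion in a sphere of radius r centred at b (composed with translations and rotations, such inversions generate the Möbius group). The Clifford-algebra machinery of the paper is not reproduced here.

```python
def sphere_inversion(x, b, r):
    """Map x to b + r^2 (x - b) / |x - b|^2: inversion in the sphere (b, r)."""
    d2 = sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    return tuple(bi + r * r * (xi - bi) / d2 for xi, bi in zip(x, b))

# Points on the sphere are fixed; points outside map inside, and vice versa.
p = sphere_inversion((2.0, 0.0, 0.0), b=(0.0, 0.0, 0.0), r=1.0)
print(p)
```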

13.
A unified presentation of classical clustering algorithms is proposed both for the hard and fuzzy pattern classification problems. Based on two types of objective functions, a new method is presented and compared with the procedures of Dunn and Ruspini. In order to determine the best, or most natural, number of fuzzy clusters, two coefficients that measure the “degree of non-fuzziness” of the partition are proposed. Numerous computational results are shown.
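One widely used "degree of non-fuzziness" measure can be sketched: the partition coefficient F(U) = (1/n) * sum over i, k of u_ik^2, which equals 1 for a hard partition and 1/c for the fuzziest one. Whether this coincides with either of the paper's two coefficients is an assumption; it illustrates the idea of scoring candidate cluster counts.

```python
def partition_coefficient(U):
    """U[i][k] = membership of object i in cluster k; each row sums to 1."""
    n = len(U)
    return sum(u * u for row in U for u in row) / n

hard = [[1.0, 0.0], [0.0, 1.0]]    # crisp assignment
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # maximally fuzzy assignment
print(partition_coefficient(hard), partition_coefficient(fuzzy))
```

In practice one would compute this coefficient for several cluster counts and prefer the count whose partition is least fuzzy.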

14.
In credit scoring, low-default portfolios (LDPs) are those for which very little default history exists. This makes it problematic for financial institutions to estimate a reliable probability of a customer defaulting on a loan. Banking regulation (Basel II Capital Accord), and best practice, however, necessitate an accurate and valid estimate of the probability of default. In this article the suitability of semi-supervised one-class classification (OCC) algorithms as a solution to the LDP problem is evaluated. The performance of OCC algorithms is compared with the performance of supervised two-class classification algorithms. This study also investigates the suitability of oversampling, a common approach to dealing with LDPs. Assessment of the performance of one- and two-class classification algorithms using nine real-world banking data sets, which have been modified to replicate LDPs, is provided. Our results demonstrate that only in the near or complete absence of defaulters should semi-supervised OCC algorithms be used instead of supervised two-class classification algorithms. Furthermore, we demonstrate for data sets whose class labels are unevenly distributed that optimising the threshold value on classifier output yields, in many cases, an improvement in classification performance. Finally, our results suggest that oversampling produces no overall improvement to the best performing two-class classification algorithms.
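The threshold-tuning idea can be sketched directly: sweep candidate thresholds on the classifier scores and keep the one maximising an imbalance-aware criterion. The geometric mean of sensitivity and specificity used below is an illustrative choice, not necessarily the paper's, and the scores are hypothetical.

```python
import math

def best_threshold(scores, labels):
    """Return (threshold, G-mean) maximising sqrt(sensitivity * specificity)."""
    best_t, best_g = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        g = math.sqrt(sens * spec)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Hypothetical scores with very few defaulters (label 1), as in an LDP.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 0, 0, 0, 0]
print(best_threshold(scores, labels))
```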

15.
In this paper, we study the performance of various state-of-the-art classification algorithms applied to eight real-life credit scoring data sets. Some of the data sets originate from major Benelux and UK financial institutions. Different types of classifiers are evaluated and compared. Besides the well-known classification algorithms (e.g. logistic regression, discriminant analysis, k-nearest neighbour, neural networks and decision trees), this study also investigates the suitability and performance of some recently proposed, advanced kernel-based classification algorithms such as support vector machines and least-squares support vector machines (LS-SVMs). The performance is assessed using the classification accuracy and the area under the receiver operating characteristic curve. Statistically significant performance differences are identified using the appropriate test statistics. It is found that both the LS-SVM and neural network classifiers yield a very good performance, but also simple classifiers such as logistic regression and linear discriminant analysis perform very well for credit scoring.
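The area under the ROC curve used above has a convenient rank interpretation that makes it easy to compute: it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. A minimal sketch on hypothetical scores:

```python
def auc(scores, labels):
    """AUC via the Wilcoxon rank interpretation (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical credit scores (higher = more likely to default, label 1).
print(auc([0.9, 0.4, 0.6, 0.3, 0.2], [1, 1, 0, 0, 0]))
```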

16.
The problem of the state estimation of partially unknown, linear systems with non-Gaussian initial conditions in a multisensor environment is addressed in this paper. Two distributed algorithms are presented which can locally process the data collected by different local sensor subsystems. The local estimates are forwarded to a central processing center where the overall optimal estimate is obtained. The proposed algorithms are computationally attractive as well as theoretically interesting
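The fusion step at the central processing center can be sketched in its simplest form: local estimates with known variances combined by inverse-variance weighting, the optimal linear fusion for independent, unbiased scalar estimates. The paper's algorithms handle the far harder non-Gaussian, partially unknown case; this toy only illustrates the architecture.

```python
def fuse(estimates, variances):
    """Inverse-variance weighted fusion of independent local estimates."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x = sum(w * e for w, e in zip(weights, estimates)) / total
    return x, 1.0 / total  # fused estimate and its (smaller) variance

# Two hypothetical local sensor estimates of the same state:
print(fuse([10.2, 9.8], [1.0, 1.0]))
```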

17.
Estimation of dependence of a scalar variable on the vector of independent variables based on a training sample is considered. No a priori conditions are imposed on the form of the function. An approach to the estimation of the functional dependence is proposed based on the solution of a finite number of special classification problems constructed on the basis of the training sample and on the subsequent prediction of the value of the function as a group decision. A statistical model and Bayes’ formula are used to combine the recognition results. A generic algorithm for constructing the regression is proposed for different approaches to the selection of the committee of classification algorithms and to the estimation of their probabilistic characteristics. Results comparing the proposed approach with other dependence estimation models are presented.

18.
In the Knowledge Discovery Process, classification algorithms are often used to help create models with training data that can be used to predict the classes of untested data instances. While there are several factors involved with classification algorithms that can influence classification results, such as the node splitting measures used in making decision trees, feature selection is often used as a pre-classification step when using large data sets to help eliminate irrelevant or redundant attributes in order to increase computational efficiency and possibly to increase classification accuracy. One important factor common to both feature selection as well as to classification using decision trees is attribute discretization, which is the process of dividing attribute values into a smaller number of discrete values. In this paper, we will present and explore a new hybrid approach, ChiBlur, which involves the use of concepts from both the blurring and χ2-based approaches to feature selection, as well as concepts from multi-objective optimization. We will compare this new algorithm with algorithms based on the blurring and χ2-based approaches.
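The χ2 ingredient can be sketched in isolation: the chi-square statistic measuring association between a discretized attribute and the class, computed from a contingency table. ChiBlur combines this kind of score with blurring and multi-objective ideas not shown here, and the table values below are hypothetical.

```python
def chi_square(table):
    """Chi-square statistic; table[i][j] = count of (attribute interval i, class j)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(table)) for j in range(len(cols)))

# Hypothetical 2-interval, 2-class table showing a strong association:
print(chi_square([[20, 5], [5, 20]]))
```

A larger statistic suggests the discretized attribute is more informative about the class, so such scores can rank attributes during feature selection.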

19.
A Structured Family of Clustering and Tree Construction Methods
A cluster A is an Apresjan cluster if every pair of objects within A is more similar than either is to any object outside A. The criterion is intuitive, compelling, but often too restrictive for applications in classification. We therefore explore extensions of Apresjan clustering to a family of related hierarchical clustering methods. The extensions are shown to be closely connected with the well-known single and average linkage tree constructions. A dual family of methods for classification by splits is also presented. Splits are partitions of the set of objects into two disjoint blocks and are widely used in domains such as phylogenetics. Both the cluster and split methods give rise to progressively refined tree representations. We exploit dualities and connections between the various methods, giving polynomial time construction algorithms for most of the constructions and NP-hardness results for the rest.  
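The defining criterion translates directly into a check: A is an Apresjan cluster when every within-A similarity strictly exceeds every similarity between a member of A and a non-member. A minimal sketch on a hypothetical similarity matrix:

```python
def is_apresjan(A, objects, sim):
    """True iff every within-A similarity exceeds every A-to-outside similarity."""
    inside = [sim[a][b] for a in A for b in A if a != b]
    outside = [sim[a][c] for a in A for c in objects if c not in A]
    if not inside or not outside:
        return True
    return min(inside) > max(outside)

objects = ["a", "b", "c", "d"]
sim = {
    "a": {"a": 1.0, "b": 0.9, "c": 0.2, "d": 0.1},
    "b": {"a": 0.9, "b": 1.0, "c": 0.3, "d": 0.2},
    "c": {"a": 0.2, "b": 0.3, "c": 1.0, "d": 0.8},
    "d": {"a": 0.1, "b": 0.2, "c": 0.8, "d": 1.0},
}
print(is_apresjan({"a", "b"}, objects, sim), is_apresjan({"a", "c"}, objects, sim))
```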

20.
A technique for assessing the sensitivity of efficiency classifications in Data Envelopment Analysis (DEA) is presented. It extends the technique proposed by Charnes et al. (A. Charnes, J.J. Rousseau, J.H. Semple, Journal of Productivity Analysis 7 (1996) 5–18). An organization's input–output vector serves as the center for a cell within which the organization's classification remains unchanged under perturbations of the data. The maximal radius among such cells can be interpreted as a stability measure of the classification. Our approach adopts the inner-product norm for the radius, whereas the previous work uses polyhedral norms. For an efficient organization, the maximal-radius problem is a convex program. On the other hand, for an inefficient organization, it is reduced to a nonconvex program whose feasible region is the complement of a convex polyhedral set. We show that the latter nonconvex problem can be transformed into a linear reverse convex program. Our formulations and algorithms are valid not only in the CCR model but in its variants.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号