Similar Documents
 20 similar documents found.
1.
Learning from imbalanced data, where the number of observations in one class is significantly larger than in the other, has gained considerable attention in the machine learning community. Assuming the difficulty in predicting each class is similar, most standard classifiers will tend to predict the majority class well. This study uses tornado data, which are highly imbalanced because tornadoes are rare events. In the severe weather data used here, thunderstorm circulations (mesocyclones) produce tornadoes in approximately 6.7% of the total number of observations. However, since tornadoes are high-impact weather events, it is important to predict the minority class with high accuracy. In this study, we apply support vector machines (SVMs) and logistic regression, with and without a midpoint threshold adjustment on the probabilistic outputs, as well as random forest and rotation forest, for tornado prediction. Feature selection with SVM-recursive feature elimination was also performed to identify the most important features or variables for predicting tornadoes. The results showed that the threshold adjustment on SVMs provided better performance than the other classifiers.
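As a rough illustration of the threshold-adjustment idea (a hedged sketch, not the paper's exact midpoint rule): instead of cutting probabilistic outputs at the default 0.5, one can scan candidate cutoffs and keep the one that maximizes a minority-class score such as F1. The function name and the choice of F1 are illustrative assumptions.

```python
def best_threshold(probs, labels, candidates=None):
    """Pick the probability cutoff that maximizes the F1 score of the
    minority class (label 1), instead of using the default 0.5."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]

    def f1_at(t):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        if tp == 0:
            return 0.0
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        return 2 * prec * rec / (prec + rec)

    return max(candidates, key=f1_at)
```

On a rare-event class whose scores cluster well below 0.5, the selected cutoff drops toward the minority scores, which is the effect the abstract attributes to the threshold adjustment.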

2.
We discuss synchronization of fractional-order discrete chaotic systems in which the drive system and the response system are the same chaotic map but with different parameters. A parameter-adaptive algorithm is used to achieve synchronization of the fractional-order discrete logistic map, and sufficient conditions for synchronization are given.
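For intuition only, here is a minimal sketch of the parameter-adaptive idea for the ordinary (integer-order, not fractional-order) discrete logistic map: an LMS-style update drives the response parameter estimate b toward the unknown drive parameter a, since the estimation error contracts by a factor 1 - γφ² at each step (φ = x(1-x) ≤ 1/4, so the factor stays in (0, 1] for γ = 8). The function name, the gain γ, and the initial values are illustrative assumptions.

```python
def identify_logistic_parameter(a=3.9, x0=0.3, b0=2.5, gamma=8.0, steps=1000):
    """Adaptively estimate the drive map's parameter `a` from observations
    of x_{n+1} = a * x_n * (1 - x_n), using an LMS-style update."""
    x, b = x0, b0
    for _ in range(steps):
        phi = x * (1.0 - x)      # regressor seen by the response model
        x_next = a * phi         # drive system iterates
        e = x_next - b * phi     # prediction error of the response model
        b += gamma * e * phi     # adaptive parameter update: (b-a) shrinks
        x = x_next
    return b
```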

3.
Predicting phenotypes on the basis of gene expression profiles is a classification task that is becoming increasingly important in the field of precision medicine. Although these expression signals are real-valued, it is questionable whether they can be analyzed on an interval scale. As with many biological signals, their influence on, e.g., protein levels is usually non-linear and can thus be misinterpreted. In this article we study gene expression profiles with up to 54,000 dimensions. We analyze these measurements on an ordinal scale by replacing the real-valued profiles with their ranks. This type of rank transformation can be used to construct invariant classifiers that are not affected by noise induced by data transformations occurring in the measurement setup. Our 10 × 10 fold cross-validation experiments on 86 different data sets and 19 different classification models indicate that classifiers largely benefit from this transformation. Random forests and support vector machines in particular achieve improved classification results on a significant majority of the data sets.
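A minimal sketch of the rank transformation described above (assuming average ranks for ties; the function name is illustrative): replacing each value by its rank makes a profile invariant under any strictly increasing transformation of the raw signal, which is the invariance property the abstract relies on.

```python
def rank_transform(profile):
    """Replace each value by its 1-based rank (average rank for ties),
    making the profile invariant to strictly monotone transformations."""
    order = sorted(range(len(profile)), key=lambda i: profile[i])
    ranks = [0.0] * len(profile)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < len(order) and profile[order[j + 1]] == profile[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1    # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, squaring all (positive) values is strictly increasing, so the ranked profile is unchanged.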

4.
Minimum average variance estimation (MAVE; Xia et al. (2002) [29]) is an effective dimension reduction method. It requires no strong probabilistic assumptions on the predictors and can consistently estimate the central mean subspace. It is applicable to a wide range of models, including time series. However, the least squares criterion used in MAVE loses efficiency when the error is not normally distributed. In this article, we propose an adaptive MAVE that adjusts to different error distributions. We show that the proposed estimate has the same convergence rate as the original MAVE. An EM algorithm is proposed to implement the new adaptive MAVE. Using both simulation studies and a real data analysis, we demonstrate the superior finite-sample performance of the proposed approach over the existing least-squares-based MAVE when the error distribution is non-normal, and comparable performance when the error is normal.

5.
Estimates that are independent of a priori information about the function under estimation (adaptive estimates) are suggested. These estimates are applied to various problems of regression estimation, density estimation, and spectral function estimation. Bibliography: 18 titles. Translated from Zapiski Nauchnykh Seminarov POMI, Vol. 244, 1997, pp. 28–45. Translated by A. Sudakov.

6.
Estimation of a quadratic functional of a function observed in the Gaussian white noise model is considered. A data-dependent method for choosing the amount of smoothing is given. The method is based on comparing certain quadratic estimators with each other. It is shown that the method is asymptotically sharp or nearly sharp adaptive simultaneously for the "regular" and "irregular" regions. We consider l_p bodies and construct bounds for the risk of the estimator which show that for p = 4 the estimator is exactly optimal and, for example, when p ∈ [3, 100], the upper bound is at most 1.055 times larger than the lower bound. We show the connection of the estimator to the theory of optimal recovery. The estimator is a calibration of an estimator which is nearly minimax optimal among quadratic estimators. Writing of this article was financed by Deutsche Forschungsgemeinschaft under project MA1026/6-2, CIES, France, and the Jenny and Antti Wihuri Foundation.

7.
Cost-sensitive classification is based on a set of weights defining the expected cost of misclassifying an object. In this paper, a Genetic Fuzzy Classifier, which is able to extract fuzzy rules from interval- or fuzzy-valued data, is extended to this type of classification. The extension consists in enclosing the estimate of a classifier's expected misclassification risk, when assessed on low-quality data, in an interval or a fuzzy number. A cooperative-competitive genetic algorithm searches for the knowledge base whose fitness is primal with respect to a precedence relation between the values of this interval- or fuzzy-valued risk. In addition, the numerical estimation of this risk depends on the entrywise product of the cost and confusion matrices, which have in turn been generalized to vague data. The flexible assignment of values to the cost function is also tackled, since linguistic terms are allowed in the definition of the misclassification cost.
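The entrywise product mentioned above can be sketched for crisp (non-fuzzy) data as follows; the per-observation averaging and the example matrices are illustrative assumptions, and the paper's interval/fuzzy generalization is not reproduced here.

```python
def expected_misclassification_cost(confusion, cost):
    """Expected risk estimate: (1/N) times the sum of the entrywise
    (Hadamard) product of the confusion matrix and the cost matrix.
    confusion[i][j] counts objects of class i classified as j;
    cost[i][j] is the cost of that decision (zero on the diagonal)."""
    n = sum(sum(row) for row in confusion)
    total = sum(confusion[i][j] * cost[i][j]
                for i in range(len(confusion))
                for j in range(len(confusion[i])))
    return total / n
```

With a cost matrix that penalizes one error type more heavily, this risk ranks classifiers differently from plain accuracy, which is the point of cost-sensitive classification.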

8.
A general approach to designing multiple classifiers represents them as a combination of several binary classifiers in order to enable correction of classification errors and increase reliability. This method is explained, for example, in Witten and Frank (Data Mining: Practical Machine Learning Tools and Techniques, 2005, Sect. 7.5). The aim of this paper is to investigate representations of this sort based on Brandt semigroups. We give a formula for the maximum number of errors of binary classifiers, which can be corrected by a multiple classifier of this type. Examples show that our formula does not carry over to larger classes of semigroups.

9.
The main objective of this paper is to compare the classification accuracy provided by large, comprehensive collections of patterns (rules) derived from archives of past observations, with that provided by small, comprehensible collections of patterns. This comparison is carried out here on the basis of an empirical study, using several publicly available data sets. The results of this study show that the use of comprehensive collections allows a slight increase of classification accuracy, and that the "cost of comprehensibility" is small.

10.
Bayesian estimation for lifetime data under fuzzy environments is proposed in this paper. In order to apply the Bayesian approach, the fuzzy parameters are assumed to be fuzzy random variables with fuzzy prior distributions. The (conventional) Bayesian estimation method is used to create the fuzzy Bayes point estimator by invoking the well-known "Resolution Identity" theorem in fuzzy set theory. We also provide computational procedures to evaluate the membership degree of any given Bayes point estimate. To achieve this, we transform the original problem into a nonlinear programming problem, which is then divided into four subproblems to simplify computation. Finally, the subproblems can be solved using any commercial optimizer, e.g., GAMS or LINDO.

11.
Locally asymptotically minimax (LAM) estimates are constructed for locally asymptotically normal (LAN) families under very mild additional assumptions. Adaptive estimation is also considered, and a sufficient condition is given for an estimate to be locally asymptotically minimax adaptive. Incidentally, it is shown that a well-known lower bound due to Hájek (1972) for the local asymptotic minimax risk is not sharp. Research partially supported by NSF grants no. MCS 78-02846 and MCS 77-03493-01.

12.
Advances in Data Analysis and Classification - We introduce the Robust Logistic Zero-Sum Regression (RobLZS) estimator, which can be used for a two-class problem with high-dimensional compositional...

13.
14.
Parallel to Cox's [JRSS B 34 (1972) 187-230] proportional hazards model, generalized logistic models have been discussed by Anderson [Bull. Int. Statist. Inst. 48 (1979) 35-53] and others. The essential assumption is that the ratio of the two densities has a known parametric form. A nice property of this model is that it naturally relates to the logistic regression model for categorical data. In astronomical, demographic, epidemiological, and other studies the variable of interest is often truncated by an associated variable. This paper studies generalized logistic models for the two-sample truncated-data problem, where the ratio of the two lifetime densities is assumed to have the form exp{α+φ(x;β)}. Here φ is a known function of x and β, and the baseline density is unspecified. We develop a semiparametric maximum likelihood method for the case where the two samples have a common truncation distribution. It is shown that inferences for β do not depend on the nonparametric components. We also derive an iterative algorithm to maximize the semiparametric likelihood for the general case where different truncation distributions are allowed. We further discuss how to check the goodness of fit of the generalized logistic model. The developed methods are illustrated and evaluated using both simulated and real data.

15.
The paper presents a method of adaptive estimation for a class of probability density functions. This method is a continual analog of some known methods. Bibliography: 10 titles.

16.
In this paper, we address the problem of pointwise estimation in the Gaussian white noise model. We propose a new data-driven procedure that achieves (up to a multiplicative logarithmic term) the minimax rate of convergence over a scale of anisotropic Hölder spaces. Moreover we present a general criterion in order to define what should be an "optimal" estimation procedure and we prove that our procedure satisfies this criterion. The extra logarithmic term can thus be viewed as an unavoidable price to pay for adaptation.

17.
Geometric coordinates are an integral part of many data streams. Examples include sensor locations in environmental monitoring, vehicle locations in traffic monitoring or battlefield simulations, scientific measurements of earth or atmospheric phenomena, etc. This paper focuses on the problem of summarizing such geometric data streams using limited storage so that many natural geometric queries can be answered faithfully. Some examples of such queries are: report the smallest convex region in which a chemical leak has been sensed, or track the diameter of the dataset, or track the extent of the dataset in any given direction. One can also pose queries over multiple streams: for instance, track the minimum distance between the convex hulls of two data streams, report when datasets A and B are no longer linearly separable, or report when points of data stream A become completely surrounded by points of data stream B, etc. These queries are easily extended to more than two streams.

In this paper, we propose an adaptive sampling scheme that gives provably optimal error bounds for extremal problems of this nature. All our results follow from a single technique for computing the approximate convex hull of a point stream in a single pass. Our main result is this: given a stream of two-dimensional points and an integer r, we can maintain an adaptive sample of at most 2r+1 points such that the distance between the true convex hull and the convex hull of the sample points is O(D/r²), where D is the diameter of the sample set. The amortized time for processing each point in the stream is O(log r). Using the sample convex hull, all the queries mentioned above can be answered approximately in either O(log r) or O(r) time.
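A simplified one-pass stand-in for the sampling idea (not the paper's adaptive scheme; the function name and the fixed-direction strategy are assumptions): keep the most extreme point seen so far in each of 2r+1 evenly spaced directions, and approximate the stream's hull by the hull of those samples.

```python
import math

def directional_hull_sample(points, r):
    """One-pass sketch: retain the extreme point in each of 2r+1 evenly
    spaced directions; the convex hull of these at most 2r+1 samples
    approximates the hull of the full stream."""
    m = 2 * r + 1
    dirs = [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
            for k in range(m)]
    best = [None] * m                    # extreme point per direction
    best_dot = [float("-inf")] * m
    for (x, y) in points:                # single pass over the stream
        for k, (dx, dy) in enumerate(dirs):
            d = x * dx + y * dy
            if d > best_dot[k]:
                best_dot[k], best[k] = d, (x, y)
    return [p for p in best if p is not None]
```

On a unit square with an interior point, five directions (r = 2) already recover exactly the four hull corners and discard the interior point; extremal queries such as the extent in a sampled direction are answered exactly, and other directions incur the discretization error the bound quantifies.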


18.
19.
We consider nonparametric estimation of a smooth function of one variable. Global selection procedures cannot sufficiently account for local sparseness of the covariate nor can they adapt to local curvature of the regression function. We propose a new method for selecting local smoothing parameters which takes into account sparseness and adapts to local curvature. A Bayesian type argument provides an initial smoothing parameter which adapts to the local sparseness of the covariate and provides the basis for local bandwidth selection procedures which further adjust the bandwidth according to the local curvature of the regression function. Simulation evidence indicates that the proposed method can result in reduction of both pointwise mean squared error and integrated mean squared error.

20.
The endomorphism monoids of graphs have been actively investigated. They are convenient tools for expressing asymmetries of graphs. One of the most important classes of graphs considered in this framework is that of Cayley graphs. Our paper proposes a new method of using Cayley graphs for the classification of data. We give a survey of recent results devoted to Cayley graphs, also involving their endomorphism monoids.
