Similar articles
20 similar articles found.
1.
The plug-in rule is used for the classification of random observations into one of two regular one-parametric distributions. The maximum likelihood estimates of unknown parameters obtained from the stratified training sample are used. The second-order asymptotic expansion in terms of the inverses of the training sample sizes is derived for the expected regret risk. The closed-form expressions of the expansion coefficients are applicable for the performance evaluation of the proposed classification rule. Klaipėda University, H. Manto 84, 5808 Klaipėda; Institute of Mathematics and Informatics, Akademijos 4, 2600 Vilnius, Lithuania. Published in Lietuvos Matematikos Rinkinys, Vol. 39, No. 2, pp. 220–230, April–June, 1999.

2.
We consider the problem of supervised classification of a single observation of a multivariate Gaussian random field (GRF) into one of two populations, given a training sample. The populations are specified by different regression mean models and by a common factorized covariance function. For completely specified populations, we derive a formula for the Bayes error rate. In the case of unknown regression parameters and feature covariance matrix, the plug-in Bayes discriminant function based on ML estimators of the parameters is used for classification. We derive the actual error rate and the asymptotic expansion of the expected error rate associated with the plug-in Bayes discriminant function. These results are multivariate generalizations of previous ones. Numerical analysis of the derived formulas is carried out for bivariate GRF observations at locations belonging to the two-dimensional lattice with unit spacing.

3.
We describe a formal approach to constructing the optimal classification rule for classification analysis with unknown prior probabilities of membership in K multivariate normal populations. This is done by suggesting a balanced design for the classification experiment and by constructing the optimal rule under the balanced design condition. The rule is characterized by a constrained minimization of the total risk of misclassification; the constraint of the rule is constructed by a process of equalization among expected utilities of the K population conditional densities. The efficacy of the suggested rule is examined through numerical studies. These indicate that dramatic gains in classification accuracy can be achieved in the case where little is known about the relative population sizes.

4.
In normal classification analysis, there may be cases where the population distributions are perturbed by a screening scheme. This paper considers a new classification method for screened data that is obtained from the perturbed normal distributions. Properties of each population distribution are considered and the best region for classifying the screened data is obtained. These developments yield yet another optimal rule for the classification. The rule is studied from several aspects such as a linear approximation, error rates, and estimation of the rule using the EM algorithm. Relationships among these aspects as well as investigation of the rule’s performance are also considered. The screened classification ideas are illustrated in detail using numerical examples.

5.
In this paper we are interested in studying multiple decision procedures for k (k ≥ 2) populations which are themselves unknown but which are assumed to belong to a restricted family. We propose to study a selection procedure for distributions associated with these populations which are convex-ordered with respect to a specified distribution G, assuming that there exists a best one. The procedure described here is based on a statistic which is a linear function of the first r order statistics and which reduces to the total life statistic when G is exponential. The infimum of the probability of a correct selection and an asymptotic expression for this probability are obtained using the subset selection approach. Some other properties of this procedure are discussed. Asymptotic relative efficiencies of this rule with respect to some selection procedures proposed by Barlow and Gupta [3] for the star-ordered distributions and by Gupta [8] for the gamma populations with known shape parameters are obtained. A selection procedure for selecting the best population using the indifference zone approach is also studied. This research was supported by the Office of Naval Research Contract N00014-75-C-0455 at Purdue University. Reproduction in whole or in part is permitted for any purpose of the United States Government. Ming-Wei Lu is now at the Department of Vital and Health Statistics, Michigan.

6.
An objective Bayesian model selection procedure is proposed for the one-way analysis of variance under homoscedasticity. Bayes factors for the usual default prior distributions are not well defined and thus Bayes factors for intrinsic priors are used instead. The intrinsic priors depend on a training sample which is typically a unique random vector. However, for the homoscedastic ANOVA this is not the case. Nevertheless, we are able to illustrate that the Bayes factors for the intrinsic priors are not sensitive to the minimal training sample chosen; furthermore, we propose an alternative pooled prior that yields similar Bayes factors. To compute these Bayes factors, Bayesian computing methods are required when the sample sizes of the involved populations are large. Finally, a one-to-one relationship, which we call the calibration curve, between the posterior probability of the null hypothesis and the classical $p$ value is found, thus allowing comparisons between these two measures of evidence. The behavior of the calibration curve as a function of the sample size is studied and conclusions relating both procedures are stated.

7.
In this paper, we first give an overview of the precedence-type test procedures. Then we propose a nonparametric test based on early failures for the equality of two life-time distributions against two alternatives concerning the best population. This procedure utilizes the minimal Wilcoxon rank-sum precedence statistic (Ng and Balakrishnan, 2002, 2004) which can determine the difference between populations based on early (100q%) failures. Hence, this procedure can be useful in life-testing experiments in biological as well as industrial settings. After proposing the test procedure, we derive the exact null distribution of the test statistic in the two-sample case with equal or unequal sample sizes. We also present the exact probability of correct selection under the Lehmann alternative. Then, we generalize the test procedure to the k-sample situation. Critical values for some sample sizes are presented. Next, we examine the performance of this test procedure under a location-shift alternative through Monte Carlo simulations. Two examples are presented to illustrate our test procedure with selecting the best population as an objective.

8.
Classification tests are constructed and their asymptotic equivalence with an optimal Bayes decision rule is shown. A sharper estimate of the upper bound of the classification error probability is found. Translated from Staticheskie Metody, pp. 3–11, 1978.

9.
This paper is devoted to the asymptotic distribution of estimators for the posterior probability that a p-dimensional observation vector originates from one of k normal distributions with identical covariance matrices. The estimators are based on training samples for the k distributions involved. Observation vector and prior probabilities are regarded as given constants. The validity of various estimators and approximate confidence intervals is investigated by simulation experiments.

10.
Let X be a random variable taking values in a function space , and let Y be a discrete random label with values 0 and 1. We investigate asymptotic properties of the moving window classification rule based on independent copies of the pair (X,Y). Contrary to the finite dimensional case, it is shown that the moving window classifier is not universally consistent in the sense that its probability of error may not converge to the Bayes risk for some distributions of (X,Y). Sufficient conditions both on the space and the distribution of X are then given to ensure consistency.

11.
Summary. Uniform (or type $(B)_d$) asymptotic normality of the joint distribution of an increasing number of sample quantiles as the sample size increases is investigated, both when the basic distributions are equal and when they are unequal. Under fairly general assumptions, sufficient conditions are derived for the asymptotic normality of sample quantiles. Type $(B)_d$ asymptotic normality is a strictly stronger notion than the usual one which is based on convergence in law, and the results obtained in this article will be helpful to widen the applicability of results on asymptotic normality of sample quantiles to related statistical inferences.

12.
The full-information best choice problem with a random number of observations is considered. N i.i.d. random variables with a known continuous distribution are observed sequentially with the object of selecting the largest. Neither recall nor uncertainty of selection is allowed and one choice must be made. In this paper the number N of observations is random with a known distribution. The structure of the stopping set is investigated. A class of distributions of N (which contains in particular the uniform, negative-binomial and Poisson distributions) is determined, for which the so-called “monotone case” occurs. The theoretical solution for the monotone case is considered. In the case where N is geometric the optimal solution is presented and the probability of winning worked out. Finally, the case where N is uniform is examined. A simple asymptotically optimal stopping rule is found and the asymptotic probability of winning is obtained.

13.
Classification between two populations dealing with both continuous and binary variables is handled by splitting the problem into different locations. Given the location specified by the values of the binary variables, discrimination is performed using the continuous variables. The location probability model with homoscedastic across location conditional dispersion matrices is adopted for this problem. In this paper, we consider presence of continuous covariates with heterogeneous location conditional dispersion matrices. The continuous covariates have equal location specific mean in both populations. Conditional homoscedasticity fails when strong interaction between the continuous and binary variables is present. A plug-in covariance adjusted rule is constructed and its asymptotic distribution is derived. An asymptotic expansion for the overall error rate is given. The result is extended to include binary covariates.

14.
A robustified residual autocorrelation is defined based on $L_1$-regression. Under very general conditions, the asymptotic distribution of the robust residual autocorrelation is obtained. A robustified portmanteau statistic is then constructed which can be used in checking the goodness-of-fit of AR(p) models when using $L_1$-norm fitting. Empirical results show that $L_1$-norm estimators and the proposed portmanteau statistic are robust against outliers, error distributions, and accuracy for a given finite sample. Project supported by the Foundation of State Educational Commission and a research grant from the Doctoral Program Foundation of China (#97000139).

15.
Estimating financial risk is a critical issue for banks and insurance companies. Recently, quantile estimation based on extreme value theory (EVT) has found a successful domain of application in such a context, outperforming other methods. Given a parametric model provided by EVT, a natural approach is maximum likelihood estimation. Although the resulting estimator is asymptotically efficient, often the number of observations available to estimate the parameters of the EVT models is too small to make the large sample property trustworthy. In this paper, we study a new estimator of the parameters, the maximum Lq-likelihood estimator (MLqE), introduced by Ferrari and Yang (Estimation of tail probability via the maximum Lq-likelihood method, Technical Report 659, School of Statistics, University of Minnesota, 2007). We show that the MLqE outperforms the standard MLE, when estimating tail probabilities and quantiles of the generalized extreme value (GEV) and the generalized Pareto (GP) distributions. First, we assess the relative efficiency between the MLqE and the MLE for various sample sizes, using Monte Carlo simulations. Second, we analyze the performance of the MLqE for extreme quantile estimation using real-world financial data. The MLqE is characterized by a distortion parameter q and extends the traditional log-likelihood maximization procedure. When q→1, the new estimator approaches the traditional maximum likelihood estimator (MLE), recovering its desirable asymptotic properties; when q ≠ 1 and the sample size is moderate or small, the MLqE successfully trades bias for variance, resulting in an overall gain in terms of accuracy (mean squared error).
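
The role of the distortion parameter q can be illustrated with a minimal Python sketch. It assumes the usual deformed-logarithm form L_q(u) = (u^(1-q) - 1)/(1 - q) for q ≠ 1, with L_1(u) = log u; the abstract itself does not state this formula, and the function names `lq` and `lq_likelihood` are hypothetical.

```python
from math import log


def lq(u, q):
    """Deformed logarithm L_q(u); assumed form, reduces to log(u) at q = 1."""
    if u <= 0:
        raise ValueError("u must be positive")
    if q == 1.0:
        return log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def lq_likelihood(sample, density, q):
    """Objective an MLqE would maximize: sum of L_q(density(x)) over the sample.

    `density` is any callable returning a positive density value; with
    q = 1 this is the ordinary log-likelihood.
    """
    return sum(lq(density(x), q) for x in sample)
```

As q approaches 1 the value of `lq(u, q)` approaches `log(u)`, which matches the abstract's statement that the estimator approaches the MLE in that limit.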

16.
We compute the asymptotic probability that two randomly selected compositions of n into parts equal to a or b have the same number of parts. In addition, we provide bijections in the case of parts of sizes 1 and 2 with weighted lattice paths and central Whitney numbers of fence posets. Explicit algebraic generating functions and asymptotic probabilities are also computed in the case of pairs of compositions of n into parts at least d, for any fixed natural number d.
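
For finite n the probability in the case of parts 1 and 2 can be computed exactly. The sketch below is an illustration only, assuming the two compositions are drawn independently and uniformly; the function names are hypothetical. A composition of n into k parts from {1, 2} has n - k twos, so there are C(k, n - k) of them.

```python
from math import comb


def count_compositions_by_parts(n):
    """Map k -> number of compositions of n into exactly k parts, each 1 or 2."""
    counts = {}
    # k ranges from ceil(n / 2) (all twos, plus one 1 if n is odd) to n (all ones).
    for k in range((n + 1) // 2, n + 1):
        twos = n - k  # k parts summing to n forces exactly n - k parts equal to 2
        counts[k] = comb(k, twos)
    return counts


def same_length_probability(n):
    """Probability that two independent uniform compositions have equally many parts."""
    counts = count_compositions_by_parts(n)
    total = sum(counts.values())
    return sum(c * c for c in counts.values()) / total ** 2
```

For n = 4 the five compositions split into 1, 3 and 1 by number of parts, giving (1 + 9 + 1)/25 = 0.44.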

17.
Summary. If the expectations of two Poisson distributions are compared by means of the $\chi^2$-test, the probability of an error of the first kind is never much larger than the asymptotic value 0.01 or 0.02 or 0.05 on the three most usual levels.
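
A claim of this kind can be checked numerically for a given mean. The following sketch makes assumptions the abstract does not spell out: both counts are observed over equal exposure, the statistic is (x1 - x2)^2 / (x1 + x2) referred to a chi-squared distribution with one degree of freedom, and the Poisson support is truncated at a hypothetical `max_count`.

```python
from math import exp

# Upper 5% point of the chi-squared distribution with 1 degree of freedom.
CRITICAL_VALUE_5PCT = 3.841458820694124


def poisson_pmf_table(lam, max_count):
    """Poisson probabilities P(X = 0..max_count), built by the ratio recurrence."""
    pmf = [exp(-lam)]
    for x in range(1, max_count + 1):
        pmf.append(pmf[-1] * lam / x)
    return pmf


def exact_size(lam, critical_value=CRITICAL_VALUE_5PCT, max_count=200):
    """Type I error rate when both counts are Poisson(lam), up to truncation."""
    pmf = poisson_pmf_table(lam, max_count)
    size = 0.0
    for x1, p1 in enumerate(pmf):
        for x2, p2 in enumerate(pmf):
            total = x1 + x2
            # The statistic is undefined when both counts are zero; never reject there.
            if total > 0 and (x1 - x2) ** 2 / total > critical_value:
                size += p1 * p2
    return size
```

Calling `exact_size(lam)` for a range of means shows how far the actual rejection rate under the null strays from the nominal 0.05; the truncation is only adequate when `lam` is well below `max_count`.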

18.
Bartholomew's statistics for testing homogeneity of normal means with ordered alternatives have null distributions which are mixtures of chi-squared or beta distributions depending on whether the variances are known or not. The mixing coefficients depend on the sample sizes and the order restriction. If a researcher knows which mean is smallest and which is largest, but does not know how the other means are ordered, then a loop ordering is appropriate. Exact expressions for the mixing coefficients for a loop ordering and arbitrary sample sizes are given for five or fewer populations and approximations are developed for more than five populations. Also, the mixing coefficients for a loop ordering with equal sample sizes are computed. These mixing coefficients also arise in testing the ordering as the null hypothesis, in testing order restrictions in exponential families and in testing order restrictions nonparametrically. This research was supported by the National Institutes of Health under Grant 1 R01 GM42584-01A1.

19.
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as CART, have problems capturing linear main effects of continuous predictors. To overcome these drawbacks, the regression trunk model has been proposed: a multiple regression model with main effects and a parsimonious amount of higher order interaction effects. The interaction effects can be represented by a small tree: a regression trunk. This article proposes a new algorithm—Simultaneous Threshold Interaction Modeling Algorithm (STIMA)—to estimate a regression trunk model that is more general and more efficient than the initial one (RTA) and is implemented in the R-package stima. Results from a simulation study show that the performance of STIMA is satisfactory for sample sizes of 200 or higher. For sample sizes of 300 or higher, the 0.50 SE rule is the best pruning rule for a regression trunk in terms of power and Type I error. For sample sizes of 200, the 0.80 SE rule is recommended. Results from a comparative study of eight regression methods applied to ten benchmark datasets suggest that STIMA and GUIDE are the best performers in terms of cross-validated prediction error. STIMA appeared to be the best method for datasets containing many categorical variables. The characteristics of a regression trunk model are illustrated using the Boston house price dataset.

Supplemental materials for this article, including the R-package stima, are available online.

20.
Rank test statistics for the two-sample problem are based on the sum of the rank scores from either sample. However, a critical difference can occur when approximate scores are used since the sum of the rank scores from sample 1 is not equal to minus the sum of the rank scores from sample 2. By centering and scaling as described in Hajek and Sidak (1967, Theory of Rank Tests, Academic Press, New York) for the uncensored data case, the statistics computed from each sample become identical. However, such symmetrized approximate scores rank statistics have not been proposed in the censored data case. We propose a statistic that treats the two approximate scores rank statistics in a symmetric manner. Under equal censoring distributions the symmetric rank tests are efficient when the score function corresponds to the underlying model distribution. For unequal censoring distributions we derive a usable expression for the asymptotic variance of our symmetric rank statistics.
