Similar Documents
20 similar documents found (search time: 500 ms)
1.
Advances in Data Analysis and Classification - In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin...
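As a rough illustration of model-based clustering in this benchmark spirit, the sketch below fits Gaussian mixtures with different numbers of components and selects one by BIC. The velocity values are synthetic stand-ins (the real Galaxy data are not reproduced here), and scikit-learn's GaussianMixture is used only as one convenient implementation, not the approaches compared in the paper.

    # Hedged sketch: choosing the number of mixture components by BIC.
    # The "velocities" below are synthetic stand-ins, not the Galaxy data.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    velocities = np.concatenate([
        rng.normal(9.7, 0.5, 10),    # hypothetical low-velocity group
        rng.normal(21.0, 1.5, 60),   # hypothetical main group
        rng.normal(33.0, 1.0, 12),   # hypothetical high-velocity group
    ]).reshape(-1, 1)

    bics = {}
    for k in range(1, 8):
        gm = GaussianMixture(n_components=k, covariance_type="full", random_state=0)
        gm.fit(velocities)
        bics[k] = gm.bic(velocities)

    best_k = min(bics, key=bics.get)
    print("BIC by number of components:", bics)
    print("selected number of components:", best_k)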

2.
A method of compressing images by coding is described in which, at the first stage, the value of the ε-entropy of a class of functions corresponding to sequences of images is computed, and at the second stage suboptimal probabilistic coding is used. Translated from Ukrainskii Matematicheskii Zhurnal, Vol. 44, No. 11, pp. 1598–1604, November, 1992.

3.
Ranking data appear in everyday life and arise in many fields of study such as marketing, psychology and politics. Very often, the key objective of analyzing and modeling ranking data is to identify underlying factors that affect the individuals' choice behavior. Factor analysis for ranking data is one of the most widely used methods to tackle this problem. Recently, Yu et al. [J R Stat Soc Ser A (Statistics in Society) 168:583–597, 2005] developed factor models for ranked data in which each individual is asked to rank a set of items. However, paired ranked data may arise when the same set of items is ranked by a pair of judges, such as a couple in a family. This paper extends the factor model to accommodate such paired ranked data. The Monte Carlo expectation-maximization algorithm is used for parameter estimation, in which the E-step is implemented via the Gibbs sampler. For model assessment and selection, a tailor-made method called the bootstrap predictive checks approach is proposed. Simulation studies illustrate the proposed estimation and model selection method, and the approach is applied to parent–child partially ranked data collected from a value priorities survey carried out in the United States.
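The estimation strategy above alternates a simulated E-step with a maximization step. The toy sketch below shows only that simulate-then-maximize structure on a deliberately simple censored-normal model, not on the authors' factor model for paired rankings; all data, starting values, and the rejection-sampling imputation are invented for illustration.

    # Hedged sketch: a Monte Carlo EM loop on a toy right-censored-normal model.
    import numpy as np

    rng = np.random.default_rng(1)
    true_mu, sigma, cutoff = 2.0, 1.0, 2.5
    latent = rng.normal(true_mu, sigma, 500)
    observed = np.minimum(latent, cutoff)          # right-censored at the cutoff
    censored = latent >= cutoff

    mu = observed.mean()                            # crude starting value
    for _ in range(50):
        # E-step (Monte Carlo): impute censored values by sampling from the
        # current truncated normal via simple rejection sampling.
        n_cens = censored.sum()
        draws = np.empty(n_cens)
        filled = 0
        while filled < n_cens:
            cand = rng.normal(mu, sigma, 4 * n_cens)
            cand = cand[cand >= cutoff]
            take = min(len(cand), n_cens - filled)
            draws[filled:filled + take] = cand[:take]
            filled += take
        completed = observed.copy()
        completed[censored] = draws
        # M-step: maximize the completed-data likelihood (here, the sample mean).
        mu = completed.mean()

    print("MCEM estimate of mu:", round(mu, 3), "(true value 2.0)")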

4.
This research develops classifiers for binary classification problems with interval data, a problem whose difficulty has been widely recognized across fields. The proposed classifiers use the ideas and techniques of both quantiles and data envelopment analysis (DEA), and are thus referred to as quantile–DEA classifiers. The classifiers first use the concept of quantiles to generate a desired number of exact-data sets from a training-data set comprising interval data. They then adopt the concept and technique of an intersection-form production possibility set in the DEA framework to construct acceptance domains, each corresponding to an exact-data set and thus to a quantile. An intersection-form acceptance domain is represented by a linear inequality system, which enables the quantile–DEA classifiers to efficiently discover the groups to which large volumes of data belong. In addition, the quantile feature enables the proposed classifiers not only to help reveal patterns, but also to tell the user the value or significance of these patterns.
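As a minimal sketch of the first stage described above, the snippet below converts a small, made-up interval-valued training set into several exact-valued data sets by taking within-interval quantiles; the DEA acceptance-domain construction is not reproduced, and the array names are hypothetical.

    # Hedged sketch: stage one of a quantile-based treatment of interval data.
    import numpy as np

    # Each feature of each observation is an interval [low, high].
    lows  = np.array([[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]])
    highs = np.array([[1.5, 2.8], [1.2, 3.9], [2.6, 1.7]])

    quantiles = [0.25, 0.5, 0.75]          # desired number of exact-data sets
    exact_sets = {q: lows + q * (highs - lows) for q in quantiles}

    for q, data in exact_sets.items():
        print(f"quantile {q}: first exact-valued observation -> {data[0]}")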

5.
Providing consistent and fault-tolerant distributed object services is among the fundamental problems in distributed computing. To achieve fault-tolerance and to increase throughput, objects are replicated at different networked nodes. However, replication induces significant communication costs to maintain replica consistency. Eventually-Serializable Data Service (ESDS) has been proposed to reduce these costs and enable fast operations on data, while still providing guarantees that the replicated data will eventually be consistent. This paper reconsiders the deployment phase of ESDS, in which a particular implementation of communicating software components must be mapped onto a physical architecture. This deployment aims at minimizing the overall communication costs, while satisfying the constraints imposed by the protocol. Both MIP (Mixed Integer Programming) and CP (Constraint Programming) models are presented and applied to realistic ESDS instances. The experimental results indicate that both models can find optimal solutions and prove optimality. The CP model, however, provides orders of magnitude improvements in efficiency. The limitations of the MIP model and the critical aspects of the CP model are discussed. Symmetry breaking and parallel computing are also shown to bring significant benefits.
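A toy version of the deployment problem can be stated as assigning components to nodes so that inter-component traffic travels over cheap links. The brute-force sketch below illustrates only that cost structure on invented matrices, with a one-component-per-node restriction standing in for the protocol constraints; it is not the MIP or CP model from the paper.

    # Hedged sketch: exhaustive search over component-to-node deployments.
    from itertools import permutations
    import numpy as np

    traffic = np.array([[0, 5, 1],      # messages exchanged between components
                        [5, 0, 3],
                        [1, 3, 0]])
    link_cost = np.array([[0, 2, 4],    # per-message cost between nodes
                          [2, 0, 1],
                          [4, 1, 0]])

    n_comp, n_node = traffic.shape[0], link_cost.shape[0]
    best_cost, best_assign = float("inf"), None
    for assign in permutations(range(n_node), n_comp):   # one component per node
        cost = sum(traffic[i, j] * link_cost[assign[i], assign[j]]
                   for i in range(n_comp) for j in range(i + 1, n_comp))
        if cost < best_cost:
            best_cost, best_assign = cost, assign

    print("best deployment (component -> node):", best_assign, "cost:", best_cost)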

6.
Designing a supply chain network (SCN) is an important issue for organizations in competitive markets. In this paper, a novel robust SCN that considers efficiencies and costs simultaneously is proposed. In order to estimate the efficiency of the producers and distributors, a data envelopment analysis (DEA) model is incorporated into the SCN. Moreover, to handle the uncertainty in the data, a scenario-based robust optimization approach is applied. The proposed model identifies efficient locations for producers and distributors and determines the amount purchased from each supplier under uncertainty. To illustrate the application of the proposed model, a numerical example is solved and the results are analyzed.
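Since the model relies on DEA efficiency scores for producers and distributors, the sketch below computes standard input-oriented CCR efficiencies with a generic LP solver on invented input/output data; it is only the efficiency-estimation building block, not the robust SCN model itself.

    # Hedged sketch: input-oriented CCR DEA efficiency scores (envelopment form).
    import numpy as np
    from scipy.optimize import linprog

    X = np.array([[4.0, 2.0, 3.0],     # inputs: rows = inputs, cols = units
                  [3.0, 5.0, 2.0]])
    Y = np.array([[2.0, 3.0, 2.5]])    # outputs: rows = outputs, cols = units

    m, n = X.shape
    s = Y.shape[0]
    for k in range(n):                  # evaluate each unit in turn
        c = np.r_[1.0, np.zeros(n)]     # decision vector: [theta, lambda_1..lambda_n]
        # inputs:  sum_j lam_j * x_ij - theta * x_ik <= 0
        A_in = np.c_[-X[:, [k]], X]
        # outputs: -sum_j lam_j * y_rj <= -y_rk   (i.e. outputs at least y_rk)
        A_out = np.c_[np.zeros((s, 1)), -Y]
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[:, k]]
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        print(f"unit {k}: CCR efficiency = {res.x[0]:.3f}")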

7.
Implementations of Big Data analysis are reshaping society. The novel ways mathematics operates in society warrant new efforts in mathematics education, both in teaching the new technology and in fostering an ethical and critical awareness of its implications. This interview study investigates pre-service teachers' ethical reasoning in data science contexts, focusing on aspects of access to the data that underpin the technology. Findings show that pre-service teachers offer a wide array of ethical arguments related to access to data, which inform their efforts to think critically about oppressive situations. However, there is also an indication that their reasoning can be limited by a lack of understanding of the related data science methodology, implying that mathematics teacher education should cover more of it.

8.
9.
A Reissner–Mindlin model of a plate resting on unilateral rigid piers and a unilateral elastic foundation is considered. Since the material coefficients of the orthotropic plate, the stiffness of the foundation, and the lateral loading are uncertain, a worst-scenario (anti-optimization) method is employed to find maximal values of some quantity of interest. The state problem is formulated in terms of a variational inequality with a monotone operator. Using mixed-interpolated finite elements, approximations are proposed for the state problem and for the worst-scenario problem. The solvability of the problems and the convergence of the approximations are proved.
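In generic notation (not the paper's), the worst-scenario step can be written as maximizing a criterion over the admissible set of uncertain data, with the state constrained by a variational inequality:

    \[
      a^{*} \;=\; \arg\max_{a \in U_{\mathrm{ad}}} \Phi\bigl(a,\, u(a)\bigr),
      \qquad
      u(a) \in K:\quad \langle A(a)\,u(a),\, v - u(a)\rangle \;\ge\; \langle f(a),\, v - u(a)\rangle
      \quad \text{for all } v \in K.
    \]

Here U_ad collects the admissible material coefficients, foundation stiffnesses and loads, K is the convex set encoding the unilateral conditions, and Φ is the quantity of interest; all symbols are generic placeholders.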

10.
The additive–multiplicative hazards (AMH) regression model specifies an additive and multiplicative form of the hazard function for the counting process associated with a multidimensional covariate process; it contains the Cox proportional hazards model and the additive hazards model as special cases. In this paper, we study the AMH model with current status data, where the cumulative hazard function is treated nonparametrically and estimated using B-splines with a monotonicity constraint, while a simultaneous sieve maximum likelihood estimation is proposed for the regression parameters. The proposed estimator of the parameter vector is shown to be asymptotically normal and semiparametric efficient. The B-splines estimator of the cumulative hazard function is shown to achieve the optimal nonparametric rate of convergence. A simulation study examines the finite-sample performance of the proposed estimators and algorithm, and a real data example is presented for illustration.
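A common way to write the general additive–multiplicative hazards form (in generic notation that may differ from the paper's) is:

    \[
      \lambda\bigl(t \mid X, Z\bigr) \;=\; g\bigl(\beta^{\top} X(t)\bigr) \;+\; \lambda_{0}(t)\, h\bigl(\gamma^{\top} Z(t)\bigr),
    \]

with known link functions g and h; setting β = 0 with h = exp recovers the Cox proportional hazards model, while setting γ = 0 with g the identity recovers the additive hazards model.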

11.
This paper proposes a Metropolis–Hastings algorithm based on Markov chain Monte Carlo sampling, to estimate the parameters of the Abe–Ley distribution, which is a recently proposed Weibull-Sine-Skewed-von Mises mixture model, for bivariate circular-linear data. Current literature estimates the parameters of these mixture models using the expectation-maximization method, but we will show that this exhibits a few shortcomings for the considered mixture model. First, standard expectation-maximization does not guarantee convergence to a global optimum, because the likelihood is multi-modal, which results from the high dimensionality of the mixture’s likelihood. Second, given that expectation-maximization provides point estimates of the parameters only, the uncertainties of the estimates (e.g., confidence intervals) are not directly available in these methods. Hence, extra calculations are needed to quantify such uncertainty. We propose a Metropolis–Hastings based algorithm that avoids both shortcomings of expectation-maximization. Indeed, Metropolis–Hastings provides an approximation to the complete (posterior) distribution, given that it samples from the joint posterior of the mixture parameters. This facilitates direct inference (e.g., about uncertainty, multi-modality) from the estimation. In developing the algorithm, we tackle various challenges including convergence speed, label switching and selecting the optimum number of mixture components. We then (i) verify the effectiveness of the proposed algorithm on sample datasets with known true parameters, and further (ii) validate our methodology on an environmental dataset (a traditional application domain of Abe–Ley mixtures where measurements are function of direction). Finally, we (iii) demonstrate the usefulness of our approach in an application domain where the circular measurement is periodic in time.  相似文献   
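For readers unfamiliar with the sampler itself, the sketch below is a bare random-walk Metropolis–Hastings loop on a toy one-dimensional target; the Abe–Ley mixture posterior, the label-switching handling, and the component-selection steps discussed above are not implemented here.

    # Hedged sketch: random-walk Metropolis-Hastings on a toy target density.
    import numpy as np

    def log_target(x):
        # toy target: standard normal log-density (up to an additive constant)
        return -0.5 * x * x

    rng = np.random.default_rng(42)
    x, step = 0.0, 1.0
    samples = []
    for _ in range(10000):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal                      # accept the proposed move
        samples.append(x)                     # otherwise keep the current state

    samples = np.array(samples[2000:])        # drop burn-in draws
    print("posterior mean ~", samples.mean(), " posterior sd ~", samples.std())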

12.
Optimal sampling plans based on overdispersed defect counts for screening lots of outgoing and incoming goods are derived by minimizing the required sample size. Best inspection schemes provide appropriate protections to customers and manufacturers. The stochastic distribution of the number of defects per sampled unit is described by Poisson–Lindley models. Optimal frequentist and Bayesian decision rules for lot disposition are found by solving mixed integer nonlinear programming problems through simulation. The suggested criteria are based on likelihood and posterior odds ratios. The asymptotic normality of the quality score statistic is used to deduce explicit and reasonably accurate approximations of the optimal acceptance sampling plans. The Bayesian approach allows the practitioners to reduce the needed sample size for sentencing lots of high-quality products. For illustrative purposes, the proposed methods are applied to the manufacturing of copper wire.
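A minimal way to probe such a plan numerically is to simulate Poisson–Lindley defect counts and estimate the probability of accepting a lot under a single-sampling rule. The sketch below does only that, with invented plan parameters, and does not reproduce the paper's optimization or Bayesian criteria; it relies on the representation of a Poisson–Lindley count as a Poisson variable whose mean follows a Lindley law.

    # Hedged sketch: Monte Carlo acceptance probability of a single sampling
    # plan (inspect n units, accept the lot if total defects <= c) when
    # defects per unit follow a Poisson-Lindley(theta) distribution.
    import numpy as np

    def sample_poisson_lindley(theta, size, rng):
        # Lindley(theta) is a mixture of Exp(theta) and Gamma(2, 1/theta) draws.
        p = theta / (theta + 1.0)
        expo = rng.exponential(1.0 / theta, size)
        gam = rng.gamma(2.0, 1.0 / theta, size)
        lam = np.where(rng.random(size) < p, expo, gam)
        return rng.poisson(lam)

    def acceptance_probability(n, c, theta, reps=20000, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        defects = sample_poisson_lindley(theta, (reps, n), rng)
        return np.mean(defects.sum(axis=1) <= c)

    rng = np.random.default_rng(0)
    for theta in (8.0, 2.0):   # larger theta ~ better quality, smaller ~ worse
        prob = acceptance_probability(20, 5, theta, rng=rng)
        print(f"theta={theta}: P(accept) ~ {prob:.3f}")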

13.
Soltysik and Yarnold propose, as a method for two-group multivariate optimal discriminant analysis (MultiODA), selecting a linear discriminant function based on an algorithm by Warmack and Gonzalez. An important assumption underlying the Warmack–Gonzalez algorithm is likely to be violated when the data in the discriminant training samples are discrete, and in particular when they are nominal, causing the algorithm to fail. We offer modest changes to the algorithm that overcome this limitation.

14.
LAD estimation for nonlinear regression models with randomly censored data (total citations: 3; self-citations: 0; citations by others: 3)
The least absolute deviations (LAD) estimation for nonlinear regression models with randomly censored data is studied, and asymptotic properties of the LAD estimators such as consistency, boundedness in probability, and asymptotic normality are established. Simulation results show that for problems with censored data, LAD estimation performs much more robustly than least squares estimation.
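As a bare-bones illustration of LAD fitting for a nonlinear model (without the censoring adjustment that is the point of the paper), the sketch below minimizes the sum of absolute residuals with a derivative-free optimizer on synthetic data; the model form and data are invented.

    # Hedged sketch: plain LAD fit of a nonlinear regression model.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 4, 200)
    y = 2.0 * np.exp(0.7 * x) + rng.laplace(0.0, 1.0, 200)   # heavy-tailed noise

    def lad_loss(theta):
        a, b = theta
        return np.sum(np.abs(y - a * np.exp(b * x)))

    fit = minimize(lad_loss, x0=np.array([1.0, 0.5]), method="Nelder-Mead")
    print("LAD estimates (a, b):", np.round(fit.x, 3))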

15.
Suppose given a network endowed with a multiflow. We want to estimate some quantities connected with this multiflow, for instance the value of an s–t flow for one of the source–sink pairs (s, t), but only measurements on some arcs are available, at least on one s–t cocycle (a set of arcs having exactly one endpoint in a subset X of vertices with s ∈ X and t ∉ X). These measurements, assumed to be unbiased, are random variables whose variances are known. How can we combine them optimally in order to get the best estimator of the value of the s–t flow? This question arises in practical situations when the OD matrix of a transportation network must be estimated. We give a complete answer for the case of linear combinations, not only for the value of an s–t flow but also for any quantity depending linearly on the multiflow. Interestingly, the Laplacian matrix of the network plays a central role.
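Under the simplifying assumption of independent measurements, the best linear unbiased combination of several unbiased estimates of the same flow value is the familiar inverse-variance weighting, sketched below with invented numbers; the paper's general treatment, including correlated arc measurements and the role of the network Laplacian, goes beyond this.

    # Hedged sketch: inverse-variance weighting of independent unbiased estimates.
    import numpy as np

    estimates = np.array([12.4, 11.8, 12.9])    # unbiased estimates of the flow value
    variances = np.array([0.50, 0.80, 1.20])    # known variances of the measurements

    weights = (1.0 / variances) / np.sum(1.0 / variances)
    combined = np.dot(weights, estimates)
    combined_var = 1.0 / np.sum(1.0 / variances)
    print("combined estimate:", round(combined, 3), "variance:", round(combined_var, 3))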

16.
Mood’s median test for the equality of medians is a nonparametric approach that has been widely used for uncensored data in practice. For survival data, many nonparametric methods have been proposed to test for the equality of survival curves. However, if the survival medians, rather than the curves, are compared, those methods are not applicable. Some approaches have been developed to fill this gap; unfortunately, in general those tests have inflated type I error rates, which makes them inapplicable to survival data with small sample sizes. In this paper, Mood’s median test for uncensored data is extended to survival data. The results from a comprehensive simulation study show that the proposed test outperforms existing methods in terms of both type I error control and power.
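For reference, the classical uncensored test being extended can be run directly, for example with SciPy's median_test; the sketch below uses synthetic samples and does not implement the censored-data extension proposed in the paper.

    # Hedged sketch: the classical (uncensored) Mood's median test on toy data.
    import numpy as np
    from scipy.stats import median_test

    rng = np.random.default_rng(7)
    group_a = rng.exponential(10.0, 40)
    group_b = rng.exponential(14.0, 40)

    stat, p_value, grand_median, table = median_test(group_a, group_b)
    print("chi-square statistic:", round(stat, 3))
    print("p-value:", round(p_value, 4))
    print("counts above/below the pooled median:\n", table)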

17.
The aim of this paper is to enlarge the usual domain of cluster analysis. A procedure for clustering time-varying data is presented which takes into account the time dimension and its intrinsic properties.

This procedure consists of two steps. In the first step a dissimilarity between variables is defined and the dissimilarity matrix is calculated for each unit separately. In the second step the dissimilarity between units is calculated in terms of the dissimilarity matrices defined in the first step. The dissimilarity matrix obtained is the base for a suitable clustering method.

The procedure is illustrated on an empirical example.
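A minimal sketch of the two-step idea follows, with illustrative (not the paper's) dissimilarity choices: a correlation-based dissimilarity between variables within each unit, a Frobenius-norm distance between the resulting matrices across units, and an off-the-shelf hierarchical clustering at the end.

    # Hedged sketch: two-step clustering of units described by multivariate time series.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(5)
    # data[u] holds the time series of each variable for unit u: shape (T, p)
    data = [rng.standard_normal((30, 4)) for _ in range(6)]

    # Step 1: dissimilarity between variables, separately for each unit.
    var_dissim = [1.0 - np.corrcoef(unit.T) for unit in data]

    # Step 2: dissimilarity between units, from their variable-dissimilarity matrices.
    n = len(data)
    unit_dissim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(var_dissim[i] - var_dissim[j])   # Frobenius norm
            unit_dissim[i, j] = unit_dissim[j, i] = d

    # Feed the unit-level dissimilarity matrix to a standard clustering method.
    Z = linkage(squareform(unit_dissim), method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print("cluster labels per unit:", labels)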

18.
Let S^{m-1} denote the unit sphere in R^m and d(·,·) the geodesic distance on S^{m-1}. A spherical-basis-function approximant is a function of the form s(x) = Σ_{j=1}^{M} a_j φ(d(x, x_j)), where the a_j are real constants, φ is a fixed function, and x_1, …, x_M is a set of distinct points in S^{m-1}. It is known that if φ is a strictly positive definite function on S^{m-1}, then the interpolation matrix A = [φ(d(x_j, x_k))]_{j,k=1}^{M} is positive definite, hence invertible, for every choice of distinct points and every positive integer M. The paper studies a salient subclass of such functions φ and provides stability estimates for the associated interpolation matrices.
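The sketch below assembles and solves the interpolation system for a handful of random nodes on the two-dimensional sphere, using a simple decreasing function of the geodesic distance as a placeholder kernel; the strictly positive definite functions and the stability estimates studied in the paper are not reproduced.

    # Hedged sketch: interpolation on the unit sphere with a radial kernel of the
    # geodesic distance; the kernel phi is a placeholder, not the paper's class.
    import numpy as np

    rng = np.random.default_rng(11)
    points = rng.standard_normal((12, 3))
    points /= np.linalg.norm(points, axis=1, keepdims=True)   # nodes on the sphere
    values = points[:, 2]                                      # sample target: z-coordinate

    def geodesic(u, v):
        return np.arccos(np.clip(u @ v.T, -1.0, 1.0))          # angle between unit vectors

    def phi(d):
        return np.exp(-d)                                      # placeholder kernel

    A = phi(geodesic(points, points))                          # interpolation matrix
    coeffs = np.linalg.solve(A, values)                        # solve A a = f

    test = np.array([[0.0, 0.0, 1.0]])                         # evaluate at the north pole
    approx = phi(geodesic(test, points)) @ coeffs
    print("interpolant at north pole:", approx[0], " true value:", 1.0)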

19.
The effect of various ways of approximating velocity–pressure-gradient data on the computation of pressure fields in a grain bin is studied. The experimental data are also approximated by a cubic spline. The usual approximating formulas produce differing pressure patterns whenever the plenum pressure is sufficiently high to introduce velocities beyond the measured range.
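A cubic-spline approximation of tabulated velocity versus pressure-gradient measurements can be set up in a few lines; the numbers below are invented placeholders, not the experimental data.

    # Hedged sketch: cubic spline through made-up velocity / pressure-gradient pairs.
    import numpy as np
    from scipy.interpolate import CubicSpline

    velocity = np.array([0.05, 0.10, 0.20, 0.35, 0.50])              # m/s (hypothetical)
    pressure_gradient = np.array([30.0, 75.0, 190.0, 420.0, 700.0])  # Pa/m (hypothetical)

    spline = CubicSpline(velocity, pressure_gradient)
    query = np.linspace(0.05, 0.50, 5)
    print(np.column_stack([query, spline(query)]))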

20.
This paper analyzes the problem of allocating copies of relations from a global database to the sites of a geographically distributed communication network. The objective of the allocation is to minimize the total cost due to transmissions generated by queries from the various sites, including queries that access multiple relations. This allocation problem is modeled as a constrained nonlinear 0–1 optimization problem. Some of the unconstrained quadratic 0–1 subproblems generated during subgradient optimization are solved as maximum flow problems, while the others require implicit enumeration, depending on the nature of the objective function coefficients of the subproblems. Our solution approach is tested extensively on data allocation problems with as many as 100 sites and 20 relations. On a set of randomly generated test problems, our approach was close to two orders of magnitude faster than the general-purpose integer programming code OSL.
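As a toy stand-in for the optimization described above, the sketch below exhaustively places a single copy of each relation at a site so as to minimize total query transmission cost on an invented instance; replication, multi-relation queries, and the Lagrangian/subgradient machinery are omitted.

    # Hedged sketch: brute-force single-copy allocation of relations to sites.
    from itertools import product
    import numpy as np

    comm = np.array([[0, 1, 3],        # transmission cost between sites
                     [1, 0, 2],
                     [3, 2, 0]])
    # query_rate[s, r]: how often site s queries relation r
    query_rate = np.array([[5, 1],
                           [2, 4],
                           [3, 3]])
    n_sites, n_rel = query_rate.shape

    def total_cost(assign):
        # assign[r] = site holding the (single) copy of relation r
        return sum(query_rate[s, r] * comm[s, assign[r]]
                   for s in range(n_sites) for r in range(n_rel))

    best = min(product(range(n_sites), repeat=n_rel), key=total_cost)
    print("best allocation (relation -> site):", best, "cost:", total_cost(best))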

