Similar Documents
20 similar documents found (search time: 500 ms)
1.
Advances in Data Analysis and Classification - In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin...
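As a rough illustration of model-based clustering in this benchmark spirit, the sketch below fits Gaussian mixtures with different numbers of components and selects one by BIC. The velocity values are synthetic stand-ins (the real Galaxy data are not reproduced here), and scikit-learn's GaussianMixture is used only as one convenient implementation, not the approaches compared in the paper.

    # Hedged sketch: choosing the number of mixture components by BIC.
    # The "velocities" below are synthetic stand-ins, not the Galaxy data.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    velocities = np.concatenate([
        rng.normal(9.7, 0.5, 10),    # hypothetical low-velocity group
        rng.normal(21.0, 1.5, 60),   # hypothetical main group
        rng.normal(33.0, 1.0, 12),   # hypothetical high-velocity group
    ]).reshape(-1, 1)

    bics = {}
    for k in range(1, 8):
        gm = GaussianMixture(n_components=k, covariance_type="full", random_state=0)
        gm.fit(velocities)
        bics[k] = gm.bic(velocities)

    best_k = min(bics, key=bics.get)
    print("BIC by number of components:", bics)
    print("selected number of components:", best_k)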

2.
A method of compressing images by coding is described in which, at the first stage, the value of the ε-entropy of a class of functions corresponding to sequences of images is computed, and at the second stage suboptimal probabilistic coding is used. Translated from Ukrainskii Matematicheskii Zhurnal, Vol. 44, No. 11, pp. 1598–1604, November, 1992.

3.
Ranking data appear in everyday life and arise in many fields of study such as marketing, psychology and politics. Very often, the key objective of analyzing and modeling ranking data is to identify underlying factors that affect the individuals' choice behavior. Factor analysis for ranking data is one of the most widely used methods to tackle this problem. Recently, Yu et al. [J R Stat Soc Ser A (Statistics in Society) 168:583–597, 2005] developed factor models for ranked data in which each individual is asked to rank a set of items. However, paired ranked data may arise when the same set of items is ranked by a pair of judges, such as a couple in a family. This paper extends the factor model to accommodate such paired ranked data. The Monte Carlo expectation-maximization algorithm is used for parameter estimation, in which the E-step is implemented via the Gibbs sampler. For model assessment and selection, a tailor-made method called the bootstrap predictive checks approach is proposed. Simulation studies illustrate the proposed estimation and model selection method, and the approach is applied to parent–child partially ranked data collected from a value priorities survey carried out in the United States.
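The estimation strategy above alternates a simulated E-step with a maximization step. The toy sketch below shows only that simulate-then-maximize structure on a deliberately simple censored-normal model, not on the authors' factor model for paired rankings; all data, starting values, and the rejection-sampling imputation are invented for illustration.

    # Hedged sketch: a Monte Carlo EM loop on a toy right-censored-normal model.
    import numpy as np

    rng = np.random.default_rng(1)
    true_mu, sigma, cutoff = 2.0, 1.0, 2.5
    latent = rng.normal(true_mu, sigma, 500)
    observed = np.minimum(latent, cutoff)          # right-censored at the cutoff
    censored = latent >= cutoff

    mu = observed.mean()                            # crude starting value
    for _ in range(50):
        # E-step (Monte Carlo): impute censored values by sampling from the
        # current truncated normal via simple rejection sampling.
        n_cens = censored.sum()
        draws = np.empty(n_cens)
        filled = 0
        while filled < n_cens:
            cand = rng.normal(mu, sigma, 4 * n_cens)
            cand = cand[cand >= cutoff]
            take = min(len(cand), n_cens - filled)
            draws[filled:filled + take] = cand[:take]
            filled += take
        completed = observed.copy()
        completed[censored] = draws
        # M-step: maximize the completed-data likelihood (here, the sample mean).
        mu = completed.mean()

    print("MCEM estimate of mu:", round(mu, 3), "(true value 2.0)")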

4.
This research develops classifiers for binary classification problems with interval data, a problem whose difficulty has been widely recognized across fields. The proposed classifiers use the ideas and techniques of both quantiles and data envelopment analysis (DEA), and are thus referred to as quantile–DEA classifiers. The classifiers first use the concept of quantiles to generate a desired number of exact-data sets from a training-data set comprising interval data. They then adopt the concept and technique of an intersection-form production possibility set in the DEA framework to construct acceptance domains, each corresponding to an exact-data set and thus to a quantile. An intersection-form acceptance domain is represented by a linear inequality system, which enables the quantile–DEA classifiers to efficiently discover the groups to which large volumes of data belong. In addition, the quantile feature enables the proposed classifiers not only to help reveal patterns, but also to tell the user the value or significance of these patterns.
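As a minimal sketch of the first stage described above, the snippet below converts a small, made-up interval-valued training set into several exact-valued data sets by taking within-interval quantiles; the DEA acceptance-domain construction is not reproduced, and the array names are hypothetical.

    # Hedged sketch: stage one of a quantile-based treatment of interval data.
    import numpy as np

    # Each feature of each observation is an interval [low, high].
    lows  = np.array([[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]])
    highs = np.array([[1.5, 2.8], [1.2, 3.9], [2.6, 1.7]])

    quantiles = [0.25, 0.5, 0.75]          # desired number of exact-data sets
    exact_sets = {q: lows + q * (highs - lows) for q in quantiles}

    for q, data in exact_sets.items():
        print(f"quantile {q}: first exact-valued observation -> {data[0]}")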

5.
Providing consistent and fault-tolerant distributed object services is among the fundamental problems in distributed computing. To achieve fault-tolerance and to increase throughput, objects are replicated at different networked nodes. However, replication induces significant communication costs to maintain replica consistency. Eventually-Serializable Data Service (ESDS) has been proposed to reduce these costs and enable fast operations on data, while still providing guarantees that the replicated data will eventually be consistent. This paper reconsiders the deployment phase of ESDS, in which a particular implementation of communicating software components must be mapped onto a physical architecture. This deployment aims at minimizing the overall communication costs, while satisfying the constraints imposed by the protocol. Both MIP (Mixed Integer Programming) and CP (Constraint Programming) models are presented and applied to realistic ESDS instances. The experimental results indicate that both models can find optimal solutions and prove optimality. The CP model, however, provides orders of magnitude improvements in efficiency. The limitations of the MIP model and the critical aspects of the CP model are discussed. Symmetry breaking and parallel computing are also shown to bring significant benefits.
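A toy version of the deployment problem can be stated as assigning components to nodes so that inter-component traffic travels over cheap links. The brute-force sketch below illustrates only that cost structure on invented matrices, with a one-component-per-node restriction standing in for the protocol constraints; it is not the MIP or CP model from the paper.

    # Hedged sketch: exhaustive search over component-to-node deployments.
    from itertools import permutations
    import numpy as np

    traffic = np.array([[0, 5, 1],      # messages exchanged between components
                        [5, 0, 3],
                        [1, 3, 0]])
    link_cost = np.array([[0, 2, 4],    # per-message cost between nodes
                          [2, 0, 1],
                          [4, 1, 0]])

    n_comp, n_node = traffic.shape[0], link_cost.shape[0]
    best_cost, best_assign = float("inf"), None
    for assign in permutations(range(n_node), n_comp):   # one component per node
        cost = sum(traffic[i, j] * link_cost[assign[i], assign[j]]
                   for i in range(n_comp) for j in range(i + 1, n_comp))
        if cost < best_cost:
            best_cost, best_assign = cost, assign

    print("best deployment (component -> node):", best_assign, "cost:", best_cost)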

6.
Designing a supply chain network (SCN) is an important issue for organizations in competitive markets. In this paper, a novel robust SCN that considers efficiencies and costs simultaneously is proposed. In order to estimate the efficiency of the producers and distributors, a data envelopment analysis (DEA) model is incorporated into the SCN. Moreover, to handle the uncertainty in the data, a scenario-based robust optimization approach is applied. The proposed model identifies efficient locations for producers and distributors and determines the amount purchased from each supplier under uncertainty. To illustrate the application of the proposed model, a numerical example is solved and the results are analyzed.
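Since the model relies on DEA efficiency scores for producers and distributors, the sketch below computes standard input-oriented CCR efficiencies with a generic LP solver on invented input/output data; it is only the efficiency-estimation building block, not the robust SCN model itself.

    # Hedged sketch: input-oriented CCR DEA efficiency scores (envelopment form).
    import numpy as np
    from scipy.optimize import linprog

    X = np.array([[4.0, 2.0, 3.0],     # inputs: rows = inputs, cols = units
                  [3.0, 5.0, 2.0]])
    Y = np.array([[2.0, 3.0, 2.5]])    # outputs: rows = outputs, cols = units

    m, n = X.shape
    s = Y.shape[0]
    for k in range(n):                  # evaluate each unit in turn
        c = np.r_[1.0, np.zeros(n)]     # decision vector: [theta, lambda_1..lambda_n]
        # inputs:  sum_j lam_j * x_ij - theta * x_ik <= 0
        A_in = np.c_[-X[:, [k]], X]
        # outputs: -sum_j lam_j * y_rj <= -y_rk   (i.e. outputs at least y_rk)
        A_out = np.c_[np.zeros((s, 1)), -Y]
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(m), -Y[:, k]]
        bounds = [(None, None)] + [(0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        print(f"unit {k}: CCR efficiency = {res.x[0]:.3f}")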

7.
Implementations of Big Data analysis are reshaping society. The novel ways mathematics operates in society warrant new efforts in mathematics education, both in teaching the new technology and in fostering an ethical and critical awareness of its implications. This interview study investigates pre-service teachers' ethical reasoning in data science contexts, focusing on aspects of access to the data that underpin the technology. Findings show that pre-service teachers offer a wide array of ethical arguments related to access to data, which inform their efforts to think critically about oppressive situations. However, there is also an indication that their reasoning can be limited by a lack of understanding of the related data science methodology, implying that mathematics teacher education should cover more of it.

8.
9.
A Reissner–Mindlin model of a plate resting on unilateral rigid piers and a unilateral elastic foundation is considered. Since the material coefficients of the orthotropic plate, the stiffness of the foundation, and the lateral loading are uncertain, a worst-scenario (anti-optimization) method is employed to find maximal values of some quantity of interest. The state problem is formulated in terms of a variational inequality with a monotone operator. Using mixed-interpolated finite elements, approximations are proposed for the state problem and for the worst-scenario problem. The solvability of the problems and the convergence of the approximations are proved.
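In generic notation (not the paper's), the worst-scenario step can be written as maximizing a criterion over the admissible set of uncertain data, with the state constrained by a variational inequality:

    \[
      a^{*} \;=\; \arg\max_{a \in U_{\mathrm{ad}}} \Phi\bigl(a,\, u(a)\bigr),
      \qquad
      u(a) \in K:\quad \langle A(a)\,u(a),\, v - u(a)\rangle \;\ge\; \langle f(a),\, v - u(a)\rangle
      \quad \text{for all } v \in K.
    \]

Here U_ad collects the admissible material coefficients, foundation stiffnesses and loads, K is the convex set encoding the unilateral conditions, and Φ is the quantity of interest; all symbols are generic placeholders.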

10.
The additive–multiplicative hazards (AMH) regression model specifies an additive and multiplicative form of the hazard function for the counting process associated with a multidimensional covariate process; it contains the Cox proportional hazards model and the additive hazards model as special cases. In this paper, we study the AMH model with current status data, where the cumulative hazard function is treated nonparametrically and estimated using B-splines with a monotonicity constraint, while a simultaneous sieve maximum likelihood estimation is proposed for the regression parameters. The proposed estimator of the parameter vector is shown to be asymptotically normal and semiparametric efficient. The B-splines estimator of the cumulative hazard function is shown to achieve the optimal nonparametric rate of convergence. A simulation study examines the finite-sample performance of the proposed estimators and algorithm, and a real data example is presented for illustration.
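A common way to write the general additive–multiplicative hazards form (in generic notation that may differ from the paper's) is:

    \[
      \lambda\bigl(t \mid X, Z\bigr) \;=\; g\bigl(\beta^{\top} X(t)\bigr) \;+\; \lambda_{0}(t)\, h\bigl(\gamma^{\top} Z(t)\bigr),
    \]

with known link functions g and h; setting β = 0 with h = exp recovers the Cox proportional hazards model, while setting γ = 0 with g the identity recovers the additive hazards model.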

11.
This paper proposes a Metropolis–Hastings algorithm based on Markov chain Monte Carlo sampling, to estimate the parameters of the Abe–Ley distribution, which is a recently proposed Weibull-Sine-Skewed-von Mises mixture model, for bivariate circular-linear data. Current literature estimates the parameters of these mixture models using the expectation-maximization method, but we will show that this exhibits a few shortcomings for the considered mixture model. First, standard expectation-maximization does not guarantee convergence to a global optimum, because the likelihood is multi-modal, which results from the high dimensionality of the mixture’s likelihood. Second, given that expectation-maximization provides point estimates of the parameters only, the uncertainties of the estimates (e.g., confidence intervals) are not directly available in these methods. Hence, extra calculations are needed to quantify such uncertainty. We propose a Metropolis–Hastings based algorithm that avoids both shortcomings of expectation-maximization. Indeed, Metropolis–Hastings provides an approximation to the complete (posterior) distribution, given that it samples from the joint posterior of the mixture parameters. This facilitates direct inference (e.g., about uncertainty, multi-modality) from the estimation. In developing the algorithm, we tackle various challenges including convergence speed, label switching and selecting the optimum number of mixture components. We then (i) verify the effectiveness of the proposed algorithm on sample datasets with known true parameters, and further (ii) validate our methodology on an environmental dataset (a traditional application domain of Abe–Ley mixtures where measurements are function of direction). Finally, we (iii) demonstrate the usefulness of our approach in an application domain where the circular measurement is periodic in time.  相似文献   
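For readers unfamiliar with the sampler itself, the sketch below is a bare random-walk Metropolis–Hastings loop on a toy one-dimensional target; the Abe–Ley mixture posterior, the label-switching handling, and the component-selection steps discussed above are not implemented here.

    # Hedged sketch: random-walk Metropolis-Hastings on a toy target density.
    import numpy as np

    def log_target(x):
        # toy target: standard normal log-density (up to an additive constant)
        return -0.5 * x * x

    rng = np.random.default_rng(42)
    x, step = 0.0, 1.0
    samples = []
    for _ in range(10000):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal                      # accept the proposed move
        samples.append(x)                     # otherwise keep the current state

    samples = np.array(samples[2000:])        # drop burn-in draws
    print("posterior mean ~", samples.mean(), " posterior sd ~", samples.std())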

12.
Optimal sampling plans based on overdispersed defect counts for screening lots of outgoing and incoming goods are derived by minimizing the required sample size. Best inspection schemes provide appropriate protections to customers and manufacturers. The stochastic distribution of the number of defects per sampled unit is described by Poisson–Lindley models. Optimal frequentist and Bayesian decision rules for lot disposition are found by solving mixed integer nonlinear programming problems through simulation. The suggested criteria are based on likelihood and posterior odds ratios. The asymptotic normality of the quality score statistic is used to deduce explicit and reasonably accurate approximations of the optimal acceptance sampling plans. The Bayesian approach allows the practitioners to reduce the needed sample size for sentencing lots of high-quality products. For illustrative purposes, the proposed methods are applied to the manufacturing of copper wire.
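A minimal way to probe such a plan numerically is to simulate Poisson–Lindley defect counts and estimate the probability of accepting a lot under a single-sampling rule. The sketch below does only that, with invented plan parameters, and does not reproduce the paper's optimization or Bayesian criteria; it relies on the representation of a Poisson–Lindley count as a Poisson variable whose mean follows a Lindley law.

    # Hedged sketch: Monte Carlo acceptance probability of a single sampling
    # plan (inspect n units, accept the lot if total defects <= c) when
    # defects per unit follow a Poisson-Lindley(theta) distribution.
    import numpy as np

    def sample_poisson_lindley(theta, size, rng):
        # Lindley(theta) is a mixture of Exp(theta) and Gamma(2, 1/theta) draws.
        p = theta / (theta + 1.0)
        expo = rng.exponential(1.0 / theta, size)
        gam = rng.gamma(2.0, 1.0 / theta, size)
        lam = np.where(rng.random(size) < p, expo, gam)
        return rng.poisson(lam)

    def acceptance_probability(n, c, theta, reps=20000, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        defects = sample_poisson_lindley(theta, (reps, n), rng)
        return np.mean(defects.sum(axis=1) <= c)

    rng = np.random.default_rng(0)
    for theta in (8.0, 2.0):   # larger theta ~ better quality, smaller ~ worse
        prob = acceptance_probability(20, 5, theta, rng=rng)
        print(f"theta={theta}: P(accept) ~ {prob:.3f}")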

13.
Soltysik and Yarnold propose, as a method for two-group multivariate optimal discriminant analysis (MultiODA), selecting a linear discriminant function based on an algorithm by Warmack and Gonzalez. An important assumption underlying the Warmack–Gonzalez algorithm is likely to be violated when the data in the discriminant training samples are discrete, and in particular when they are nominal, causing the algorithm to fail. We offer modest changes to the algorithm that overcome this limitation.

14.
LAD estimation for nonlinear regression models with randomly censored data (total citations: 3; self-citations: 0; citations by others: 3)
The least absolute deviations (LAD) estimation for nonlinear regression models with randomly censored data is studied, and asymptotic properties of the LAD estimators such as consistency, boundedness in probability, and asymptotic normality are established. Simulation results show that for problems with censored data, LAD estimation performs much more robustly than least squares estimation.
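As a bare-bones illustration of LAD fitting for a nonlinear model (without the censoring adjustment that is the point of the paper), the sketch below minimizes the sum of absolute residuals with a derivative-free optimizer on synthetic data; the model form and data are invented.

    # Hedged sketch: plain LAD fit of a nonlinear regression model.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 4, 200)
    y = 2.0 * np.exp(0.7 * x) + rng.laplace(0.0, 1.0, 200)   # heavy-tailed noise

    def lad_loss(theta):
        a, b = theta
        return np.sum(np.abs(y - a * np.exp(b * x)))

    fit = minimize(lad_loss, x0=np.array([1.0, 0.5]), method="Nelder-Mead")
    print("LAD estimates (a, b):", np.round(fit.x, 3))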

15.
Suppose given a network endowed with a multiflow. We want to estimate some quantities connected with this multiflow, for instance the value of an s–t flow for one of the source–sink pairs (s, t), but only measurements on some arcs are available, at least on one s–t cocycle (a set of arcs having exactly one endpoint in a subset X of vertices with s ∈ X and t ∉ X). These measurements, assumed to be unbiased, are random variables whose variances are known. How can we combine them optimally in order to get the best estimator of the value of the s–t flow? This question arises in practical situations when the OD matrix of a transportation network must be estimated. We give a complete answer for the case of linear combinations, not only for the value of an s–t flow but also for any quantity depending linearly on the multiflow. Interestingly, the Laplacian matrix of the network plays a central role.
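Under the simplifying assumption of independent measurements, the best linear unbiased combination of several unbiased estimates of the same flow value is the familiar inverse-variance weighting, sketched below with invented numbers; the paper's general treatment, including correlated arc measurements and the role of the network Laplacian, goes beyond this.

    # Hedged sketch: inverse-variance weighting of independent unbiased estimates.
    import numpy as np

    estimates = np.array([12.4, 11.8, 12.9])    # unbiased estimates of the flow value
    variances = np.array([0.50, 0.80, 1.20])    # known variances of the measurements

    weights = (1.0 / variances) / np.sum(1.0 / variances)
    combined = np.dot(weights, estimates)
    combined_var = 1.0 / np.sum(1.0 / variances)
    print("combined estimate:", round(combined, 3), "variance:", round(combined_var, 3))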

16.
Mood’s median test for the equality of medians is a nonparametric approach that has been widely used for uncensored data in practice. For survival data, many nonparametric methods have been proposed to test for the equality of survival curves. However, if the survival medians, rather than the curves, are compared, those methods are not applicable. Some approaches have been developed to fill this gap; unfortunately, in general those tests have inflated type I error rates, which makes them inapplicable to survival data with small sample sizes. In this paper, Mood’s median test for uncensored data is extended to survival data. The results from a comprehensive simulation study show that the proposed test outperforms existing methods in terms of both type I error control and power.
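For reference, the classical uncensored test being extended can be run directly, for example with SciPy's median_test; the sketch below uses synthetic samples and does not implement the censored-data extension proposed in the paper.

    # Hedged sketch: the classical (uncensored) Mood's median test on toy data.
    import numpy as np
    from scipy.stats import median_test

    rng = np.random.default_rng(7)
    group_a = rng.exponential(10.0, 40)
    group_b = rng.exponential(14.0, 40)

    stat, p_value, grand_median, table = median_test(group_a, group_b)
    print("chi-square statistic:", round(stat, 3))
    print("p-value:", round(p_value, 4))
    print("counts above/below the pooled median:\n", table)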

17.
The aim of this paper is to enlarge the usual domain of cluster analysis. A procedure for clustering time-varying data is presented which takes into account the time dimension and its intrinsic properties.

This procedure consists of two steps. In the first step a dissimilarity between variables is defined and the dissimilarity matrix is calculated for each unit separately. In the second step the dissimilarity between units is calculated in terms of the dissimilarity matrices defined in the first step. The dissimilarity matrix obtained is the base for a suitable clustering method.

The procedure is illustrated on an empirical example.
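A minimal sketch of the two-step idea follows, with illustrative (not the paper's) dissimilarity choices: a correlation-based dissimilarity between variables within each unit, a Frobenius-norm distance between the resulting matrices across units, and an off-the-shelf hierarchical clustering at the end.

    # Hedged sketch: two-step clustering of units described by multivariate time series.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(5)
    # data[u] holds the time series of each variable for unit u: shape (T, p)
    data = [rng.standard_normal((30, 4)) for _ in range(6)]

    # Step 1: dissimilarity between variables, separately for each unit.
    var_dissim = [1.0 - np.corrcoef(unit.T) for unit in data]

    # Step 2: dissimilarity between units, from their variable-dissimilarity matrices.
    n = len(data)
    unit_dissim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(var_dissim[i] - var_dissim[j])   # Frobenius norm
            unit_dissim[i, j] = unit_dissim[j, i] = d

    # Feed the unit-level dissimilarity matrix to a standard clustering method.
    Z = linkage(squareform(unit_dissim), method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    print("cluster labels per unit:", labels)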

18.
Let S^{m-1} denote the unit sphere in R^m and d(·,·) the geodesic distance on S^{m-1}. A spherical-basis-function approximant is a function of the form s(x) = Σ_{j=1}^{M} a_j φ(d(x, x_j)), where the a_j are real constants, φ is a fixed function, and x_1, …, x_M is a set of distinct points in S^{m-1}. It is known that if φ is a strictly positive definite function on S^{m-1}, then the interpolation matrix A = [φ(d(x_j, x_k))]_{j,k=1}^{M} is positive definite, hence invertible, for every choice of distinct points and every positive integer M. The paper studies a salient subclass of such functions φ and provides stability estimates for the associated interpolation matrices.
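The sketch below assembles and solves the interpolation system for a handful of random nodes on the two-dimensional sphere, using a simple decreasing function of the geodesic distance as a placeholder kernel; the strictly positive definite functions and the stability estimates studied in the paper are not reproduced.

    # Hedged sketch: interpolation on the unit sphere with a radial kernel of the
    # geodesic distance; the kernel phi is a placeholder, not the paper's class.
    import numpy as np

    rng = np.random.default_rng(11)
    points = rng.standard_normal((12, 3))
    points /= np.linalg.norm(points, axis=1, keepdims=True)   # nodes on the sphere
    values = points[:, 2]                                      # sample target: z-coordinate

    def geodesic(u, v):
        return np.arccos(np.clip(u @ v.T, -1.0, 1.0))          # angle between unit vectors

    def phi(d):
        return np.exp(-d)                                      # placeholder kernel

    A = phi(geodesic(points, points))                          # interpolation matrix
    coeffs = np.linalg.solve(A, values)                        # solve A a = f

    test = np.array([[0.0, 0.0, 1.0]])                         # evaluate at the north pole
    approx = phi(geodesic(test, points)) @ coeffs
    print("interpolant at north pole:", approx[0], " true value:", 1.0)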

19.
The effect of various ways of approximating velocity–pressure-gradient data on the computation of pressure fields in a grain bin is studied. The experimental data are also approximated by a cubic spline. The usual approximating formulas produce differing pressure patterns whenever the plenum pressure is sufficiently high to introduce velocities beyond the measured range.
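A cubic-spline approximation of tabulated velocity versus pressure-gradient measurements can be set up in a few lines; the numbers below are invented placeholders, not the experimental data.

    # Hedged sketch: cubic spline through made-up velocity / pressure-gradient pairs.
    import numpy as np
    from scipy.interpolate import CubicSpline

    velocity = np.array([0.05, 0.10, 0.20, 0.35, 0.50])              # m/s (hypothetical)
    pressure_gradient = np.array([30.0, 75.0, 190.0, 420.0, 700.0])  # Pa/m (hypothetical)

    spline = CubicSpline(velocity, pressure_gradient)
    query = np.linspace(0.05, 0.50, 5)
    print(np.column_stack([query, spline(query)]))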

20.
This paper analyzes the problem of allocating copies of relations from a global database to the sites of a geographically distributed communication network. The objective of the allocation is to minimize the total cost due to transmissions generated by queries from the various sites, including queries that access multiple relations. This allocation problem is modeled as a constrained nonlinear 0–1 optimization problem. Some of the unconstrained quadratic 0–1 subproblems generated during subgradient optimization are solved as maximum flow problems, while the others require implicit enumeration, depending on the nature of the objective function coefficients of the subproblems. Our solution approach is tested extensively on data allocation problems with as many as 100 sites and 20 relations. On a set of randomly generated test problems, our approach was close to two orders of magnitude faster than the general-purpose integer programming code OSL.
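As a toy stand-in for the optimization described above, the sketch below exhaustively places a single copy of each relation at a site so as to minimize total query transmission cost on an invented instance; replication, multi-relation queries, and the Lagrangian/subgradient machinery are omitted.

    # Hedged sketch: brute-force single-copy allocation of relations to sites.
    from itertools import product
    import numpy as np

    comm = np.array([[0, 1, 3],        # transmission cost between sites
                     [1, 0, 2],
                     [3, 2, 0]])
    # query_rate[s, r]: how often site s queries relation r
    query_rate = np.array([[5, 1],
                           [2, 4],
                           [3, 3]])
    n_sites, n_rel = query_rate.shape

    def total_cost(assign):
        # assign[r] = site holding the (single) copy of relation r
        return sum(query_rate[s, r] * comm[s, assign[r]]
                   for s in range(n_sites) for r in range(n_rel))

    best = min(product(range(n_sites), repeat=n_rel), key=total_cost)
    print("best allocation (relation -> site):", best, "cost:", total_cost(best))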

