Similar literature
1.
Advances in Data Analysis and Classification - In model-based clustering, the Galaxy data set is often used as a benchmark data set to study the performance of different modeling approaches. Aitkin...

2.
A method of compressing images by coding is described in which, at the first stage, the value of the ε-entropy of a class of functions corresponding to sequences of images is computed and, at the second stage, suboptimal probabilistic coding is used. Translated from Ukrainskii Matematicheskii Zhurnal, Vol. 44, No. 11, pp. 1598–1604, November, 1992.

3.
Publicly available datasets, though useful for education, are often constructed for purposes that are quite different from students’ own. To investigate and model phenomena, then, students must learn how to repurpose the data. This paper reports on an emerging line of research that builds on work in data modeling, exploratory data analysis, and storytelling to examine and support students’ data repurposing. We ask: What opportunities emerge for students to reason about the relationship between data, context, and uncertainty when they repurpose public data to explore questions about their local communities? And how can these opportunities be supported in classroom instruction and activity design? In two exploratory studies, students were asked to pose questions about their communities, use publicly available data to investigate those questions, and create visual displays and written stories about their findings. Across both enactments, opportunities for reasoning emerged especially when students worked to reconcile (1) their own knowledge and experiences of the context from which data were collected with details of the data provided; and (2) their different emerging stories about the data with one another. We review how these opportunities unfolded within each enactment at the level of group and classroom, with attention to facilitator support.

4.
This research develops classifiers for binary classification problems with interval data, which are widely recognized as difficult to handle in any field. The proposed classifiers combine the ideas and techniques of quantiles and data envelopment analysis (DEA), and are thus referred to as quantile–DEA classifiers. The classifiers first use the concept of quantiles to generate a desired number of exact-data sets from a training-data set comprising interval data. They then adopt the concept and technique of an intersection-form production possibility set in the DEA framework to construct acceptance domains, each corresponding to an exact-data set and thus to a quantile. An intersection-form acceptance domain is represented by a system of linear inequalities, which enables the quantile–DEA classifiers to efficiently determine the groups to which large volumes of data belong. In addition, the quantile feature enables the proposed classifiers not only to help reveal patterns, but also to tell the user the value or significance of these patterns.

5.
Providing consistent and fault-tolerant distributed object services is among the fundamental problems in distributed computing. To achieve fault-tolerance and to increase throughput, objects are replicated at different networked nodes. However, replication induces significant communication costs to maintain replica consistency. Eventually-Serializable Data Service (ESDS) has been proposed to reduce these costs and enable fast operations on data, while still providing guarantees that the replicated data will eventually be consistent. This paper reconsiders the deployment phase of ESDS, in which a particular implementation of communicating software components must be mapped onto a physical architecture. This deployment aims at minimizing the overall communication costs, while satisfying the constraints imposed by the protocol. Both MIP (Mixed Integer Programming) and CP (Constraint Programming) models are presented and applied to realistic ESDS instances. The experimental results indicate that both models can find optimal solutions and prove optimality. The CP model, however, provides orders of magnitude improvements in efficiency. The limitations of the MIP model and the critical aspects of the CP model are discussed. Symmetry breaking and parallel computing are also shown to bring significant benefits.

6.
Designing a supply chain network (SCN) is an important issue for organizations in competitive markets. In this paper, a novel robust SCN that considers efficiencies and costs simultaneously is proposed. In order to estimate the efficiency of the producers and distributors, a data envelopment analysis (DEA) model is incorporated into the SCN. Moreover, to handle the uncertainty in data, a scenario-based robust optimization approach is applied. The proposed model identifies efficient locations for producers and distributors and determines the amount of purchases from each supplier under uncertain conditions. To illustrate the application of the proposed model, a numerical example is solved and the results are analyzed.

7.
Implementations of Big Data analysis are reshaping society. The novel ways mathematics operates in society warrant new efforts for mathematics education, both in teaching the new technology and in providing an ethical and critical awareness of its implications. This interview study investigates pre-service teachers' ethical reasoning in data science contexts, focusing on aspects of access to the data that underpin the technology. Findings show that pre-service teachers offer a wide array of ethical arguments related to access to data, which inform their efforts to think critically about oppressive situations. However, there is also an indication that their reasoning can be limited by a lack of understanding of the related data science methodology, implying that mathematics teacher education should encompass more of this.

8.
9.
This paper proposes a Metropolis–Hastings algorithm based on Markov chain Monte Carlo sampling, to estimate the parameters of the Abe–Ley distribution, which is a recently proposed Weibull-Sine-Skewed-von Mises mixture model, for bivariate circular-linear data. Current literature estimates the parameters of these mixture models using the expectation-maximization method, but we will show that this exhibits a few shortcomings for the considered mixture model. First, standard expectation-maximization does not guarantee convergence to a global optimum, because the likelihood is multi-modal, which results from the high dimensionality of the mixture’s likelihood. Second, given that expectation-maximization provides point estimates of the parameters only, the uncertainties of the estimates (e.g., confidence intervals) are not directly available in these methods. Hence, extra calculations are needed to quantify such uncertainty. We propose a Metropolis–Hastings-based algorithm that avoids both shortcomings of expectation-maximization. Indeed, Metropolis–Hastings provides an approximation to the complete (posterior) distribution, given that it samples from the joint posterior of the mixture parameters. This facilitates direct inference (e.g., about uncertainty, multi-modality) from the estimation. In developing the algorithm, we tackle various challenges including convergence speed, label switching and selecting the optimum number of mixture components. We then (i) verify the effectiveness of the proposed algorithm on sample datasets with known true parameters, and further (ii) validate our methodology on an environmental dataset (a traditional application domain of Abe–Ley mixtures where measurements are a function of direction). Finally, we (iii) demonstrate the usefulness of our approach in an application domain where the circular measurement is periodic in time.
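The general idea of Metropolis–Hastings sampling mentioned in this abstract can be illustrated with a minimal random-walk sketch. This is not the authors' algorithm and does not implement the Abe–Ley mixture: the bimodal one-dimensional target `log_target`, the step size, and the function names are all hypothetical choices made only for illustration.

```python
import math
import random


def log_target(x):
    # Hypothetical unnormalized log-density (an assumption, not from the paper):
    # an equal-weight mixture of N(-2, 1) and N(2, 1), so the target is bimodal.
    a = -0.5 * (x + 2.0) ** 2
    b = -0.5 * (x - 2.0) ** 2
    m = max(a, b)
    # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Returns the list of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # The proposal is symmetric, so the acceptance probability reduces to
        # min(1, target(proposal) / target(current)).
        if log_p_new >= log_p or rng.random() < math.exp(log_p_new - log_p):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Calling `metropolis_hastings(log_target, 0.0, 5000, step=2.5)` yields draws that approximate the whole target distribution rather than a single point estimate, which is the property the abstract contrasts with expectation-maximization.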

10.
A Reissner–Mindlin model of a plate resting on unilateral rigid piers and a unilateral elastic foundation is considered. Since the material coefficients of the orthotropic plate, the stiffness of the foundation, and the lateral loading are uncertain, the worst scenario method (anti-optimization) is employed to find maximal values of some quantity of interest. The state problem is formulated in terms of a variational inequality with a monotone operator. Using mixed-interpolated finite elements, approximations are proposed for the state problem and for the worst scenario problem. The solvability of the problems and the convergence of the approximations are proved.

11.
A Bayesian model selection procedure for comparing models subject to inequality and/or equality constraints is proposed. An encompassing prior approach is used, and a general form of the Bayes factor of a constrained model against the encompassing model is derived. A simple estimation method is proposed which can estimate the Bayes factors for all candidate models simultaneously by using one set of samples from the encompassing model. A simulation study and a real data analysis demonstrate the performance of the method.

12.
Ranked set sampling (RSS) is a statistical technique that uses auxiliary ranking information of unmeasured sample units in an attempt to select a more representative sample that provides better estimation of population parameters than simple random sampling. However, the use of RSS can be hampered by the fact that a complete ranking of units in each set must be specified when implementing RSS. Recently, to allow ties to be declared as needed, Frey (Environ Ecol Stat 19(3):309–326, 2012) proposed a modification of RSS, which is to simply break ties at random so that a standard ranked set sample is obtained, and meanwhile record the tie structure for use in estimation. Under this RSS variation, several mean estimators were developed and their performance was compared via simulation, with a focus on continuous outcome variables. We extend the work of Frey (2012) to binary outcomes and investigate three nonparametric and three likelihood-based proportion estimators (with/without utilizing tie information), among which four are directly extended from existing estimators and the other two are novel. Under different tie-generating mechanisms, we compare the performance of these estimators and draw conclusions based on both simulation and a data example about breast cancer prevalence. Suggestions are made about the choice of the proportion estimator in general.

13.
The paper proposes a method for project selection under a specific decision situation, where the final selection is guided by two aspects: (1) satisfaction of certain segmentation, policy and/or logical constraints, and (2) assurance that the individual evaluation of the projects is respected to the maximum degree. This approach is somewhat different from the usual portfolio optimization, where combinations of projects are compared without special concern for respecting the projects’ ranking. The entire process is implemented in two phases. The projects are first ranked, usually through a multicriteria approach. The obtained complete preorder of the projects is then used in an integer programming module in order to effectively drive the final selection that satisfies the segmentation and/or logical constraints. The innovative part of the proposed approach is the way it overcomes the well-known bias towards low cost projects which is caused by the knapsack formulation commonly used in the integer programming phase. This bias is in fact the main source of divergence between the final selection and the initial complete preorder of the projects. The proposed method improves the agreement between the final selection of projects obtained from the integer programming model and the ranking obtained from the multicriteria approach.

14.
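Han Weiyi (韩伟一), Operations Research and Management Science (《运筹与管理》), 2017, 26(11): 65-69
This paper refines and extends the ranking method based on virtual decision-making units proposed in [1]. First, based on the CCR model, two special DEA models are given: a DEA model with input data only and a DEA model with output data only. Second, based on these two models, the above method is applied to rank decision-making units that have only input (or only output) data. Third, a method for computing the parameter a used in the ranking method is given. Finally, by revising the ranking model, the computational accuracy of the ranking method is effectively improved. The improved ranking method avoids the situation in which two decision-making units cannot be ranked because their relative efficiency values are too small, and its range of application is further broadened.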

15.
In the last decade, the problem of getting a consensus group ranking from all users’ ranking data has received increased attention due to its widespread applications. Previous research solved this problem by consolidating the opinions of all users, thereby obtaining an ordered list of all items that represents the achieved consensus. The weakness of this approach, however, is that it always produces a ranking list of all items, regardless of how many conflicts exist among users. This work rejects the forced agreement of all items. Instead, we define a new concept, maximum consensus sequences, which are the longest ranking lists of items that agree with the majority and disagree only with the minority. Based on this concept, algorithm MCS is developed to determine the maximum consensus sequences from users’ ranking data, and also to identify conflict items that need further negotiation. Extensive experiments are carried out using synthetic data sets, and the results indicate that the proposed method is computationally efficient. Finally, we discuss how the identified consensus sequences and conflict items information can be used in practice.

16.
We propose a new method for density estimation of categorical data. The method implements a non-asymptotic data-driven bandwidth selection rule and provides model sparsity not present in the standard kernel density estimation method. Numerical experiments with a well-known ten-dimensional binary medical data set illustrate the effectiveness of the proposed approach for density estimation, discriminant analysis and classification. Supported by the Australian Research Council, under grant number DP0558957.

17.
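The generalized partially linear model extends both the generalized linear model and the partially linear model, and is a widely used semiparametric model. This paper discusses Bayesian estimation of the parameters and Bayes-factor-based model selection for this model when both the linear covariates and the response variable are subject to nonignorable missing data. In the analysis, penalized splines are used to estimate the nonparametric component of the model, and a Bayesian hierarchical model is established. To address the poor mixing caused by highly correlated parameters in Gibbs sampling, and the instability that arises as the dimension increases, latent variables are introduced as augmented data and a collapsed Gibbs sampling method is applied, which improves convergence. In addition, to avoid computing multiple integrals, the M-H algorithm is used to estimate the marginal density function, from which the Bayes factor is then computed, providing a criterion for model selection and comparison. Finally, simulations and a real example verify the effectiveness of the proposed method.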

18.
In this paper, a superiority and inferiority ranking (SIR) method is proposed. This new method uses two types of information, the superiority and the inferiority information, to derive two types of flows, the superiority flow and the inferiority flow, by which the set of alternatives is ranked partially or completely. Relationships between the SIR method and some of the classical multiple criteria decision making (MCDM) methods (such as SAW, TOPSIS and PROMETHEE) are explored. It is proved that the SIR method is a significant extension of the well-known PROMETHEE method.

19.
Nonparametric procedures are frequently used to rank order alternatives. Often, information from several data sets must be aggregated to derive an overall ranking. When using nonparametric procedures, Simpson-like paradoxes can occur in which the conclusion drawn from the aggregate ranked data set seems contradictory to the conclusions drawn from the individual data sets. Extending previous results found in the literature for the Kruskal–Wallis test, this paper presents a strict condition that ranked data must satisfy in order to avoid this type of inconsistency when using nonparametric pairwise procedures or Bhapkar’s V procedure to extract an overall ranking. Aggregating ranked data poses further difficulties because there exist numerous ways to combine ranked data sets. This paper illustrates these difficulties and derives an upper bound for the number of possible ways that two ranked data sets can be combined.

20.
This article presents a hybrid model for multiple criteria decision making problems. The proposed decision model consists of three parts: (i) DEA (data envelopment analysis) is used to provide the best combination of the performance parameters of the original data; (ii) AFS (axiomatic fuzzy set) theory and the AHP (analytic hierarchy process) method are applied to calculate the weight of each attribute; and (iii) TOPSIS (technique for order preference by similarity to ideal solution) is applied to provide the ranking order of that best combination based on the weights of the attributes. In addition, we provide definite semantic interpretations of the decision results using AFS theory. In particular, the model not only employs the performance parameters from raw data, but also considers the preferences of decision-makers, which can make the decision results more reasonable. The proposed model is applied to robot selection in order to verify it. Using the selection index, the evaluation of alternative robots and the selection of the most appropriate one become feasible. Moreover, a numerical example for supplier selection is included to illustrate the application of the model to newly developed problems.
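The TOPSIS step named in part (iii) can be sketched in its standard textbook form as follows. This is only an illustration under stated assumptions: the decision matrix and weights are supplied by hand rather than derived from DEA or from AFS and AHP as in the article, and the function name `topsis` and its parameters are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Textbook TOPSIS sketch.

    matrix  : list of rows, one row per alternative, one column per criterion
    weights : one weight per criterion (assumed given; not computed here)
    benefit : one bool per criterion, True if larger values are better
    Returns (closeness scores, alternative indices ordered best to worst).
    Assumes every criterion column has at least one nonzero entry.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    # Ideal and anti-ideal points, criterion by criterion.
    cols = [[row[j] for row in v] for j in range(n_crit)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    scores = []
    for row in v:
        d_plus = math.sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_minus = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, anti)))
        total = d_plus + d_minus
        # If all alternatives coincide, both distances are zero; use 0.5.
        scores.append(d_minus / total if total > 0 else 0.5)

    ranking = sorted(range(len(matrix)), key=lambda i: -scores[i])
    return scores, ranking
```

A closeness score near 1 means the alternative lies close to the ideal point and far from the anti-ideal point. For cost-type criteria, such as price, the corresponding entry of `benefit` would be set to `False`.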
