Similar Documents
20 similar documents retrieved.
1.
High-quality decision making increasingly depends on high-quality data mining and analysis, and high-quality data mining in turn depends on high-quality data. In surveys of large-scale scientific instrument utilization, subjective and objective factors inevitably introduce anomalies into some of the data, degrading data quality, so suitable methods are needed to detect and handle the anomalous values; different types of data often call for different outlier detection methods. This paper analyzes the overall characteristics of large-scale instrument utilization survey data and the general detection methods and, taking as its main thread the instrument operating-hours and shared-hours data from the "Survey of the Current State of Large-Scale Instrument Resources in China" (2009) organized by the Platform Center of the Ministry of Science and Technology, compares the applicability of regression methods, depth-based methods, and boxplot methods to outlier detection for different data types. Examining the data from several angles and applying the method suited to each, the suspicious outliers identified support the subsequent analysis and handling of anomalous instrument-utilization data, improve data quality, lay a foundation for comprehensive evaluation of instrument utilization, and provide a useful reference for outlier detection in the preprocessing of science-and-technology resource survey data.
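As an illustration of the boxplot method discussed above, here is a minimal sketch of Tukey's IQR rule applied to a synthetic usage-hours column; the data, the conventional factor 1.5, and the planted anomalies are assumptions for demonstration only, not survey values.

```python
import numpy as np

def boxplot_outliers(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's boxplot rule)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x < lo) | (x > hi)

rng = np.random.default_rng(0)
hours = rng.gamma(shape=2.0, scale=400.0, size=200)  # synthetic annual machine-hours
hours[:3] = [9000.0, 8200.0, -50.0]                  # planted anomalies
mask = boxplot_outliers(hours)
print("suspected outliers:", np.sort(hours[mask]))
```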

2.
This paper proposes a method for computing confidence limits on reliability from Type-I (time-terminated) censored test data under the Weibull distribution. By filling in the censored observations, the incomplete data set is converted into a virtual complete sample, so that the confidence limit on reliability under censoring can be obtained with the methods available for complete data. Simulation studies show that the proposed algorithm is numerically stable and easy to apply.
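The data-filling idea can be sketched as follows: each censored observation is replaced by a draw from the Weibull distribution conditioned on exceeding the censoring time, after which complete-data formulas apply. This is a generic illustration, not the paper's exact algorithm; the shape/scale values, censoring time, sample size, and the bootstrap percentile limit are all assumptions, and the true parameters are plugged in for brevity where an iterative re-estimation would be used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
shape, scale, tau, n = 1.8, 100.0, 120.0, 40   # assumed Weibull, censor time, sample size

t = scale * rng.weibull(shape, size=n)
obs = np.minimum(t, tau)        # Type-I (time-terminated) observations
cens = t > tau                  # True where only the lower bound tau is known

def fill_censored(obs, cens, k, lam, rng):
    """Replace censored times by draws from the Weibull conditioned on T > tau:
    if E ~ Exp(1), then T = lam * ((tau/lam)**k + E)**(1/k)."""
    out = obs.copy()
    e = rng.exponential(size=cens.sum())
    out[cens] = lam * ((obs[cens] / lam) ** k + e) ** (1.0 / k)
    return out

t_full = fill_censored(obs, cens, shape, scale, rng)
t0 = 80.0   # mission time at which reliability R(t0) is assessed
boot = [np.mean(rng.choice(t_full, size=n, replace=True) > t0) for _ in range(2000)]
print("R(80) lower 90% confidence limit ~", np.quantile(boot, 0.10))
```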

3.
Operational measurement methods may be developed to measure the value of information in the data reported by the U.S. Government. Illustrative measures for cost-of-cotton-production statistics indicate that the benefits of these data in certain important uses may far exceed their costs. If similar measures could be provided for major Government data programmes, it would facilitate the development of a national data policy oriented toward decision making and improvements in economic growth, national well-being, and quality of life. Making these estimates would begin to provide the dollar values of important uses of information in the nation's data bases. These values are needed for allocating resources to maintain, refine, and develop fundamental data series. Cutting data expenditure in the absence of these value measures may be false economy indeed, because reducing data series with very high benefit/cost ratios might well limit, if not reduce, living standards for many generations to come.

4.
Local well-posedness of the Cauchy problem for the noncompact Landau-Lifshitz-Gilbert equation is investigated via the pseudo-stereographic projection. Existence of global solutions is established for small initial data. In the case of one space dimension global existence theorems are proved for large initial data.
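For orientation, the Landau-Lifshitz-Gilbert equation for a unit-length spin field m(x, t) with values in the sphere S² is usually stated in the following standard form; sign and constant conventions vary by author, so this textbook form is an assumption rather than a quotation from the paper.

```latex
\partial_t m \;=\; \alpha\, m \times \Delta m \;-\; \beta\, m \times (m \times \Delta m),
\qquad |m| \equiv 1, \quad \beta > 0,
```

where β is the Gilbert damping parameter and β > 0 gives the dissipative case.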

5.
Radial basis function (RBF) methods can provide excellent interpolants for a large number of poorly distributed data points. For any finite data set in any Euclidean space, one can construct an interpolation of the data by using RBFs. However, the interpolant's trends between and beyond the data points depend on the RBF used: the trends may be undesirable with some RBFs and desirable with others. That a certain RBF is commonly used for the class of problems at hand, has behaved well previously in that (or another) class of problems, or is favored in the literature are just some of the many reasons given to justify selecting an RBF a priori. Even when the justified choice is most likely correct, one should nonetheless confirm numerically that the RBF chosen a priori is in fact the most adequate for the problem at hand. The main goal of this paper is to alert the analyst to the danger of a priori RBF selection and to present a strategy for numerically choosing the RBF that best captures the trends of the given data set. The wing-weight data-fitting problem is used to illustrate the benefits of an adequate choice of RBF for each given data set.
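The numerical selection strategy the abstract argues for can be approximated with a simple cross-validation loop over candidate kernels. A sketch using scipy's RBFInterpolator on synthetic scattered data follows; the kernel list, fold count, and test function are assumptions, not the paper's benchmark.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(80, 2))              # scattered 2-D sites
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])     # synthetic response

def cv_error(kernel, k=5):
    """Mean squared k-fold cross-validation error for a candidate RBF kernel."""
    idx = rng.permutation(len(X))
    err = []
    for f in np.array_split(idx, k):
        train = np.setdiff1d(idx, f)
        rbf = RBFInterpolator(X[train], y[train], kernel=kernel)
        err.append(np.mean((rbf(X[f]) - y[f]) ** 2))
    return np.mean(err)

for kern in ["linear", "thin_plate_spline", "cubic"]:
    print(f"{kern:18s} CV-MSE = {cv_error(kern):.3e}")
```

The kernel with the smallest cross-validation error is the one that best captures the trends of this particular data set, which may well differ from the a priori favorite.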

6.
An inverse problem for the wave equation outside an obstacle with a dissipative boundary condition is considered. The observed data are given by a single solution of the wave equation generated by initial data supported on an open ball. An explicit analytical formula is given for computing the coefficient at the point on the surface of the obstacle nearest to the center of the support of the initial data.

7.
In this paper, the concepts and design of an efficient information service for mathematical software and other mathematical research data are presented. The publication-based approach and the web-based approach, the main building blocks of the service, are discussed. Heuristic methods are used to identify, extract, and rank information about software and other mathematical research data. These methods not only provide information about the research data but also link software and mathematical research data to their scientific context.

8.
Growing volumes of climate data require new analysis techniques. The field of data mining investigates new paradigms and methods for large data sets, addressing factors such as scalability, flexibility, and problem abstraction. Visual data mining in particular offers valuable methods for analyzing large amounts of data intuitively. In this paper we describe our approach to integrating cluster analysis and visualization methods for the exploration of climate data. We integrated cluster algorithms, appropriate visualization techniques, and sophisticated interaction paradigms into a general framework.
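A minimal sketch of the cluster-then-visualize idea on synthetic climate-like records (two features per station); the choice of k-means, the standardization step, and the scatter plot are illustrative assumptions, not the authors' framework.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Synthetic "stations": mean temperature (deg C) and annual precipitation (mm)
temp = np.concatenate([rng.normal(5, 2, 100), rng.normal(22, 3, 100)])
prec = np.concatenate([rng.normal(400, 80, 100), rng.normal(1400, 200, 100)])
Xc = np.column_stack([temp, prec])

Xs = StandardScaler().fit_transform(Xc)           # put both features on one scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)

plt.scatter(temp, prec, c=labels, cmap="coolwarm", s=12)
plt.xlabel("mean temperature (deg C)"); plt.ylabel("annual precipitation (mm)")
plt.title("k-means clusters of synthetic climate records")
plt.show()
```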

9.
A firm's historical sales records are the basic data source for supply-chain optimization research. In practice, however, almost all sales records obtainable through public channels are highly incomplete, which greatly inconveniences researchers. To address this problem, this paper proposes estimating the missing entries of a sales data set from its existing entries with MAFTIS, a matrix factorization model for time-series data, thereby completing the incomplete data set. Furthermore, to improve the computational efficiency of MAFTIS, an alternating-least-squares solution strategy, MAFTIS-ALS, is designed for the model. In the evaluation experiments, MAFTIS-ALS was used to estimate missing records in three real sales data sets; the results show that, compared with other estimation models, MAFTIS-ALS produces more accurate estimates and converges faster.
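A generic alternating-least-squares matrix-factorization imputer in the spirit of the abstract; this is not the MAFTIS model itself, and the rank, regularization weight, iteration count, and synthetic low-rank sales table are all assumptions.

```python
import numpy as np

def als_impute(M, mask, rank=5, lam=0.1, iters=30, seed=0):
    """Complete matrix M (items x weeks) from observed entries (mask==True)
    by alternating ridge regressions on the factors U (items) and V (weeks)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    I = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(m):                      # update item factors
            o = mask[i]
            U[i] = np.linalg.solve(V[o].T @ V[o] + I, V[o].T @ M[i, o])
        for j in range(n):                      # update week factors
            o = mask[:, j]
            V[j] = np.linalg.solve(U[o].T @ U[o] + I, U[o].T @ M[o, j])
    return U @ V.T                              # dense estimate; read off missing cells

# Demo: low-rank sales table with ~40% of entries missing
rng = np.random.default_rng(4)
truth = rng.poisson(20, size=(30, 3)) @ rng.uniform(0.5, 2.0, size=(3, 52))
mask = rng.random(truth.shape) > 0.4
est = als_impute(truth * mask, mask)
print("RMSE on missing cells:",
      np.sqrt(np.mean((est[~mask] - truth[~mask]) ** 2)))
```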

10.
The standard tool used in the analysis of geostatistical data is the variogram. When the variogram is applied to lattice data, most commonly the data associated with each region are assumed to have been observed and arbitrarily assigned at the center or centroid of the region. Distances between centroids are then used to develop the spatial covariance structure through the variogram function directly. This arbitrariness of assigning the data to the centroid causes concern because the spatial structure estimated by the variogram depends heavily on the distances between observations. This article investigates what happens to the estimation of the variogram when each lattice value is, in fact, placed randomly within its associated region. We examine the effect that this randomly placed location has on the empirical variogram, the fitted theoretical variogram, and testing for the existence of spatial correlation. Both a regular lattice and an irregular lattice are used for demonstration. In particular, county level summaries of standardized mortality rates for lung, pancreas, and stomach cancer are investigated to see how placing data points randomly throughout the county affects the estimation of the variogram.
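The randomly-placed-within-region experiment can be sketched as follows: each lattice value is assigned a jittered coordinate inside its cell and the empirical (Matheron) semivariogram is recomputed. The grid size, jitter magnitude, bin edges, and synthetic field are assumptions for illustration.

```python
import numpy as np

def empirical_variogram(coords, z, bins):
    """Matheron estimator: gamma(h) = mean of 0.5*(z_i - z_j)^2 over pairs at lag ~h."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)
    which = np.digitize(d[iu], bins)
    return np.array([sq[iu][which == b].mean() for b in range(1, len(bins))])

rng = np.random.default_rng(5)
g = 12
cx, cy = np.meshgrid(np.arange(g) + 0.5, np.arange(g) + 0.5)   # cell centroids
centroids = np.column_stack([cx.ravel(), cy.ravel()])
z = np.sin(centroids[:, 0] / 3) + rng.normal(0, 0.3, g * g)    # spatially structured values

bins = np.arange(0.5, 9.0, 1.0)                                # lag classes 1..8
gamma_c = empirical_variogram(centroids, z, bins)              # centroid placement
jitter = rng.uniform(-0.5, 0.5, size=centroids.shape)          # random point in each cell
gamma_j = empirical_variogram(centroids + jitter, z, bins)     # jittered placement
print("centroids:", np.round(gamma_c, 3))
print("jittered: ", np.round(gamma_j, 3))
```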

11.
A classical problem of stochastic simulation is how to estimate the variance of point estimators, the prototype problem being the sample mean from a steady-state autocorrelated process. A variety of estimators for the variance of the sample mean have been proposed, all designed to provide robustness to violations of assumptions, small variance, and reasonable computing requirements. Evaluation and comparison of such estimators depend on the ability to calculate their variances. A numerical approach is developed here to calculate the dispersion matrix of a set of estimators expressible as quadratic forms of the data. The approach separates the analysis of the estimator type from the analysis of the data type. The analysis for overlapping-batch-means estimators is developed, as is the analysis for steady-state first-order autoregressive and moving-average data. Closed-form expressions for overlapping-batch-means estimators and independently distributed data are obtained.
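A sketch of the overlapping-batch-means (OBM) estimator discussed above, applied to a synthetic AR(1) stream whose long-run variance is known in closed form. The batch size m = sqrt(n) and the Meketon-Schmeiser scaling constant are assumptions drawn from the general OBM literature, not this paper's derivation.

```python
import numpy as np

def obm_variance(y, m):
    """Overlapping-batch-means estimate of sigma^2 = lim_n n * Var(ybar_n),
    using all n-m+1 overlapping batches of size m."""
    n = len(y)
    c = np.concatenate([[0.0], np.cumsum(y)])
    bmeans = (c[m:] - c[:-m]) / m            # the n-m+1 overlapping batch means
    return n * m / ((n - m + 1) * (n - m)) * np.sum((bmeans - y.mean()) ** 2)

# AR(1): y_t = phi*y_{t-1} + eps_t; long-run variance is 1/(1-phi)^2 for unit shocks
rng = np.random.default_rng(6)
phi, n = 0.7, 20_000
eps = rng.normal(size=n)
y = np.empty(n); y[0] = eps[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

print("OBM estimate:", obm_variance(y, m=int(n ** 0.5)))
print("true sigma^2:", 1.0 / (1.0 - phi) ** 2)
```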

12.
Bayesian networks (BNs) have attained widespread use in data analysis and decision making. Well-studied topics include efficient inference, evidence propagation, parameter learning from data for complete and incomplete data scenarios, expert elicitation for calibrating BN probabilities, and structure learning. It is common for the researcher to assume the structure of the BN or to glean the structure from expert elicitation or domain knowledge. In this scenario, the model may be calibrated through learning the parameters from relevant data. There is a lack of work on model diagnostics for fitted BNs; this is the contribution of this article. We key on the definition of (conditional) independence to develop a graphical diagnostic that indicates whether the conditional independence assumptions imposed, when one assumes the structure of the BN, are supported by the data. We develop the approach theoretically and describe a Monte Carlo method to generate uncertainty measures for the consistency of the data with conditional independence assumptions under the model structure. We describe how this theoretical information and the data are presented in a graphical diagnostic tool. We demonstrate the approach through data simulated from BNs under different conditional independence assumptions. We also apply the diagnostic to a real-world dataset. The results presented in this article show that this approach is most feasible for smaller BNs—this is not peculiar to the proposed diagnostic graphic, but rather is related to the general difficulty of combining large BNs with data in any manner (such as through parameter estimation). It is the authors’ hope that this article helps highlight the need for more research into BN model diagnostics. This article has supplementary materials online.
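In the same spirit as the Monte Carlo diagnostic described above (though not the authors' exact procedure), a conditional independence assumption can be checked by permuting one variable within the strata of the conditioning variable, which enforces the null X ⊥ Y | Z. The tiny three-variable network and the stratified chi-square statistic are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def ci_statistic(x, y, z, levels=2):
    """Sum over strata of Z of the chi-square statistic for X-Y independence."""
    stat = 0.0
    for zz in range(levels):
        sel = z == zz
        tab = np.array([[np.sum(sel & (x == a) & (y == b)) for b in range(levels)]
                        for a in range(levels)], float)
        exp = tab.sum(1, keepdims=True) @ tab.sum(0, keepdims=True) / max(tab.sum(), 1)
        stat += np.sum((tab - exp) ** 2 / np.maximum(exp, 1e-9))
    return stat

# Data from a chain Z -> X, Z -> Y, so X and Y are independent GIVEN Z
n = 2000
z = rng.integers(0, 2, n)
x = (rng.random(n) < np.where(z == 1, 0.8, 0.3)).astype(int)
y = (rng.random(n) < np.where(z == 1, 0.7, 0.2)).astype(int)

obs = ci_statistic(x, y, z)
null = []
for _ in range(500):                    # Monte Carlo null distribution
    xp = x.copy()
    for zz in (0, 1):
        idx = np.where(z == zz)[0]
        xp[idx] = xp[rng.permutation(idx)]   # permute X within each Z stratum
    null.append(ci_statistic(xp, y, z))
print("Monte Carlo p-value:", np.mean(np.array(null) >= obs))
```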

13.
The Cauchy problem for the (2+1)-dimensional nonlinear Boiti-Leon-Pempinelli (BLP) equation is studied within the framework of the inverse problem method. Evolution equations generated by the system of BLP equations under study are derived for the resolvent, Jost solutions, and scattering data for the two-dimensional Klein-Gordon differential operator with variable coefficients. Additional conditions on the scattering data that ensure the stability of the solutions to the Cauchy problem are revealed. A recurrence procedure is suggested for constructing the polynomial integrals of motion and the generating function for these integrals in terms of the spectral data. Translated from Teoreticheskaya i Matematicheskaya Fizika, Vol. 109, No. 2, pp. 163–174, November, 1996.

14.
Linear models are a core topic of classical statistics, applied mainly to modeling and analyzing random data. How, then, should models be built and statistical analysis be carried out for non-random, indistinct data such as fuzzy or grey data? Based on grey system theory, and building on a series of studies of grey statistical inference, this paper extends grey estimation and grey hypothesis testing to parameter estimation and hypothesis testing in linear models. Comparison with classical statistical analysis shows that this provides a new approach to modeling and analyzing indistinct data.

15.
In collaborative e-commerce environments, interoperation is a prerequisite for data warehouses that are physically scattered along the value chain. Adopting system and information quality as success variables, we argue that what is required for data warehouse refreshment in this context is inherently more complex than the materialized view maintenance problem and we offer an approach that addresses refreshment in a federation of data warehouses. Defining a special kind of materialized views, we propose an open multi-agent architecture for their incremental maintenance while considering referential integrity constraints on source data.

16.
Scientific prediction of epidemic trends is crucial to epidemic prevention and control. Building on the time-delay dynamical model (TDD-NCP), this paper proposes a stochastic-dynamics-based time-delay convolution model and a discrete convolution model. Based on research results and public data from the Chinese Center for Disease Control and Prevention and on the work of Wallinga and Lipsitch, key parameters of COVID-19 are inverted and the epidemic trajectories of Wuhan and Shanghai are fitted.
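A discrete convolution (renewal-type) case model in the spirit of the abstract; the generation-interval weights, the reproduction-number path, and the seed are illustrative assumptions, not the paper's calibrated TDD-NCP parameters.

```python
import numpy as np

# Discretized generation-interval weights w(tau), tau = 1..7 days (assumed shape)
w = np.array([0.05, 0.15, 0.25, 0.25, 0.15, 0.10, 0.05])

def renewal(T, R, seed=10.0):
    """New cases I(t) = R(t) * sum_tau w(tau) * I(t - tau)  (discrete convolution)."""
    I = np.zeros(T)
    I[0] = seed
    for t in range(1, T):
        k = min(t, len(w))
        I[t] = R[t] * np.dot(w[:k], I[t - 1::-1][:k])
    return I

T = 60
R = np.where(np.arange(T) < 25, 2.5, 0.7)   # control measures assumed on day 25
cases = renewal(T, R)
print("peak day:", int(cases.argmax()), " peak daily cases ~", round(cases.max()))
```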

17.
A recently developed data separation/classification method, called isotonic separation, is applied to breast cancer prediction. Two breast cancer data sets, one with clean and sufficient data and the other with insufficient data, are used for the study and the results are compared against those of decision tree induction methods, linear programming discrimination methods, learning vector quantization, support vector machines, adaptive boosting, and other methods. The experiment results show that isotonic separation is a viable and useful tool for data classification in the medical domain.

18.
In this paper we present a simple dynamization method that preserves the query and storage costs of a static data structure and ensures reasonable update costs. In this method, the majority of data elements are maintained in a single data structure, and the updates are handled using smaller auxiliary data structures. We analyze the query, storage, and amortized update costs for the dynamic version of a static data structure in terms of a function f, with f(n) < n, that bounds the sizes of the auxiliary data structures (where n is the number of elements in the data structure). The conditions on f for minimal (with respect to asymptotic upper bounds) amortized update costs are then obtained. The proposed method is shown to be particularly suited for the cases where the merging of two data structures is more efficient than building the resultant data structure from scratch. Its effectiveness is illustrated by applying it to a class of data structures that have linear merging cost; this class consists of data structures such as Voronoi diagrams, K-d trees, quadtrees, multiple attribute trees, etc.
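A toy version of the scheme, under assumptions: the "static" structure is a sorted array queried by binary search, the auxiliary structure is a small sorted buffer, and when the buffer exceeds f(n) = sqrt(n) it is merged into the main array, which is cheap because merging two sorted arrays costs linear time.

```python
import bisect

class DynamizedSortedArray:
    """Static sorted array + small auxiliary buffer; merge when |buffer| > f(n)."""

    def __init__(self):
        self.main = []      # large static structure (sorted)
        self.aux = []       # small auxiliary structure (also kept sorted)

    def _f(self, n):
        return max(1, int(n ** 0.5))        # size bound for the auxiliary part

    def insert(self, x):
        bisect.insort(self.aux, x)
        if len(self.aux) > self._f(len(self.main)):
            self.main = self._merge(self.main, self.aux)   # linear-cost merge
            self.aux = []

    @staticmethod
    def _merge(a, b):
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        return out + a[i:] + b[j:]

    def contains(self, x):
        for arr in (self.main, self.aux):   # query both structures
            k = bisect.bisect_left(arr, x)
            if k < len(arr) and arr[k] == x:
                return True
        return False

d = DynamizedSortedArray()
for v in [5, 3, 9, 1, 7, 7, 2]:
    d.insert(v)
print(d.contains(7), d.contains(4))   # True False
```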

19.
This article describes a variety of data analysis problems. The types of data across these problems included free text, parallel text, an image collection, remote sensing imagery, and network packets. A strategy for approaching the analysis of these diverse types of data is described. A key part of the challenge is mapping the analytic results back into the original domain and data setting. Additionally, a common computational bottleneck encountered in each of these problems is diagnosed as analysis tools and algorithms with unbounded memory characteristics. This experience and the analysis suggest a research and development path that could greatly extend the scale of problems that can be addressed with routine data analysis tools. In particular, there are opportunities associated with developing theory and functioning algorithms with favorable memory-usage characteristics, and there are opportunities associated with developing methods and theory for describing the outcomes of analyses for the various types of data.

20.
In this paper we consider the Legendre-based method of Persson and Strang and its modification for smoothing uniformly spaced data. The considered Legendre-based methods are numerically compared with the classical Savitzky-Golay method for filtering uniformly spaced data. We show how the Legendre-based filters can be extended to irregularly spaced data.
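A small numerical comparison in the spirit of the abstract: scipy's Savitzky-Golay filter against a least-squares fit in the Legendre basis on uniformly spaced noisy samples. The window length, polynomial orders, and test signal are assumptions, and the global Legendre fit here is a simple stand-in for the Persson-Strang construction.

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.signal import savgol_filter

rng = np.random.default_rng(8)
x = np.linspace(-1, 1, 201)                    # uniformly spaced abscissae
clean = np.sin(4 * x) + 0.3 * x ** 2
noisy = clean + rng.normal(0, 0.1, x.size)

sg = savgol_filter(noisy, window_length=21, polyorder=3)   # local polynomial filter
coef = legendre.legfit(x, noisy, deg=12)                   # Legendre least-squares fit
lg = legendre.legval(x, coef)

print("Savitzky-Golay RMSE:", np.sqrt(np.mean((sg - clean) ** 2)))
print("Legendre-fit   RMSE:", np.sqrt(np.mean((lg - clean) ** 2)))
```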
