Similar literature
Found 20 similar articles; search took 31 ms
1.
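The meaning of the parameter ν in linear ν-support vector regression is discussed and a rigorous theoretical proof is given. Using the ε-insensitive loss function of ν-support vector regression and the meaning of ν, a method for detecting outliers in regression data is proposed. Because it uses a linear model, the method is fast and can handle large-scale data. Numerical experiments demonstrate its feasibility and effectiveness.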
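As a minimal sketch of the ε-insensitive loss named in this abstract: the loss is zero for residuals inside a tube of half-width ε and grows linearly outside it. The abstract does not state the paper's detection rule, so flagging every point outside the tube is only an illustrative assumption. The line parameters `w` and `b`, the value of `eps`, and the data are hypothetical and fixed by hand rather than obtained by training a ν-SVR.

```python
def eps_insensitive_loss(residual, eps):
    """Loss is zero inside the tube |residual| <= eps and grows linearly outside it."""
    return max(0.0, abs(residual) - eps)


def flag_outside_tube(xs, ys, w, b, eps):
    """Return indices of points whose residual from the line w*x + b lies outside the eps-tube."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if eps_insensitive_loss(y - (w * x + b), eps) > 0.0]


# Hypothetical data: points near y = 2x + 1, with one gross deviation at index 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.0, 12.0, 9.1]
flagged = flag_outside_tube(xs, ys, w=2.0, b=1.0, eps=0.5)
```

In ν-SVR proper, ε is not chosen by hand: ν upper-bounds the fraction of training points that fall outside the tube, which is what makes the tube usable as an outlier criterion.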

2.
This paper studies spectral density estimation based on amplitude modulation, which includes missing data as a specific case. A generalized periodogram is introduced and smoothed with a local linear regression smoother to give a consistent estimator of the spectral density. We explore the asymptotic properties of the proposed estimator and its application to time series data with periodically missing observations. A simple data-driven local bandwidth selection rule is proposed and an algorithm for computing the spectral density estimate is presented. The effectiveness of the proposed method is demonstrated using simulations. The application to outlier detection based on a leave-one-out diagnostic is also considered. An illustrative example shows that the proposed diagnostic procedure succeeds in revealing outliers in time series without masking and smearing effects. Supported by Chinese NSF Grants 10001004 and 39930160, and a Fellowship of City University of Hong Kong.

3.
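Outlier diagnosis is a classical problem in statistics. Detecting outliers and reducing their influence on the analysis of tax assessment data is a meaningful line of research. However, conventional outlier diagnosis generally relies on global identification methods suited to unimodal distributions. Drawing on local correlation integral theory, an identification method based on nonparametric density estimation is proposed. The method applies to multimodal distributions, can identify outliers of a local nature, and retains strong identification ability on samples with a high proportion of outliers. Based on a sample of 10,920 enterprises in one city, an empirical analysis compares the tax assessment method currently used by the tax bureau with the proposed one; the results show that the bureau's method carries a considerable tax assessment risk (risk of misjudgment).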

4.
Recent advances in the transformation model have made it possible to use this model for analyzing a variety of censored survival data. For inference on the regression parameters, there are semiparametric procedures based on the normal approximation. However, the accuracy of such procedures can be quite low when censoring is heavy. In this paper, we apply an empirical likelihood ratio method and derive its limiting distribution via U-statistics. We obtain confidence regions for the regression parameters and compare the proposed method with the normal approximation based method in terms of coverage probability. The simulation results demonstrate that the proposed empirical likelihood method substantially overcomes the under-coverage problem and outperforms the normal approximation based method. The proposed method is illustrated with a real data example. Finally, our method can be applied to general U-statistic type estimating equations.

5.
There exist many data clustering algorithms, but they cannot adequately handle the number of clusters or cluster shapes. Their performance depends mainly on the choice of algorithm parameters. Our clustering approach and algorithm require no such parameter choice; the approach can be treated as a natural adaptation to the existing structure of distances between data points. The outlier factor introduced by the author specifies the degree to which each data point is an outlier. The outlier factor is based on the difference between the frequency distribution of interpoint distances in a given dataset and the corresponding distribution for uniformly distributed points. Data clusters can then be determined by maximizing the outlier factor function. The data points in the dataset are divided into clusters according to the attractor regions of local optima. An experimental evaluation shows that the proposed method can identify complex cluster shapes. Key advantages of the approach are good clustering properties for datasets with a comparatively large amount of noise (additional data points) and the absence of important parameters whose choice would determine the quality of the results.

6.
Fault detection and diagnosis (FDD) is an effective technology to assure the safety and reliability of quadrotor helicopters. However, there are still some unsolved problems in the existing FDD methods, such as the trade-off between the accuracy and complexity of the system models used for FDD, and the rarely explored structure faults in quadrotor helicopters. In this paper, a double-granularity FDD method is proposed based on the hybrid modeling of a quadrotor helicopter developed in the authors’ previous work. The hybrid model consists of a prior model and a set of nonparametric models. The coarse-granularity-level FDD is built on the prior model, which can isolate the faulty channel(s), while the fine-granularity-level FDD is built on the nonparametric models, which can isolate the faulty components in the faulty channel. In both the coarse- and fine-granularity FDD procedures, principal component analysis (PCA) is adopted for online fault detection. With this double-granularity scheme, the proposed FDD method has an inherent ability to detect and diagnose structure faults or failures in quadrotor helicopters. Experimental results obtained on a 3-DOF hover platform demonstrate the feasibility and effectiveness of the proposed hybrid modeling technique and the hybrid model based FDD method.

7.
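The conventional exponentially weighted moving average (EWMA) control chart assumes that observations are mutually independent, but in real production processes the data are correlated, which violates this assumption. This paper first discusses the effect of serial autocorrelation on the conventional EWMA control chart; the results show that its detection power is reduced. The σz of the stationary process is therefore re-estimated, and a modified EWMA control chart is built on that basis. The modified EWMA chart is then compared with the Shewhart chart and the residual control chart in terms of average run length. Simulation shows that when the process is not strongly correlated and the process mean undergoes small to moderate shifts, the modified EWMA chart detects better than the other two charts. Finally, a practical example verifies the effectiveness of the method.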
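For orientation, the conventional EWMA chart that this abstract takes as its starting point can be sketched as follows. This is the textbook chart under the independence assumption, not the modified chart of the paper, whose re-estimated σz is not given in the abstract. The smoothing weight `lam`, the limit width `L`, and the data series are hypothetical choices.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """Return (z, lower, upper, signal) per observation for a conventional EWMA chart."""
    rows = []
    z = mu0  # the EWMA statistic starts at the in-control mean
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        # exact time-varying variance factor, valid for independent observations
        half_width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        lower, upper = mu0 - half_width, mu0 + half_width
        rows.append((z, lower, upper, not (lower <= z <= upper)))
    return rows


# Hypothetical series: in control for five points, then the mean shifts up by 1.5 sigma.
data = [0.1, -0.2, 0.0, 0.3, -0.1] + [1.5] * 15
rows = ewma_chart(data, mu0=0.0, sigma=1.0)
first_signal = next((t for t, r in enumerate(rows, start=1) if r[3]), None)
```

When the observations are autocorrelated, the variance of the EWMA statistic is no longer given by the factor used above, which is why the limits need to be rebuilt from a re-estimated standard deviation.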

8.
Cluster-based outlier detection   Total citations: 1 (self-citations: 0, citations by others: 1)
Outlier detection has important applications in data mining, such as fraud detection, customer behavior analysis, and intrusion detection. It is the process of detecting data objects that are grossly different from or inconsistent with the rest of the data. Outliers are traditionally considered as single points; however, a key observation is that many abnormal events have both temporal and spatial locality and may form small clusters that should also be deemed outliers. In other words, not only a single point but also a small cluster can be an outlier. In this paper, we present a new definition of outliers, the cluster-based outlier, which is meaningful and gives weight to local data behavior, and we show how to detect such outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.

9.
In this paper, we propose a novel algorithm, based on the Fisher information measure, to detect suspicious regions on digital mammograms. The proposed algorithm is tested on different types and categories of mammograms (fatty, fatty-glandular and dense glandular) from the mini-MIAS database (Mammogram Image Analysis Society database (UK)). The proposed method is compared with other information-theoretic segmentation methods to demonstrate its effectiveness. The experimental results on mammography images show its effectiveness in detecting suspicious regions. This study can form part of a computer-aided decision (CAD) system for early detection of breast cancer.

10.
Parameter estimation based on uncertain data represented as belief structures is one of the latest problems in the Dempster–Shafer theory. In this paper, a novel method is proposed for parameter estimation in the case where belief structures are uncertain and represented as interval-valued belief structures. In the proposed method, maximization of the likelihood criterion and minimization of the estimated parameter’s uncertainty are considered simultaneously. As an illustration, the proposed method is employed to estimate parameters for deterministic and uncertain belief structures, which demonstrates its effectiveness and versatility.

11.
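This paper studies methods for aggregating group decision information expressed as interval numbers. Based on the possibility-degree matrix formula for pairwise comparison of interval numbers and the ranking formula for complementary judgment matrices, the uncertain OWGA operator proposed in reference [3] is generalized and a combined uncertain OWGA operator is proposed. The concrete steps for applying it are given, together with a corresponding method for aggregating group decision information. Finally, a numerical example illustrates the effectiveness and feasibility of the method, and the results are compared with those of reference [3].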

12.
In ocean transportation, detecting vessel delays in advance or in real time is important for fourth-party logistics (4PL) in order to fulfill the expectations of customers and to help customers reduce delay costs. However, the early detection of vessel delays faces the challenges of numerous uncertainties, including weather conditions, port congestion, booking issues, and route selection. Recently, 4PLs have adopted advanced tracking technologies such as satellite-based automatic identification systems (S-AISs) that produce a vast amount of real-time vessel tracking information, thus providing new opportunities to enhance the early detection of vessel delays. This paper proposes a data-driven method for the early detection of vessel delays: in our new framework of refined case-based reasoning (CBR), real-time S-AIS vessel tracking data are utilized in combination with historical shipping data. The proposed method also provides a process of analyzing the causes of delays by matching the tracking patterns of real-time shipments with those of historical shipping data. Real data examples from a logistics company demonstrate the effectiveness of the proposed method.

13.
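To address data quality problems in group evaluation, abnormal data are handled from two perspectives, the evaluating experts and the evaluation values, and a data quality improvement method for group evaluation based on grey relational degree and the cloud model is proposed. An improved cloud distance model is used to measure the gap between the cloud of each evaluated object and the target cloud, and TOPSIS is used to rank the evaluations. The data quality improvement method and the cloud distance model are applied to a group evaluation of regional logistics competitiveness, improving the data quality of the group evaluation and the stability and representativeness of the results.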
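The TOPSIS ranking step mentioned in this abstract can be illustrated with the classical form of the method. This sketch uses ordinary Euclidean distance, not the improved cloud distance of the paper, which the abstract does not define. The decision matrix, the weights, and the benefit/cost flags are hypothetical, and the function assumes that no criterion column is all zeros and that the alternatives are not all identical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return closeness scores in [0, 1]; higher means closer to the ideal alternative."""
    n_crit = len(weights)
    # vector-normalise each criterion column, then apply the weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    # ideal takes the best value per criterion, anti-ideal the worst
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical alternatives scored on one benefit criterion and one cost criterion.
alternatives = [[9.0, 2.0], [5.0, 5.0], [1.0, 8.0]]
scores = topsis(alternatives, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Swapping `math.dist` for another distance function is the natural place where a cloud-model distance would enter.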

14.
Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive amounts of data. A mathematical framework for lower dimensional representation of text data in vector space based information retrieval is proposed using minimization and a matrix rank reduction formula. We illustrate how the commonly used Latent Semantic Indexing based on the Singular Value Decomposition (LSI/SVD) can be derived as a method for dimension reduction from our mathematical framework. Then two new methods for dimension reduction based on the centroids of data clusters are proposed and shown to be more efficient and effective than LSI/SVD when we have a priori information on the cluster structure of the data. Several advantages of the new methods in terms of computational efficiency and data representation in the reduced space, as well as their mathematical properties, are discussed. Experimental results are presented to illustrate the effectiveness of our methods on certain classification problems in a reduced dimensional space. The results indicate that for a successful lower dimensional representation of the data, it is important to incorporate a priori knowledge in the dimension reduction algorithms.

15.
This article proposes a new technique for detecting outliers in autoregressive models and identifying the type as either innovation or additive. This technique can be used without knowledge of the true model order, outlier location, or outlier type. Specifically, we perturb an observation to obtain the perturbation size that minimizes the resulting residual sum of squares (SSE). The reduction in the SSE yields outlier detection and identification measures. In addition, the perturbation size can be used to gauge the magnitude of the outlier. Monte Carlo studies and empirical examples are presented to illustrate the performance of the proposed method as well as the impact of outliers on model selection and parameter estimation. We also obtain robust estimators and model selection criteria, which are shown in simulation studies to perform well when large outliers occur.

16.
We consider a coefficient identification problem for a free boundary mathematical model of ductal carcinoma in situ (DCIS). This inverse problem aims to determine the nutrient consumption rate from additional measurement data at a boundary point. We first obtain global-in-time uniqueness for our inverse problem. Then, based on the optimization method, we present a regularization algorithm to recover the nutrient consumption rate. Finally, our numerical experiment shows the effectiveness of the proposed numerical method.

17.
In this paper, the so-called local likelihood method is suggested for solving the change point problems when the data are distributed as multivariate normal. The detection procedures proposed not only provide strongly consistent estimates for the number and locations of the change points, but also simplify significantly the computation.

18.
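This paper presents marginal coordinate tests based on two closely related principal Hessian directions methods. These tests identify very effectively the contribution of predictors to the central mean subspace of the regression. Moreover, unlike tests based on sliced inverse regression and sliced average variance estimation, the principal Hessian directions tests avoid having to choose the number of slices. We prove the asymptotic distribution of the test statistics under the null hypothesis and confirm the effectiveness of the tests through simulation.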

19.
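Since the 1990s, more and more production and injection wells have been fitted with long-term downhole pressure gauges. The main obstacles to analyzing long-term pressure data are noise and outliers in the data, the huge data volume, incomplete information, and the lack of dynamic analysis tools. Based on wavelet theory and after repeated trials, this paper identifies a wavelet type suited to processing long-term pressure data. Building on the linear regression method and wavelet decomposition of Chee kin (2001), it establishes a multiple-threshold method for removing outliers, a soft-threshold method for removing noise, and a multi-step method for determining transients. Oil and gas field examples illustrate the effectiveness of these methods.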
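The soft-threshold denoising step named in this abstract is commonly implemented as the standard shrinkage rule sketched below: coefficients with magnitude at or below the threshold are set to zero and the rest are pulled toward zero by the threshold. The abstract does not say how the paper chooses its wavelet or its threshold, so the coefficient values and `thresh` here are hypothetical.

```python
def soft_threshold(coeffs, thresh):
    """Shrink each coefficient toward zero by thresh; values within [-thresh, thresh] become 0."""
    out = []
    for c in coeffs:
        magnitude = abs(c) - thresh
        if magnitude <= 0.0:
            out.append(0.0)
        else:
            # keep the sign of the original coefficient
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical detail coefficients from one level of a wavelet decomposition.
details = [0.05, -0.2, 1.5, -3.0, 0.4]
denoised = soft_threshold(details, thresh=0.5)
```

In a full pipeline this function would be applied to the detail coefficients at each decomposition level before the signal is reconstructed.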

20.
An adaptive collocation method based upon radial basis functions is presented for the solution of singularly perturbed two-point boundary value problems. Using a multiquadric integral formulation, the second derivative of the solution is approximated by multiquadric radial basis functions. This approach is combined with a coordinate stretching technique. The required variable transformation is accomplished by a conformal mapping, an iterated sine-transformation. A new error indicator function accurately captures the regions of the interval with insufficient resolution. This indicator is used to adaptively add data centres and collocation points. The method resolves extremely thin layers accurately with fairly few basis functions. The proposed adaptive scheme is very robust, and reaches high accuracy even when parameters in our coordinate stretching technique are not chosen optimally. The effectiveness of our new method is demonstrated on two examples with boundary layers, and one example featuring an interior layer. It is shown in detail how the adaptive method refines the resolution.
