Similar Literature
 20 similar articles found (search time: 125 ms)
1.
For multi-attribute group decision problems with interval numbers, an improved grey target decision method based on a set-valued statistical model is proposed. First, the set-valued statistical model is used to estimate the interval evaluations given by multiple experts, yielding a decision-attribute sample matrix that meets the required confidence level. Then a grey target decision method based on the weighted generalized Mahalanobis distance is used to rank the alternatives, giving a grey target decision model whose decision samples take the form of an interval-number group decision matrix. Finally, a concrete numerical example walks through the decision procedure. The method avoids cases in which the Mahalanobis distance does not exist, and it overcomes the influence of correlation among decision attributes, differences in their importance, and differing measurement units on the decision process and its results; the feasibility and effectiveness of the method are thereby verified.
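To make the ranking step concrete, here is a minimal Python sketch of grey target ranking with a weighted generalized Mahalanobis distance to the bull's-eye (the vector of best attribute values). The toy matrix, the attribute weights `w`, and the use of a pseudoinverse to sidestep a singular covariance matrix are illustrative assumptions, not the paper's exact formulation (which starts from interval-valued expert judgments).

```python
import numpy as np

# Toy decision matrix: 4 alternatives x 3 benefit attributes (invented values).
X = np.array([[0.6, 0.8, 0.7],
              [0.9, 0.5, 0.6],
              [0.7, 0.7, 0.9],
              [0.5, 0.9, 0.8]])
w = np.array([0.5, 0.3, 0.2])          # assumed attribute weights

bullseye = X.max(axis=0)               # best value per (benefit) attribute
S_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pseudoinverse avoids singularity

W = np.diag(w)
# Weighted generalized Mahalanobis distance of each alternative to the bull's-eye.
d = np.array([np.sqrt((x - bullseye) @ W @ S_inv @ W @ (x - bullseye))
              for x in X])
print("ranking (best first):", np.argsort(d) + 1)
```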

2.
The Mahalanobis-Taguchi system (MTS) is a classification technique that uses the Mahalanobis distance as its measurement scale and diagnoses and predicts multivariate systems by constructing a Mahalanobis space from selected normal samples. Because the Mahalanobis distance is highly sensitive to variation in the sample data, the quality of the normal samples used to construct the Mahalanobis space directly affects classification accuracy. In practice, normal samples are mostly chosen by subjective expert judgment, with no objective, standardized selection mechanism. This paper proposes a control-chart-based mechanism for generating the Mahalanobis space: an initial Mahalanobis space is built from expert-selected normal samples; the increment of each normal sample's Mahalanobis distance between the initial space and the corresponding reduced space is then used as a new measurement scale to build an individuals control chart, and the chart's stability rules are applied to remove abnormal data, yielding a Mahalanobis space in a stable state. Experimental results show that the method is effective and improves the classification accuracy of the Mahalanobis-Taguchi system.
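The core of the proposed mechanism can be sketched as follows, under simplifying assumptions: toy normal samples, a single filtering pass rather than iteration to stability, and plain 3-sigma limits for the individuals chart.

```python
import numpy as np

def mahalanobis_sq(X, ref):
    """Squared Mahalanobis distance of rows of X in the space built on `ref`."""
    mu = ref.mean(axis=0)
    S_inv = np.linalg.pinv(np.cov(ref, rowvar=False))
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, S_inv, diff)

rng = np.random.default_rng(0)
normal = rng.normal(size=(60, 4))      # expert-selected "normal" samples (toy data)
normal[0] += 4.0                        # plant one contaminated observation

# Increment of each sample's MD between the initial space and the reduced
# (leave-one-out) space that excludes that sample; contaminated points stand out.
inc = np.array([
    mahalanobis_sq(normal[i:i+1], normal)[0]
    - mahalanobis_sq(normal[i:i+1], np.delete(normal, i, axis=0))[0]
    for i in range(len(normal))
])

# Individuals control chart on the increments: drop points outside mean +/- 3 sigma.
# (One pass shown; the paper iterates until the chart is stable.)
ucl, lcl = inc.mean() + 3 * inc.std(), inc.mean() - 3 * inc.std()
stable = normal[(inc >= lcl) & (inc <= ucl)]
print(f"kept {len(stable)} of {len(normal)} samples for the stable Mahalanobis space")
```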

3.
The support vector machine (SVM) method is built on the VC (Vapnik-Chervonenkis) dimension theory of statistical learning theory and the structural risk minimization principle. Given limited sample information, it seeks the best trade-off between model complexity (accuracy on the given training samples) and learning capacity (the ability to classify unseen samples without error), so as to obtain the best generalization. The basic idea, taking two-dimensional data as an example, is that the training data are points in the plane, clustered in different regions according to their class; training finds the boundaries between these classes. Using the SVM method, we analyze 500 ms per-tick high-frequency data for the dominant 2011 copper futures contract listed on the Shanghai Futures Exchange. The results show that SVM achieves good predictive performance.
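Since the exchange tick data are not reproducible here, the following hedged sketch shows the general shape of such an experiment with scikit-learn's `SVC`, using a synthetic random-walk price series and lagged tick returns as stand-in features.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
price = np.cumsum(rng.normal(size=3000)) + 50000   # stand-in for 500 ms tick prices
ret = np.diff(price)

# Features: the last 5 tick returns; label: direction of the next tick.
lags = 5
X = np.array([ret[i:i + lags] for i in range(len(ret) - lags)])
y = (ret[lags:] > 0).astype(int)

split = int(0.8 * len(X))
model = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0, gamma='scale'))
model.fit(X[:split], y[:split])
print("out-of-sample directional accuracy:", model.score(X[split:], y[split:]))
```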

4.
Support vector machine (SVM) algorithms are applied to forecasting China's soybean yield. Using Chinese soybean data for 1991-2008 as the sample set, an SVM model is built between the influencing factors and soybean yield. The SVM is trained on the input and output data to approximate the functional relationship implicit in the historical data and to map new data sequences, thereby forecasting soybean yield in future years; the forecasts are then compared with those of several other methods. The results show that the SVM model forecasts soybean yield more accurately than the other methods.
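A hedged sketch of the regression variant with scikit-learn's `SVR`; the three stand-in predictors and the 18-year toy sample merely mimic the paper's setup (the real influencing factors and the 1991-2008 data are not reproduced here).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
# Stand-in predictors, e.g. sown area, fertilizer use, rainfall (18 "years").
X = rng.normal(size=(18, 3))
y = 12 + X @ np.array([1.5, 0.8, 0.4]) + rng.normal(scale=0.2, size=18)  # "yield"

model = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=10.0, epsilon=0.1))
model.fit(X[:-3], y[:-3])                    # train on the earlier years
print("held-out predictions:", model.predict(X[-3:]))
print("actual:", y[-3:])
```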

5.
Support vector machines are applied to license-plate character recognition, addressing the low recognition rates caused by image degradation under real-world conditions and by small sample sizes. The experiments focus on the digits in license plates: 15 groups of digit samples are selected, 8 for training and 7 for testing. Cross-validation is used to tune the SVM parameters C and g, a suitable kernel function is chosen, and the samples are used for training and prediction; for some digits the recognition rate reaches 100%. On the same training and test sets the results are compared with a BP network. The experiments show that, with few training samples and no character feature extraction, the SVM achieves a high recognition rate and good generalization.
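The C/g search by cross-validation maps directly onto scikit-learn's `GridSearchCV`; the sketch below substitutes the library's built-in handwritten-digits dataset for the plate-digit samples, so the grid values and accuracy are purely illustrative.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in for plate digits: scikit-learn's handwritten digits dataset.
digits = datasets.load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

# Cross-validated grid search over C and gamma (g), as in the paper's tuning step.
grid = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': [0.1, 1, 10, 100], 'gamma': [1e-4, 1e-3, 1e-2, 1e-1]},
    cv=5)
grid.fit(X_tr, y_tr)
print("best (C, gamma):", grid.best_params_)
print("test accuracy:", grid.best_estimator_.score(X_te, y_te))
```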

6.
Feature selection for interval-valued symbolic data reduces dimensionality and extracts the key features of the data. This paper proposes a new feature selection method for interval-valued symbolic data. First, the method measures the similarity of interval numbers with the interval Hausdorff distance and the interval Euclidean distance, and estimates feature weights by solving an optimization model that maximizes the similarity between sample points and their class centers. Second, a classifier built from the feature weights is used to assess their quality. Finally, numerical experiments on both synthetic and real datasets verify the effectiveness of the method; the results show that it can effectively remove irrelevant features and identify the features related to the class labels.
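A minimal sketch of the two interval distances and a simple inverse-dissimilarity weighting; the weighting rule here is a crude stand-in for the paper's similarity-maximizing optimization model, and the interval data are invented.

```python
import numpy as np

def hausdorff(a, b):
    """Hausdorff distance between intervals a=[a_l,a_u], b=[b_l,b_u]."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def interval_euclidean(a, b):
    """Euclidean-type distance between two intervals."""
    return np.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Toy interval-valued samples: shape (n, p, 2) = n samples, p features, [lower, upper].
X = np.array([[[1, 2], [10, 12]],
              [[1.2, 2.1], [11, 13]],
              [[5, 6], [10.5, 12.5]]], dtype=float)
center = X.mean(axis=0)     # class "center" interval, feature by feature

# Per-feature dissimilarity of each sample to the center; features with small
# average dissimilarity would receive large weights in the optimization model.
D = np.array([[hausdorff(x[j], center[j]) for j in range(X.shape[1])] for x in X])
w = 1.0 / (D.mean(axis=0) + 1e-9)
print("illustrative feature weights:", w / w.sum())
```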

7.
Driven by the large economic stakes of recommender systems in e-commerce, malicious users mount shilling attacks for illicit profit, manipulating recommendation results and posing a serious information-security threat; identifying and detecting shilling attacks is therefore key to securing recommender systems. Traditional support vector machine (SVM) methods are constrained by two problems at once: small samples and imbalanced data. This paper proposes a shilling-attack detection method that combines a semi-supervised SVM with an asymmetric ensemble strategy. An initial SVM is trained first; the K-nearest-neighbor method is then introduced to refine the labels of samples near the decision boundary, and a mixed set of labeled and unlabeled data is used to reduce the demand for labeled data. Finally, an asymmetric weighted ensemble strategy is designed that emphasizes classification accuracy on attack samples, reducing the ensemble classifier's sensitivity to class imbalance. Experimental results show that the method effectively addresses both the small-sample and the imbalanced-data problems and achieves good detection performance.
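The pipeline might be sketched as below: an initial SVM, KNN relabeling of samples near the decision boundary, and retraining on the mixed set. The `class_weight` cost on the attack class stands in for the asymmetric weighted ensemble, which is not reproduced; the data, boundary threshold, and weights are toy assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

# Toy imbalanced data: class 1 plays the role of (rare) attack profiles.
X, y = make_classification(n_samples=600, weights=[0.9], flip_y=0.02, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:100] = True                       # only 100 labeled profiles

svm = SVC(kernel='rbf', class_weight={1: 5.0})   # assumed asymmetric cost on attacks
svm.fit(X[labeled], y[labeled])

# Pseudo-label the unlabeled data; for samples near the decision boundary,
# replace the SVM's label with a K-nearest-neighbor vote over the labeled data.
pseudo = svm.predict(X[~labeled])
near = np.abs(svm.decision_function(X[~labeled])) < 0.5
knn = KNeighborsClassifier(n_neighbors=5).fit(X[labeled], y[labeled])
pseudo[near] = knn.predict(X[~labeled][near])

# Retrain on the mixed labeled + pseudo-labeled set.
svm2 = SVC(kernel='rbf', class_weight={1: 5.0})
svm2.fit(np.vstack([X[labeled], X[~labeled]]), np.hstack([y[labeled], pseudo]))
print("attack recall on all data:", (svm2.predict(X)[y == 1] == 1).mean())
```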

8.
Against the background of massive credit-reporting data, a shrinking nearest-neighbor imputation method is proposed to reduce the computational cost of imputing missing data. The method imputes in three stages: in the first stage, inclusion probabilities are computed from the missing-value proportions of samples and variables, and the data are shrunk by unequal-probability sampling; in the second stage, the samples nearest to each incomplete sample, by inter-sample distance, are selected to form the training set; in the third stage, a random forest model is built for iterative imputation. Simulation studies on the Australian dataset and a dataset of Chinese banks show that, while maintaining adequate imputation accuracy, the shrinking nearest-neighbor method greatly reduces the computational load.
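A hedged three-stage sketch using scikit-learn's experimental `IterativeImputer` with a random-forest estimator; the inclusion-probability rule, the shrink factor, and the neighborhood size are illustrative guesses rather than the paper's formulas.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 6))
X[rng.random(X.shape) < 0.1] = np.nan          # ~10% missing, toy credit data

target = X[0].copy()
target[2] = np.nan                              # ensure the record is incomplete

# Stage 1: unequal-probability sampling -- more complete rows are more likely kept.
keep_prob = 1.0 - np.isnan(X).mean(axis=1)
sample = X[rng.random(len(X)) < keep_prob * 0.5]

# Stage 2: nearest neighbors of the target on its observed coordinates.
obs = ~np.isnan(target)
dist = np.nansum((sample[:, obs] - target[obs]) ** 2, axis=1)
train = sample[np.argsort(dist)[:200]]

# Stage 3: random-forest-based iterative imputation on the shrunken set.
imp = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50), max_iter=5)
imp.fit(train)
print("imputed record:", imp.transform(target.reshape(1, -1))[0])
```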

9.
Discretization of continuous attributes in rough sets is studied on the basis of shadowed-set theory, and a new shadowed-set-based algorithm for extracting candidate cut points is proposed. According to the distribution of instances on a single attribute, the algorithm classifies the data samples, computes the upper and lower approximations of each class using shadowed sets, and finally extracts the candidate cut-point set. The algorithm's performance is tested on several UCI datasets and compared with other candidate cut-point extraction algorithms. The results show that it effectively reduces the number of candidate cut points and improves the running speed and recognition rate of discretization algorithms.
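The shadowed-set step can be illustrated with Pedrycz's balance criterion for choosing the threshold alpha that splits memberships into core, shadow, and exterior; the beta-distributed toy memberships and the grid search over alpha are assumptions, and the mapping from shadow boundaries to candidate cut points is only indicated in the final message.

```python
import numpy as np

def shadowed_threshold(mu):
    """Pedrycz-style optimal alpha: elevating memberships >= 1-alpha to 1 and
    reducing memberships <= alpha to 0 should balance the shadow in between."""
    alphas = np.linspace(0.01, 0.49, 97)
    def objective(a):
        elevated = np.sum(1 - mu[mu >= 1 - a])   # cost of raising to 1
        reduced = np.sum(mu[mu <= a])            # cost of lowering to 0
        shadow = np.sum((mu > a) & (mu < 1 - a)) # size of the shadow region
        return abs(elevated + reduced - shadow)
    return min(alphas, key=objective)

# Toy memberships of instances to one class along a single continuous attribute.
rng = np.random.default_rng(4)
mu = np.clip(rng.beta(2, 2, size=200), 0, 1)
alpha = shadowed_threshold(mu)
print(f"alpha = {alpha:.3f}; candidate cut points would be taken at the shadow "
      f"boundaries, i.e. where membership crosses {alpha:.3f} and {1 - alpha:.3f}")
```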

10.
王正新 《经济数学》2012,29(2):17-20
To address correlation among decision attributes, the Mahalanobis distance is introduced into the traditional TOPSIS method, yielding a Mahalanobis-distance-based TOPSIS. On this basis, the properties of the improved closeness coefficient are analyzed, and the approach is illustrated with an investment decision example. The results show that the Mahalanobis-distance-based TOPSIS is invariant under non-singular linear transformations of the decision data. Because the covariance matrix captures the correlation among decision attributes, the method effectively avoids the influence of attribute correlation on decision outcomes.
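A compact sketch of Mahalanobis TOPSIS: distances to the positive and negative ideal solutions are computed under the (pseudo)inverse sample covariance instead of the Euclidean norm. The 5x3 benefit-criteria matrix is invented for illustration.

```python
import numpy as np

# Toy decision matrix: 5 alternatives x 3 benefit criteria (illustrative values).
X = np.array([[7, 8, 6], [9, 5, 7], [6, 9, 8], [8, 7, 5], [7, 6, 9]], float)
S_inv = np.linalg.pinv(np.cov(X, rowvar=False))

ideal, anti = X.max(axis=0), X.min(axis=0)    # positive / negative ideal solutions

def mahal(x, v):
    return np.sqrt((x - v) @ S_inv @ (x - v))

d_pos = np.array([mahal(x, ideal) for x in X])
d_neg = np.array([mahal(x, anti) for x in X])
closeness = d_neg / (d_pos + d_neg)           # classical TOPSIS closeness coefficient
print("ranking (best first):", np.argsort(-closeness) + 1)
```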

11.
江苏作为"一带一路"战略的交汇点,有必要探究其各地区外向型经济发展能力.马氏距离具备消除指标间的相关性且不受量纲影响,代替TOPSIS中的欧氏距离,运用灰色关联度来判断指标的关联性,建立基于马氏距离、灰色关联度的TOPSIS外向型经济发展能力评价模型.以江苏13个地级城市为研究对象,进行实证研究.研究表明,外向型经济发展能力评价模型有助于综合判断各城市外向经济发展能力,发现短板并加以整改,促进"一带一路"建设.  相似文献   

12.
Summary  The detection of multidimensional outliers is a fundamental and important problem in applied statistics. The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat-matrix leverage has led to the development of robust techniques that have been known in the statistical community for well over a decade. The literature on this subject is vast and growing. In this paper, we propose to use the artificial intelligence technique of self-organizing maps (SOM) for detecting multiple outliers in multidimensional datasets. SOM, which produces a topology-preserving mapping of the multidimensional data cloud onto a lower-dimensional, visualizable plane, provides an easy way of detecting multidimensional outliers in the data at their respective levels of leverage. The proposed SOM-based method not only identifies multidimensional outliers but also provides information about the entire outlier neighbourhood. Being an artificial intelligence technique, SOM-based outlier detection is non-parametric and can be used to detect outliers in very large multidimensional datasets. The method is applied to detect outliers in several types of simulated multivariate datasets, a benchmark dataset, and a real-life cheque-processing dataset. The results show that SOM is an effective technique for multidimensional outlier detection.
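A from-scratch miniature SOM (8x8 grid, Gaussian neighborhood, decaying learning rate) illustrating the idea: after training, samples with a large quantization error, i.e. far from their best-matching unit, are flagged as outliers. All settings are illustrative, and production work would use a dedicated SOM library.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
X[:3] += 8.0                                   # plant three multivariate outliers

# Minimal 8x8 SOM trained online with a shrinking neighborhood.
grid = np.array([(i, j) for i in range(8) for j in range(8)], float)
W = rng.normal(size=(64, 5))
for t in range(3000):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))          # best-matching unit
    sigma = 3.0 * (1 - t / 3000) + 0.5                   # neighborhood radius
    h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
    W += 0.1 * (1 - t / 3000) * h[:, None] * (x - W)     # decaying learning rate

# Quantization error: distance of each sample to its BMU; large values flag outliers.
qe = np.array([np.min(((W - x) ** 2).sum(axis=1)) ** 0.5 for x in X])
print("flagged as outliers:", np.where(qe > qe.mean() + 3 * qe.std())[0])
```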

13.
It is shown that the accuracy of chromosome classification constrained by class size can be improved over previously reported results by a combination of straightforward modifications to previously used methods. These are (i) the use of the logarithm of the Mahalanobis distance of an unknown chromosome's feature vector to estimated class mean vectors as the basis of the transportation method objective function, rather than the estimated likelihood; (ii) the use of all available features and full estimated covariance to compute the Mahalanobis distance, rather than a subset of features and the diagonal (variance) terms only; (iii) a modification to the way the transportation model deals with the constraint on the number of sex chromosomes in a metaphase cell; and (iv) the use of a newly discovered heuristic to weight off-diagonal elements of the covariance; this proved to be particularly valuable in cases where relatively few training examples were available to estimate covariance. The methods have been verified using 5 different sets of chromosome data.
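Modification (i) can be sketched with a toy assignment problem: the costs are logs of Mahalanobis distances to class means, and the class-size constraint is encoded by giving each class as many columns as its allowed count before solving with `linear_sum_assignment`. The sex-chromosome handling (iii) and the covariance-weighting heuristic (iv) are omitted, and the three-class data are invented.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(7)
# Toy "metaphase": 3 classes with fixed sizes 2, 2, 2 (real cells use 24 classes).
means = np.array([[0., 0.], [4., 0.], [0., 4.]])
covs = [np.eye(2) * 0.5 for _ in means]
sizes = [2, 2, 2]
X = np.vstack([rng.multivariate_normal(means[k], covs[k], sizes[k])
               for k in range(3)])

# Cost: log of the Mahalanobis distance to each class mean (modification (i)).
cost = np.empty((len(X), sum(sizes)))
col_class = np.repeat(np.arange(3), sizes)        # class-size constraint via slots
for j, k in enumerate(col_class):
    S_inv = np.linalg.inv(covs[k])
    d = np.sqrt(np.einsum('ij,jk,ik->i', X - means[k], S_inv, X - means[k]))
    cost[:, j] = np.log(d)

rows, cols = linear_sum_assignment(cost)          # transportation-style assignment
print("assigned classes:", col_class[cols[np.argsort(rows)]])
```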

14.
A new approximation algorithm for evidence fusion is proposed that uses the information entropy of focal elements to select the focal elements taking part in combination. To avoid discarding too many focal elements, a golden-ratio retention proportion is proposed. The basic probability assignments of the discarded focal elements are then redistributed to the retained focal elements or to the frame of discernment according to how much each benefits. Examples show that the method not only speeds up evidence fusion but also avoids sacrificing accuracy for speed, preserving the accuracy of the fusion result.
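A simplified sketch of the approximation: focal elements are ranked (here by mass, a stand-in for the paper's entropy criterion), the golden-ratio proportion is retained, and the discarded mass is redistributed proportionally to the kept elements rather than by the paper's benefit rule.

```python
def approximate_bpa(bpa, ratio=0.618):
    """Keep the highest-mass focal elements up to the golden-ratio proportion of
    the total count and redistribute the discarded mass proportionally."""
    items = sorted(bpa.items(), key=lambda kv: -kv[1])
    n_keep = max(1, round(ratio * len(items)))
    kept = dict(items[:n_keep])
    lost = 1.0 - sum(kept.values())          # mass of the discarded focal elements
    total = sum(kept.values())
    return {A: m + lost * m / total for A, m in kept.items()}

# Toy basic probability assignment over focal elements (frozensets of hypotheses).
bpa = {frozenset('a'): 0.35, frozenset('b'): 0.25, frozenset('ab'): 0.2,
       frozenset('c'): 0.12, frozenset('abc'): 0.08}
print(approximate_bpa(bpa))
```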

15.
Improving covariance matrix estimation in Mahalanobis-distance cluster analysis
Taking variable weights and sample classes into account, this paper develops an iterative method for estimating the covariance matrix during Mahalanobis-distance clustering. Using Fisher's iris data as the sample, ordinary Euclidean-distance clustering, principal-component clustering, and Mahalanobis-distance clustering before and after the improvement are compared empirically; the results show that the proposed method improves accuracy by at least 6.63%. Finally, the method is applied to indicator data for 35 countries to cluster and grade their health-care status.
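The iteration can be sketched as alternating between per-cluster covariance estimation and Mahalanobis reassignment on the iris data; the variable-weighting part of the paper's method is omitted, and the k-means start and 20-iteration cap are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data
k = 3
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)  # initial split

# Iterate: re-estimate each cluster's covariance, reassign by Mahalanobis distance.
for _ in range(20):
    centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    S_invs = [np.linalg.pinv(np.cov(X[labels == c], rowvar=False)) for c in range(k)]
    D = np.stack([np.einsum('ij,jk,ik->i', X - centers[c], S_invs[c], X - centers[c])
                  for c in range(k)], axis=1)
    new = D.argmin(axis=1)
    if (new == labels).all():
        break
    labels = new

print("cluster sizes:", np.bincount(labels))
```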

16.
Gaussian graphical models (GGMs) are popular for modeling high-dimensional multivariate data with sparse conditional dependencies. A mixture of GGMs extends this model to the more realistic scenario where observations come from a heterogeneous population composed of a small number of homogeneous subgroups. In this article, we present a novel stochastic search algorithm for finding the posterior mode of high-dimensional Dirichlet process mixtures of decomposable GGMs. Further, we investigate how to harness the massive thread-parallelization capabilities of graphics processing units to accelerate computation. The computational advantages of our algorithms are demonstrated with various simulated data examples, in which we compare our stochastic search with a Markov chain Monte Carlo (MCMC) algorithm in moderate-dimensional settings. These experiments show that our stochastic search largely outperforms the MCMC algorithm both in computing times and in the quality of the posterior mode discovered. Finally, we analyze a gene expression dataset in which MCMC algorithms are too slow to be practically useful.
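A far-simplified stand-in, with no Dirichlet process, stochastic search, or GPU code: split a toy heterogeneous sample with a Gaussian mixture, then fit one sparse precision matrix per subgroup via scikit-learn's `GraphicalLasso` (the data and the penalty alpha are invented).

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(12)
# Two subgroups with different sparse dependence structures (toy data).
A = rng.multivariate_normal(np.zeros(4), np.eye(4), 300)
B = rng.multivariate_normal(np.full(4, 5.0),
                            np.array([[1, .8, 0, 0], [.8, 1, 0, 0],
                                      [0, 0, 1, .8], [0, 0, .8, 1]]), 300)
X = np.vstack([A, B])

# Split into subgroups, then fit one sparse GGM (graphical lasso) per subgroup.
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)
for c in (0, 1):
    gl = GraphicalLasso(alpha=0.2).fit(X[labels == c])
    print(f"component {c} precision (zeros = missing edges):\n",
          np.round(gl.precision_, 2))
```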

17.
Time series classification by class-specific Mahalanobis distance measures
To classify time series by nearest neighbors, we need to specify or learn one or several distance measures. We consider variations of the Mahalanobis distance measure, which relies on the inverse covariance matrix of the data. Unfortunately, for time series data the covariance matrix often has low rank. To alleviate this problem we can use a pseudoinverse, covariance shrinking, or limit the matrix to its diagonal. We review these alternatives and benchmark them against competitive methods such as the related Large Margin Nearest Neighbor Classification (LMNN) and the Dynamic Time Warping (DTW) distance. As expected, we find that DTW is superior, but the Mahalanobis distance measures are one to two orders of magnitude faster. To get the best results with Mahalanobis distance measures, we recommend learning one distance measure per class, using either covariance shrinking or the diagonal approach.
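The recommended recipe, one shrunk covariance per class, might look as follows with Ledoit-Wolf shrinking; note that for brevity this classifies by distance to the class mean rather than by true nearest neighbors, and the sine/cosine "time series" are invented.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(8)
# Toy dataset: two classes of length-20 noisy series (illustrative).
A = rng.normal(size=(40, 20)) + np.sin(np.linspace(0, 3, 20))
B = rng.normal(size=(40, 20)) + np.cos(np.linspace(0, 3, 20))
X, y = np.vstack([A, B]), np.repeat([0, 1], 40)

# One shrunk inverse covariance per class (shrinking handles the low rank).
P = {c: np.linalg.inv(LedoitWolf().fit(X[y == c]).covariance_) for c in (0, 1)}
mu = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def classify(x):
    d = {c: (x - mu[c]) @ P[c] @ (x - mu[c]) for c in (0, 1)}
    return min(d, key=d.get)

test = rng.normal(size=20) + np.sin(np.linspace(0, 3, 20))
print("predicted class:", classify(test))
```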

18.
The canonical correlation (CANCOR) method for dimension reduction in a regression setting is based on the classical estimates of the first and second moments of the data, and is therefore sensitive to outliers. In this paper, we study a weighted canonical correlation (WCANCOR) method, which captures a subspace of the central dimension-reduction subspace, as well as its asymptotic properties. In the proposed WCANCOR method, each observation is weighted based on its Mahalanobis distance to the location of the predictor distribution. Robust estimates of the location and scatter, such as the minimum covariance determinant (MCD) estimator of Rousseeuw [P.J. Rousseeuw, Multivariate estimation with high breakdown point, Mathematical Statistics and Applications B (1985) 283-297], can be used to compute the Mahalanobis distance. To determine the number of significant dimensions in WCANCOR, a weighted permutation test is considered. A comparison of SIR, CANCOR and WCANCOR is also made through simulation studies to show the robustness of WCANCOR to outlying observations. As an example, the Boston housing data is analyzed using the proposed WCANCOR method.
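A hedged sketch of the weighting and of the first weighted canonical direction for a one-dimensional response: MCD-based squared Mahalanobis distances produce Huber-type weights (the 0.975 chi-squared cutoff is an assumption), and the direction comes from a weighted generalized eigenproblem; the permutation test is omitted.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(9)
n, p = 300, 4
X = rng.normal(size=(n, p))
yv = X @ np.array([1.0, -1.0, 0.0, 0.0]) + 0.3 * rng.normal(size=n)
X[:10] += 8.0                                   # contaminating outliers

# Downweight observations by robust (MCD-based) squared Mahalanobis distance.
d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)
w = np.minimum(1.0, chi2.ppf(0.975, df=p) / d2)

# Weighted first canonical direction between X and y via a generalized eigenproblem.
Xc = X - np.average(X, axis=0, weights=w)
yc = (yv - np.average(yv, weights=w)).reshape(-1, 1)
Wn = w / w.sum()
Sxx = (Xc * Wn[:, None]).T @ Xc
Sxy = (Xc * Wn[:, None]).T @ yc
Syy = float((yc * Wn[:, None]).T @ yc)
M = Sxy @ Sxy.T / Syy
vals, vecs = eigh(M, Sxx)
print("leading weighted canonical direction:", vecs[:, -1])
```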

19.
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations. We propose an incremental approach for data that can be processed as a whole in memory, which is relatively efficient computationally and has the ability to find small clusters in large datasets. The method starts by drawing a random sample of the data, selecting and fitting a clustering model to the sample, and extending the model to the full dataset by additional EM iterations. New clusters are then added incrementally, initialized with the observations that are poorly fit by the current model. We demonstrate the effectiveness of this method by applying it to simulated data, and to image data where its performance can be assessed visually.
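The sample-extend-add loop might be sketched with scikit-learn's `GaussianMixture`, re-initializing from the previous means rather than using true warm-started EM: a one-component model fit on a sample is extended to the full data, and a second component is seeded from the worst-fit observations so the small planted cluster is recovered. Sizes and seeds are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
big = rng.normal(size=(20000, 2))
small = rng.normal(loc=[8, 8], scale=0.3, size=(60, 2))   # tiny hidden cluster
X = np.vstack([big, small])

# Fit on a random sample, then extend to the full data with further EM steps.
sample = X[rng.choice(len(X), 1000, replace=False)]
gm = GaussianMixture(n_components=1, random_state=0).fit(sample)
gm = GaussianMixture(n_components=1, means_init=gm.means_, random_state=0).fit(X)

# Add a cluster initialized on the worst-fit observations, then refit.
worst = X[np.argsort(gm.score_samples(X))[:60]]
means = np.vstack([gm.means_, worst.mean(axis=0)])
gm2 = GaussianMixture(n_components=2, means_init=means, random_state=0).fit(X)
print("cluster weights:", gm2.weights_.round(4))   # the small cluster survives
```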

20.
This paper studies sensor array placement in the space around ferromagnetic equipment. A mathematical model for optimizing sensor positions and number is established and solved with a genetic algorithm. First, the prolate-spheroid model, which demands relatively few sensors and relatively small separations, is chosen for extrapolating the far-field magnetic field. In this model, improper sensor placement makes the condition number of the field-computation coefficient matrix too large; the model becomes ill-conditioned and the computed far-field results are unreliable. The condition number of the coefficient matrix in the prolate-spheroid model is therefore taken as the optimization objective, a mathematical model is built to optimize the number and placement of the sensors above a single device, and a genetic algorithm is used to solve it. Next, experiments verify that the model is effective for optimizing sensor placement and number for a single device, using few sensors and producing reliable results. Finally, the single-device model is extended to multiple devices; with two devices as a representative case, sensor positions are computed both by joint optimization and by separate optimization. Experiments show that both approaches achieve high far-field accuracy, but separate optimization is simpler and easier to carry out in practice.
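A toy mutation-only genetic algorithm minimizing the condition number of a sensor-position-dependent coefficient matrix; the polynomial `design_matrix` is an invented stand-in for the prolate-spheroidal field model, and the population size and mutation scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(11)

def design_matrix(pos):
    """Illustrative field-model coefficient matrix for sensors at 2-D positions
    `pos` (a stand-in for the prolate-spheroidal harmonic terms in the paper)."""
    x, z = pos[:, 0], pos[:, 1]
    return np.column_stack([np.ones_like(x), x, z, x * z, x**2 - z**2])

def fitness(pos):
    return np.linalg.cond(design_matrix(pos))     # smaller is better

# Simple genetic algorithm over the positions of 8 sensors in a 1 x 1 plane.
pop = rng.random((40, 8, 2))
for gen in range(200):
    f = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(f)[:20]]              # truncation selection
    children = parents[rng.integers(20, size=20)].copy()
    children += rng.normal(scale=0.05, size=children.shape)   # Gaussian mutation
    children = np.clip(children, 0, 1)
    pop = np.vstack([parents, children])

best = pop[np.argmin([fitness(ind) for ind in pop])]
print("best condition number:", fitness(best))
```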
