Similar Articles
20 similar articles found.
1.
Interval-valued symbolic data is an important type of symbolic data. Existing studies typically assume that the points within each interval are uniformly distributed, which limits the applicability of the resulting methods. Under a general distribution assumption, this paper derives an extended Hausdorff distance for interval-valued symbolic data and, based on it, proposes an SOM clustering algorithm for such data. Simulation results show that the SOM algorithm built on the proposed extended Hausdorff distance outperforms SOM clustering based on the traditional Hausdorff distance and on the μσ distance. Finally, the method is applied to the cluster analysis of meteorological data, illustrating its workflow and practicality and further assessing its effectiveness on real problems.
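The paper's extended Hausdorff distance is not reproduced in the abstract; as a point of reference, the traditional Hausdorff distance between two intervals that it generalizes can be sketched as:

```python
def hausdorff_interval(u, v):
    """Traditional Hausdorff distance between intervals u = [a1, b1] and
    v = [a2, b2]: the larger of the endpoint gaps."""
    (a1, b1), (a2, b2) = u, v
    return max(abs(a1 - a2), abs(b1 - b2))
```

The extended version in the paper replaces the uniform-distribution assumption behind this formula with a general within-interval distribution.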

2.
This paper studies feature selection and redundancy removal for gene expression data using statistical hypothesis tests, and proposes a corresponding model and algorithm. Compared with models and algorithms in the existing literature, the proposed method is intuitive and easy to understand, simple to construct, and efficient to run. Numerical experiments on three binary-class gene expression data sets show that the method performs well for both feature selection and redundancy removal. On this basis, the selected feature genes are used in a classification experiment with a class-centroid distance method; the results further show that the proposed method achieves high classification accuracy on binary-class gene expression data.
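The abstract does not specify which statistical test is used; a common choice for two-class expression data is Welch's t statistic. A minimal sketch of test-based feature ranking (function names and the choice of test are illustrative, not the paper's):

```python
import math

def welch_t(x, y):
    """Welch's two-sample t statistic; a larger |t| suggests the gene
    separates the two classes better."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)

def select_features(expr, labels, k):
    """Rank genes (rows of expr) by |t| across the two classes and keep the top k."""
    scores = []
    for g, row in enumerate(expr):
        x = [v for v, c in zip(row, labels) if c == 0]
        y = [v for v, c in zip(row, labels) if c == 1]
        scores.append((abs(welch_t(x, y)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]
```

Redundancy removal would then prune, among the top-ranked genes, those highly correlated with an already selected gene.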

3.
Traditional spectral clustering algorithms handle multi-scale data poorly. To address this, a new similarity measure, the density-sensitive similarity, is introduced: it stretches distances between points in different high-density regions and shrinks distances between points within the same high-density region, thereby describing the actual cluster structure of the data effectively. This paper also introduces the notion of the eigengap and uses it to determine the number of clusters automatically. Numerical experiments verify the feasibility and effectiveness of the proposed algorithm.
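One widely used form of density-sensitive distance defines the edge length between two points as ρ^d − 1 (with ρ > 1) and then takes shortest paths, so points linked through a dense chain end up close. A sketch under these assumptions (the ρ parameter and complete-graph construction are illustrative, not necessarily the paper's exact definition):

```python
import heapq
import math

def density_sensitive_distances(points, rho=2.0):
    """All-pairs density-sensitive distance: Euclidean edge lengths are
    stretched exponentially (rho**d - 1), then Dijkstra shortest paths make
    points connected through a dense region close to each other."""
    n = len(points)

    def edge(i, j):
        return rho ** math.dist(points[i], points[j]) - 1.0

    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0.0
        pq = [(0.0, s)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > dist[s][u]:
                continue
            for v in range(n):
                if v != u:
                    alt = du + edge(u, v)
                    if alt < dist[s][v]:
                        dist[s][v] = alt
                        heapq.heappush(pq, (alt, v))
    return dist
```

For three collinear points at 0, 1, 2, the direct 0–2 edge costs 2² − 1 = 3 while the path through the middle point costs (2 − 1) + (2 − 1) = 2, illustrating how dense chains shorten distances.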

4.
In a given metric space, the unit clustering problem asks for the minimum number of unit balls needed to cover a given set of points. It is a well-known combinatorial optimization problem. In its online version, n points in the metric space arrive one by one at arbitrary positions; each point must be assigned to a unit cluster upon arrival, with no information about future points, and the goal is to minimize the total number of unit clusters used. This paper considers a class of one-dimensional online unit clustering problems under the following assumption: in the optimal solution of the corresponding offline problem, the distance between any two adjacent clusters is greater than 0.5. We first present two online algorithms and several lemmas, then combine them into a randomized algorithm that runs each of the two with probability 0.5, and finally prove that the expected competitive ratio of this combined randomized algorithm is at most 1.5.
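The paper's two online algorithms are not given in the abstract; the standard grid baseline for one-dimensional online unit clustering, which this line of work typically builds on, can be sketched as:

```python
import math

def grid_online_unit_clustering(stream):
    """Grid algorithm for 1-D online unit clustering: each arriving point is
    irrevocably assigned to the cluster of its unit grid cell [k, k+1), so
    every cluster has diameter < 1 and assignments never change later."""
    clusters = {}  # grid cell index -> points assigned to it
    for x in stream:
        clusters.setdefault(math.floor(x), []).append(x)
    return list(clusters.values())
```

The grid algorithm is 2-competitive in general; the paper's assumption on inter-cluster gaps is what allows the combined randomized algorithm to push the expected ratio down to 1.5.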

6.
In this paper, we first study integral metrics on the space of fuzzy numbers using the EW-type metric for interval numbers, propose a new integral metric on fuzzy number space, the EW-type integral metric, and prove its basic properties. As an application of the EW-type integral metric, we design a fuzzy clustering algorithm for objects whose attributes are triangular fuzzy numbers. A worked example shows that the EW-type integral metric makes the fuzzy clustering algorithm simpler to implement and yields finer, more reasonable, and more effective classifications.

7.
Classical partition-based clustering algorithms represent each cluster by a single point, which is restrictive. This paper proposes a clustering algorithm for categorical data based on generalized centers: a generalized center contains multiple points and thus reflects the distribution of the data within a cluster. New definitions of the distance to a generalized center and of the inter-cluster distance are given, along with a method for determining generalized centers and a strategy for assigning objects to clusters based on them; in general, a single partition iteration suffices to reach the final clustering. Applied to four benchmark data sets and compared with the well-known K-modes partition clustering algorithm and two of its improved variants, the generalized-center algorithm achieves higher clustering accuracy with fewer iterations, showing that it is effective and practical.
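The paper's exact definitions of the generalized center and its distances are not reproduced in the abstract; a frequency-based sketch of the idea (names and formulas are illustrative assumptions):

```python
from collections import Counter

def generalized_center(cluster):
    """Represent a cluster of categorical records by per-attribute value
    frequencies, instead of a single mode point as in K-modes."""
    m = len(cluster[0])
    return [Counter(rec[j] for rec in cluster) for j in range(m)]

def distance_to_center(rec, center, size):
    """Dissimilarity of a record to a generalized center: one minus the
    relative frequency of the record's value on each attribute, summed
    over attributes."""
    return sum(1.0 - center[j][rec[j]] / size for j in range(len(rec)))
```

Unlike a single mode, the frequency table preserves how values are distributed inside the cluster, which is the property the generalized center exploits.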

8.
《数理统计与管理》(Journal of Applied Statistics and Management), 2015(5): 809-820
Imbalanced data refers to classification problems in which the number of observations in one class of the target variable far exceeds that in the other classes. To address shortcomings of the SMOTE algorithm and its derivatives for imbalanced data, this paper proposes a new oversampling algorithm, SMUP (Synthetic Minority Using Proximity of Random Forests), which replaces the distance measure in SMOTE with a sample-similarity measure and thereby improves classification accuracy. Experiments show that single classifiers built on SMUP effectively raise classification accuracy on the minority class while resolving SMOTE's difficulty in measuring distances for categorical feature variables, and that ensemble classifiers built on SMUP also clearly outperform the SMOTE derivatives. Most importantly, SMUP unifies the distance measures for continuous, mixed, and categorical feature variables in a single framework, which is convenient in practice.
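SMUP itself replaces SMOTE's Euclidean neighborhood with a random-forest proximity, which is not shown here; the SMOTE-style interpolation step it builds on can be sketched as (parameter names are illustrative):

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: each synthetic point is a random
    interpolation between a minority sample and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neigh = sorted((q for q in minority if q is not p),
                       key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neigh)
        t = rng.random()
        out.append(tuple(pi + t * (qi - pi) for pi, qi in zip(p, q)))
    return out
```

Because each synthetic point lies on a segment between two minority samples, it stays inside the region the minority class already occupies.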

9.
We consider a class of semi-supervised distance metric learning problems. As sample sets (databases) grow rapidly in size and complexity, the learned distance metric matrix should be sparse, so a sparsity constraint on the learned matrix is added to an existing metric learning model. To keep the model tractable, the sparsity constraint is imposed via the Frobenius norm, which is then moved into the objective by a penalty method, turning the constrained model into an unconstrained optimization problem. To solve it, we propose an accelerated projected gradient algorithm over the set of positive definite matrices, overcoming the difficulty that linear combinations cannot be taken directly within this set, and we analyze the algorithm's convergence. Finally, numerical experiments on classification problems from the UCI repository illustrate the sparsity of the learned matrix and the effectiveness of the accelerated projected gradient algorithm.
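The projection step of such a projected gradient method clips negative eigenvalues of the iterate to zero. A closed-form sketch for symmetric 2×2 matrices (the paper works with general positive definite matrices; this small case is for illustration only):

```python
import math

def psd_project_2x2(a, b, c):
    """Project the symmetric matrix [[a, b], [b, c]] onto the PSD cone:
    eigendecompose in closed form and clip negative eigenvalues to zero."""
    if b == 0.0:
        # Already diagonal: clip the diagonal entries directly.
        return [[max(a, 0.0), 0.0], [0.0, max(c, 0.0)]]
    mean = (a + c) / 2.0
    r = math.hypot((a - c) / 2.0, b)      # r > 0 since b != 0
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mean + r, mean - r):
        w = max(lam, 0.0)                 # clipped eigenvalue
        vx, vy = b, lam - a               # eigenvector for lam (nonzero)
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
        out[0][0] += w * vx * vx
        out[0][1] += w * vx * vy
        out[1][0] += w * vy * vx
        out[1][1] += w * vy * vy
    return out
```

In the full algorithm this projection is applied after every (accelerated) gradient step so that each iterate remains a valid metric matrix.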

10.
Soft margin classification in metric spaces
左玲 (Zuo Ling), 《数学杂志》(Journal of Mathematics), 2008, 28(2): 187-191
This paper studies the soft margin classification problem in metric spaces. Using properties of the metric d, a nonlinear map is obtained that embeds the metric space isometrically into a Banach space, and a soft margin classification algorithm is constructed.
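The abstract does not spell out the embedding; one classical isometric embedding of a metric space into a Banach space is the Kuratowski embedding x ↦ (d(x, p₁), …, d(x, pₙ)) into ℓ∞, sketched here for a finite space:

```python
def kuratowski_embed(points, d):
    """Kuratowski embedding of a finite metric space into l_infinity:
    each point is mapped to its vector of distances to all points."""
    return {p: tuple(d(p, q) for q in points) for p in points}

def linf(u, v):
    """Sup-norm distance between two embedded vectors."""
    return max(abs(a - b) for a, b in zip(u, v))
```

The triangle inequality gives linf(emb[x], emb[y]) ≤ d(x, y), while the coordinate at y attains |d(x, y) − 0| = d(x, y), so the embedding is isometric.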

11.
Support Vector Machines (SVMs) are now very popular as a powerful method for pattern classification problems. One of the main features of SVMs is that they produce a separating hyperplane maximizing the margin in a feature space induced by a nonlinear mapping using a kernel function. As a result, SVMs can treat not only linear separation but also nonlinear separation. While the soft margin method of SVMs considers only the distance between the separating hyperplane and misclassified data, we propose in this paper a multi-objective programming formulation that also considers surplus variables. A similar formulation was extensively researched in linear discriminant analysis, mostly in the 1980s, using Goal Programming (GP). This paper compares conventional methods such as SVMs and GP with our proposed formulation through several examples. Received: September 2003; Revised: December 2003.

12.
Existing one-class classification algorithms usually adopt the classical Euclidean metric to describe similarity between samples, but the Euclidean metric fails to reflect the intrinsic distribution structure of some data sets, which weakens the descriptive power of these methods. This paper proposes a distance metric learning algorithm for one-class data in high-dimensional spaces that improves the descriptive performance of one-class classifiers. Unlike existing metric learning algorithms, it requires only target-class data: by introducing a regularization term based on the prior distribution of the samples and an L1-norm sparsity constraint on the metric, it effectively solves the metric learning problem for one-class data with few samples in high dimensions, and the resulting optimization problem is solved efficiently by a block coordinate descent algorithm. The learned metric can easily be embedded into one-class classifiers. Simulation results show that the learned metric effectively improves the descriptive performance of one-class classifiers, in particular SVDD, giving the classifiers stronger generalization ability.

13.
Support vector machines (SVMs) have attracted much attention in theoretical and in applied statistics. The main topics of recent interest are consistency, learning rates, and robustness. We address the open problem of whether SVMs are qualitatively robust. Our results show that SVMs are qualitatively robust for any fixed regularization parameter λ. However, under extremely mild conditions on the SVM, it turns out that SVMs are no longer qualitatively robust for any null sequence λn, the classical sequences needed to obtain universal consistency. This lack of qualitative robustness is of a rather theoretical nature because we show that, in any case, SVMs fulfill a finite sample qualitative robustness property. For a fixed regularization parameter, SVMs can be represented by a functional on the set of all probability measures. Qualitative robustness is proven by showing that this functional is continuous with respect to the topology generated by weak convergence of probability measures. Combined with the existence and uniqueness of SVMs, our results show that SVMs are the solutions of a well-posed mathematical problem in Hadamard's sense.

14.
Support vector machines (SVMs) have been successfully used to identify individuals' preferences in conjoint analysis. One of the challenges of using SVMs in this context is to properly control for preference heterogeneity among individuals to construct robust partworths. In this work, we present a new technique that obtains all individual utility functions simultaneously in a single optimization problem based on three objectives: complexity reduction, model fit, and heterogeneity control. While complexity reduction and model fit are dealt with using SVMs, heterogeneity is controlled by shrinking the individual-level partworths toward a population mean. The proposed approach is further extended to kernel-based machines, conferring flexibility on the model by allowing nonlinear utility functions. Experiments on simulated and real-world datasets show that the proposed approach in its linear form outperforms existing methods for choice-based conjoint analysis.
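In the paper, the shrinkage toward a population mean enters the optimization objective itself; as a simplified post-hoc illustration of the same idea (gamma is an assumed tuning parameter, not from the paper):

```python
def shrink_partworths(partworths, gamma):
    """Shrink each individual's partworth vector toward the population mean:
    w_i <- (1 - gamma) * w_i + gamma * w_bar, with gamma in [0, 1]
    controlling how much heterogeneity is suppressed."""
    n = len(partworths)
    m = len(partworths[0])
    mean = [sum(w[j] for w in partworths) / n for j in range(m)]
    return [[(1 - gamma) * w[j] + gamma * mean[j] for j in range(m)]
            for w in partworths]
```

With gamma = 0 individuals keep their own partworths; with gamma = 1 everyone collapses to the population mean, i.e., a fully pooled model.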

15.
An asymptotic formula is obtained for the number of imaginary quadratic number fields with 2-class number equal to 2, from which one can then obtain a type of density result for the 2-class number. The solution of this problem leads to an interesting question about a character sum over primes.

16.
A semigroup is regular if it contains at least one idempotent in each ℒ-class and in each ℛ-class. A regular semigroup is inverse if it satisfies either of the following equivalent conditions: (i) there is a unique idempotent in each ℒ-class and in each ℛ-class, or (ii) the idempotents commute. Analogously, a semigroup is abundant if it contains at least one idempotent in each ℒ*-class and in each ℛ*-class. An abundant semigroup is adequate if its idempotents commute. In adequate semigroups, there is a unique idempotent in each ℒ*-class and each ℛ*-class. M. Kambites raised the question of the converse: in a finite abundant semigroup such that there is a unique idempotent in each ℒ*-class and each ℛ*-class, must the idempotents commute? In this note, we provide a negative answer to this question.

17.
This paper is concerned with the theoretical foundation of support vector machines (SVMs). The purpose is to develop further an exact relationship between SVMs and statistical learning theory (SLT). As a representative, the standard C-support vector classification (C-SVC) is considered here. More precisely, we show that the decision function obtained by C-SVC is just one of the decision functions obtained by solving the optimization problem derived directly from the structural risk minimization principle.
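The standard C-SVC primal problem referred to above is:

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\,\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\qquad\text{s.t.}\quad
y_i\bigl(\langle w,\phi(x_i)\rangle + b\bigr) \ge 1-\xi_i,\quad
\xi_i \ge 0,\quad i=1,\dots,n,
```

where $\phi$ is the feature map induced by the kernel and $C>0$ trades margin width against training errors.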

18.
Multiclass classification and probability estimation have important applications in data analytics. Support vector machines (SVMs) have shown great success in various real-world problems due to their high classification accuracy. However, one main limitation of standard SVMs is that they do not provide class probability estimates, and thus fail to offer an uncertainty measure about class prediction. In this article, we propose a simple yet effective framework to endow kernel SVMs with the feature of multiclass probability estimation. The new probability estimator does not rely on any parametric assumption on the data distribution; therefore, it is flexible and robust. Theoretically, we show that the proposed estimator is asymptotically consistent. Computationally, the new procedure can be conveniently implemented using standard SVM software. Our extensive numerical studies demonstrate competitive performance of the new estimator when compared with existing methods such as multiple logistic regression, linear discriminant analysis, tree-based methods, and random forests, under various classification settings. Supplementary materials for this article are available online.

19.
Method — In this paper, we introduce a bi-level optimization formulation for the model and feature selection problems of support vector machines (SVMs). A bi-level optimization model is proposed to select the best model, in which the standard convex quadratic optimization problem of SVM training is cast as a subproblem. Feasibility — The optimal objective value of the quadratic problem of SVMs is minimized over a feasible range of the kernel parameters at the master level of the bi-level model. Since the optimal objective value of the subproblem is a continuous function of the kernel parameters, though only implicitly defined over a certain region, the solution of this bi-level problem always exists. The problem of feature selection can be handled in a similar manner. Experiments and results — Two approaches for solving the bi-level problem of model and feature selection are considered as well. Experimental results show that the bi-level formulation provides a plausible tool for model selection.

20.
Microarray technology allows millions of gene expression levels to be recorded simultaneously. Owing to cost and technological constraints, however, the expression data sets available to researchers typically contain only a small number of samples, while the measured gene expression values number in the tens of thousands. Many traditional statistical methods cannot analyze such data. Drawing on statistical learning theory from data mining, this paper gives a detailed introduction to the application of a supervised analysis method, support vector machines (SVMs), to microarray expression data analysis.
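The abstract does not fix a particular SVM variant; for high-dimensional, small-sample microarray data a linear SVM is the usual choice. A Pegasos-style stochastic subgradient sketch (hyperparameters and function names are illustrative):

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM by stochastic subgradient descent on the
    L2-regularized hinge loss (Pegasos-style). Labels y must be +1/-1."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]     # regularization shrink
            if margin < 1:                             # hinge subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In the microarray setting, each coordinate of x would be one gene's expression value, usually after the kind of feature selection discussed in entry 2.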


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号