Similar literature
19 similar documents found.
1.
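Collaborative filtering recommender systems work with data that are highly sparse, high-dimensional, and large in volume. This paper combines grey relational clustering with collaborative filtering to construct a grey relational clustering collaborative filtering recommendation algorithm, and applies it to a collaborative filtering recommender system to address personalized recommendation quality when the data are highly sparse and high-dimensional. First, the user-item rating matrix, the users' grey absolute relational degree, the users' grey similarity, and the users' grey relational clustering are defined. Then the computation method and steps of the algorithm are given, together with a method for evaluating recommendation quality. Finally, the proposed algorithm is compared experimentally with collaborative filtering algorithms based on cosine similarity, correlation analysis, and adjusted cosine similarity on data sets of different sizes. The experiments show that, compared with traditional collaborative filtering methods, the proposed algorithm offers higher recommendation quality, lower computational cost, and weaker dependence on data set size, and that it also has some advantages in cold start, stability, and computational efficiency.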

2.
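Because recommender systems involve enormous numbers of users and items, existing collaborative filtering methods struggle with data sparsity and computational scalability in user-item recommendation. This paper proposes CF-cluMA, a collaborative filtering recommendation method based on clustered matrix approximation. On the one hand, CF-cluMA clusters users and items separately and uses the resulting block-partitioned user-item rating matrix to capture the locality of users' interest in items, which reduces the global sparsity of the user-item rating matrix. On the other hand, CF-cluMA applies singular value decomposition to the locally dense blocks and uses a Schmidt transformation to approximate the global user-item rating matrix and predict users' ratings of unseen items, which reduces the complexity of the collaborative filtering algorithm. Experiments on the real-world EachMovie movie rating data set show that, compared with existing matrix-approximation-based collaborative filtering methods, CF-cluMA improves recommendation accuracy and reduces computational complexity. The study has important managerial implications for e-commerce recommender systems.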

3.
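Semi-supervised learning has been an important research direction in machine learning in recent years. The quality of the supervision information strongly affects the result of semi-supervised clustering, so actively learning high-quality supervision information is necessary. This paper proposes an error-correcting method for actively learning pairwise constraints. The algorithm looks for pairwise-constraint supervision information that the clustering algorithm cannot discover by itself, introduces it into spectral clustering, and uses it to adjust the point-to-point distance matrix in spectral clustering. A bidirectional search that sorts the distances between points allows the learner to learn actively even when it receives unlabeled data, so that good clustering results are obtained with fewer constraints. The algorithm also reduces computational complexity and resolves the singularity problem of pairwise constraints in the clustering process. Experiments on UCI benchmark data sets and synthetic data sets show that the algorithm outperforms the compared algorithms and outperforms spectral clustering with randomly selected supervision information.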

4.
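Su Muya (苏木亚). 《运筹与管理》 (Operations Research and Management Science), 2017, 26(11): 134-144
This paper combines multiway normalized cut spectral clustering, the univariate GARCH model, and the Granger causality test to study, stage by stage, the clustering characteristics of volatility in the world's major stock markets from 1994 to 2014. First, a univariate GARCH model is used to extract the volatility of each major stock market. Second, special properties of multiway normalized cut spectral clustering are used to characterize the number of clusters, the clustering quality, and the stability of the clustering results for these volatilities. Finally, the Granger causality test is used to analyze volatility spillover effects between the representative markets of different clusters and between markets within the same cluster. The empirical results show that, compared with non-crisis periods, during financial crises the volatilities of the world's major stock markets form more clusters, the clustering quality is higher, the clustering results are relatively stable, and volatility spillover effects among the markets are stronger.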

5.
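Topic discovery is an important research problem in predicting hot topics on online social platforms. Most existing topic discovery algorithms measure the similarity of text data with the traditional cosine similarity, which cannot detect differences between data objects whose values change proportionally in every dimension; the resulting text similarities are inaccurate, which lowers the accuracy of topic discovery. To address this, a topic discovery algorithm based on bidirectional improved cosine similarity (TABOC) is proposed. First, cosine similarity is improved from two angles, direction and magnitude, to obtain the bidirectional improved cosine similarity, which can distinguish data objects whose values change proportionally in every dimension while keeping the directional discrimination of the traditional cosine similarity, and thus measures text similarity more accurately. Next, the bidirectional improved cosine feature vector of a set, the addition of such feature vectors, and related definitions and theorems are introduced; irrelevant information is discarded and the feature vector of a newly merged set is computed directly, which reduces the time and space consumed during topic discovery. An incremental clustering framework is also incorporated to process newly arriving data efficiently. Experiments on Baidu Tieba data show that TABOC is effective and feasible for topic discovery, and that its accuracy and time efficiency are, overall, better than those of the compared algorithms.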
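The limitation of the traditional cosine similarity described in the abstract above can be seen in a few lines. The sketch below is only an illustration of that limitation, not the TABOC measure itself (whose formula the abstract does not give); the vectors and the names `doc_a`, `doc_b`, and `doc_c` are made up for the example.

```python
import math

def cosine_similarity(u, v):
    """Plain cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-weight vectors, chosen only for illustration.
doc_a = [1.0, 2.0, 3.0]
doc_b = [10.0, 20.0, 30.0]  # same direction as doc_a, ten times the magnitude
doc_c = [3.0, 2.0, 1.0]     # different direction

sim_ab = cosine_similarity(doc_a, doc_b)  # proportional vectors look identical
sim_ac = cosine_similarity(doc_a, doc_c)  # direction differs, so similarity drops
print(sim_ab, sim_ac)
```

Because `doc_b` is a scaled copy of `doc_a`, the cosine similarity treats the two as identical even though their magnitudes differ by a factor of ten; a magnitude-aware measure would be needed to separate them.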

6.
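Data description, also known as one-class classification, describes the distribution of existing data in order to decide whether test data are consistent with that distribution. The principle of kernel-based data description is first reviewed briefly, and it is pointed out that, with a suitable kernel function and corresponding parameters, data description can be applied to pattern clustering, and that this clustering approach has the advantages of compact boundaries and easy noise removal. To address the problems that data-description-based clustering has in determining the number of clusters and the cluster membership of individual samples, a search-based solution is proposed; both theoretical analysis and worked examples confirm its feasibility. Finally, the clustering algorithm is applied to the evaluation of enterprise relationships and yields reasonable results.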

7.
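Liu Chao (刘超), Li Yuanrui (李元睿), Xie Jing (谢菁). 《运筹与管理》 (Operations Research and Management Science), 2022, 31(6): 147-153
In credit risk identification, clustering algorithms are often used to separate samples of different risk levels and to identify risk features. The field usually has to handle high-dimensional data, however, and traditional clustering algorithms are poorly suited to it: they easily fall into local optima, are disturbed by redundant features, and lack robustness. Using high-dimensional credit risk data, this paper studies the credit risk of listed companies, builds a three-objective optimization model for identifying credit risk features, and designs a decomposition-based multi-objective subspace clustering algorithm to solve it. Comparative experiments against other algorithms show the advantages of the proposed algorithm in clustering accuracy and robustness. Based on the feature weights assigned by the clustering algorithm, the indicators that deserve particular attention when assessing the credit risk of listed companies are summarized.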

8.
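Zhang Lu (张璐), Kong Lingchen (孔令臣), Chen Huangyue (陈黄岳). 《计算数学》 (Mathematica Numerica Sinica), 2019, 41(3): 320-334
With the arrival of the big data era, massive data with complex structure, such as variables of different dimensions and different scales, have emerged in every field. In practice the relationships between variables are often uncertain. The classical Pearson correlation coefficient reflects only the linear correlation between two variables of the same dimension and cannot fully characterize the dependence between variables. The distance correlation coefficient proposed by Szekely et al. in 2007 can describe nonlinear relationships between variables of different dimensions. To explore the information inherent in the relationships between variables, this paper proposes a maximum distance correlation method for clustering variables based on the distance correlation coefficient; the method has the ultrametric and space-contracting properties. To exploit the advantages of distance correlation more fully, the method is refined into a class-as-a-whole distance correlation method: when measuring the similarity between two classes, all variables in each class are merged into a single whole, and the distance correlation between the two wholes, which generally have different dimensions, is then computed. Finally, the class-as-a-whole distance correlation method is applied to several real problems, which verifies the effectiveness of the algorithm.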
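The sample distance correlation that the abstract above builds on can be sketched directly from its standard definition (double-centered pairwise distance matrices). This is a minimal sketch under that assumption and covers only the coefficient itself, not the paper's variable clustering procedure; the function names and the toy data are hypothetical.

```python
import math

def centered_distance_matrix(points):
    """Pairwise Euclidean distances with row, column, and grand means removed."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation; xs and ys may have different dimensions."""
    n = len(xs)
    a = centered_distance_matrix(xs)
    b = centered_distance_matrix(ys)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)

# Toy data: a 1-D variable, a 2-D linear image of it, and a 1-D quadratic image.
xs = [(float(t),) for t in range(-3, 4)]
linear_2d = [(2.0 * x[0] + 1.0, 0.0) for x in xs]
quadratic = [(x[0] ** 2,) for x in xs]

r_linear = distance_correlation(xs, linear_2d)
r_quadratic = distance_correlation(xs, quadratic)
print(r_linear, r_quadratic)
```

For the symmetric quadratic sample the Pearson correlation is exactly zero, while the distance correlation is positive, which is the kind of nonlinear dependence the abstract refers to. The 2-D example shows that the two arguments need not have the same dimension.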

9.
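Guan Fei (关菲), Zhou Yi (周艺), Zhang Han (张晗). 《运筹与管理》 (Operations Research and Management Science), 2022, 31(11): 9-14
Collaborative filtering is one of the most widely used algorithms in personalized recommender systems, but it has shortcomings in handling data sparsity and scalability. For the data sparsity problem, this paper first fills the missing values of the initial rating matrix with the Slope One algorithm, then predicts the target user's ratings with a collaborative filtering algorithm based on K-means clustering, and reports comparative experiments on the MovieLens data set. For the scalability problem, the paper first proposes an improved K-means algorithm based on a center aggregation parameter, then gives the workflow of a collaborative filtering recommendation algorithm based on this improved K-means, and designs comparative experiments on the MovieLens data set. The experimental results show that the proposed methods significantly improve recommendation accuracy and effectively alleviate the data sparsity and scalability problems. The conclusions therefore both enrich the existing theory of collaborative filtering recommendation algorithms and provide a theoretical basis and decision reference for improving the accuracy of recommender systems.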
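The missing-value filling step mentioned in the abstract above relies on Slope One. The sketch below shows the standard weighted Slope One predictor on a tiny hand-made rating table; it is an assumption that the paper uses this weighted variant, and the user names, item names, and ratings are invented for the example.

```python
def slope_one_predict(ratings, user, target_item):
    """Weighted Slope One prediction of `user`'s rating for `target_item`.

    `ratings` maps each user to a dict of item -> rating.
    Returns None when no co-rated item pair is available.
    """
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target_item:
            continue
        # Rating differences from every user who rated both items.
        diffs = [r[target_item] - r[item]
                 for r in ratings.values()
                 if target_item in r and item in r]
        if not diffs:
            continue
        average_deviation = sum(diffs) / len(diffs)
        # Each item's vote is weighted by the number of co-raters.
        numerator += (user_rating + average_deviation) * len(diffs)
        denominator += len(diffs)
    return numerator / denominator if denominator else None

# Hypothetical ratings: carol has not rated item "A".
ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}

predicted = slope_one_predict(ratings, "carol", "A")
print(predicted)
```

Filling a sparse rating matrix then amounts to calling the predictor for every missing (user, item) pair and keeping the predictions that are not `None`, after which a clustering-based collaborative filter can run on the denser matrix.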

10.
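Clustering methods based on data manifold structure and their applications   (Cited by: 1; self-citations: 0; citations by others: 1)
With the continuing development of the information society, we have entered an era of information explosion, and the sheer volume of data makes data processing tedious and complex. The key to the problem is therefore how to reduce the dimension of existing high-dimensional data, cluster them, and remove to some extent the noise they contain. Drawing on the relevant theory and following a reduce-then-cluster procedure, the paper divides high-dimensional data into two cases, subspace structure and manifold structure, and models and solves them with sparse subspace clustering, spectral multi-manifold clustering, and the K-manifolds method. A comparison of the methods leads to the conclusion that spectral multi-manifold clustering runs fast, clusters accurately, and is the most general of the models.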

11.
Cellular manufacturing is the cornerstone of many modern flexible manufacturing techniques, taking advantage of the similarities between parts in order to decrease the complexity of the design and manufacturing life cycle. The Part-Machine Grouping (PMG) problem is the key step in cellular manufacturing; it aims at grouping parts with similar processing requirements or similar design features into part families and at grouping machines into cells associated with these families. The PMG problem is NP-complete and the different proposed techniques for solving it are based on heuristics. In this paper, a new approach for solving the PMG problem is proposed which is based on biclustering. Biclustering is a methodology where rows and columns of an input data matrix are clustered simultaneously. A bicluster is defined as a submatrix spanned by both a subset of rows and a subset of columns. Although biclustering has been almost exclusively applied to DNA microarray analysis, we show that biclustering can be successfully applied to the PMG problem. We also present empirical results to demonstrate the efficiency and accuracy of the proposed technique with respect to related ones for various formulations of the problem.

12.
The analysis of large-scale data sets using clustering techniques arises in many different disciplines and has important applications. Most traditional clustering techniques require heuristic methods for finding good solutions and produce suboptimal clusters as a result. In this article, we present a rigorous biclustering approach, OREO, which is based on the Optimal RE-Ordering of the rows and columns of a data matrix. The physical permutations of the rows and columns are accomplished via a network flow model according to a given objective function. This optimal re-ordering model is used in an iterative framework where cluster boundaries in one dimension are used to partition and re-order the other dimensions of the corresponding submatrices. The performance of OREO is demonstrated on metabolite concentration data to validate the ability of the proposed method and compare it to existing clustering methods.

13.
Clustering and classification are important tasks for the analysis of microarray gene expression data. Classification of tissue samples can be a valuable diagnostic tool for diseases such as cancer. Clustering samples or experiments may lead to the discovery of subclasses of diseases. Clustering genes can help identify groups of genes that respond similarly to a set of experimental conditions. We also need validation tools for clustering and classification. Here, we focus on the identification of outliers: units that may have been misallocated, or mislabeled, or are not representative of the classes or clusters. We present two new methods, DDclust and DDclass, for clustering and classification. These non-parametric methods are based on the intuitively simple concept of data depth. We apply the methods to several gene expression and simulated data sets. We also discuss a convenient visualization and validation tool, the relative data depth plot.

14.
Clustering is one of the most widely used procedures in the analysis of microarray data, for example with the goal of discovering cancer subtypes based on observed heterogeneity of genetic marks between different tissues. It is well known that in such high-dimensional settings, the existence of many noise variables can overwhelm the few signals embedded in the high-dimensional space. We propose a novel Bayesian approach based on the Dirichlet process with a sparsity prior that simultaneously performs variable selection and clustering, and also discovers variables that only distinguish a subset of the cluster components. Unlike previous Bayesian formulations, we use the Dirichlet process (DP) both for clustering of samples and for regularizing the high-dimensional mean/variance structure. To solve the computational challenge brought by this double usage of DP, we propose to make use of a sequential sampling scheme embedded within Markov chain Monte Carlo (MCMC) updates to improve the naive implementation of existing algorithms for DP mixture models. Our method is demonstrated on a simulation study and illustrated with the leukemia gene expression dataset.

15.
The theory of Gaussian graphical models is a powerful tool for independence analysis between continuous variables. In this framework, various methods have been conceived to infer independence relations from data samples. However, most of them result in stepwise, deterministic, descent algorithms that are inadequate for solving this issue. More recent developments have focused on stochastic procedures, yet they all base their research on strong a priori knowledge and are unable to perform model selection among the set of all possible models. Moreover, convergence of the corresponding algorithms is slow, precluding applications on a large scale. In this paper, we propose a novel Bayesian strategy to deal with structure learning. Relating graphs to their supports, we convert the problem of model selection into that of parameter estimation. Use of non-informative priors and asymptotic results yield a posterior probability for independence graph supports in closed form. Gibbs sampling is then applied to approximate the full joint posterior density. We finally give three examples of structure learning, one from synthetic data, and the two others from real data.

16.
In this paper, we use the Fuzzy C-means (FCM) method for clustering 3-way gene expression data via optimization of multiple objectives. A reformulation of the total clustering criterion is used to obtain an expression which has fewer variables compared to the classical FCM criterion. This transformation allows the use of a direct global optimizer in contrast to the alternating search commonly used. Gene expression data from microarray technology is generally of high dimension. The problem of empty space is known for this kind of data. We propose in this paper a transformation allowing more contrast in distances between all pairs of data samples. This, hence, increases the likelihood of detecting group structure, if any, in high dimensional datasets.

17.
We consider the task of simultaneously clustering the rows and columns of a large transposable data matrix. We assume that the matrix elements are normally distributed with a bicluster-specific mean term and a common variance, and perform biclustering by maximizing the corresponding log-likelihood. We apply an ℓ1 penalty to the means of the biclusters to obtain sparse and interpretable biclusters. Our proposal amounts to a sparse, symmetrized version of k-means clustering. We show that k-means clustering of the rows and of the columns of a data matrix can be seen as special cases of our proposal, and that a relaxation of our proposal yields the singular value decomposition. In addition, we propose a framework for biclustering based on the matrix-variate normal distribution. The performances of our proposals are demonstrated in a simulation study and on a gene expression dataset. This article has supplementary material online.
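To see why an ℓ1 penalty on a bicluster mean produces sparsity, consider a single block in isolation. Minimizing (1/2) Σ (x − μ)² + λ|μ| over μ, with n entries in the block, gives the block average soft-thresholded at λ/n. The sketch below works through that one-block case only; it is a generic illustration, not the authors' full algorithm, and the function names, blocks, and λ value are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def penalized_block_mean(block, lam):
    """Minimizer of 0.5 * sum((x - mu)**2) + lam * abs(mu) over one block."""
    values = [x for row in block for x in row]
    n = len(values)
    return soft_threshold(sum(values) / n, lam / n)

# Hypothetical blocks: one with a clear signal, one that is mostly noise.
strong_block = [[2.0, 2.5], [1.5, 2.0]]
weak_block = [[0.2, -0.1], [0.1, 0.0]]

strong_mean = penalized_block_mean(strong_block, lam=2.0)  # shrunk, still non-zero
weak_mean = penalized_block_mean(weak_block, lam=2.0)      # set exactly to zero
print(strong_mean, weak_mean)
```

Blocks whose average is small relative to λ/n get a mean of exactly zero, which is what makes the remaining non-zero biclusters easier to interpret.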

18.
We formulate a discrete optimization problem that leads to a simple and informative derivation of a widely used class of spectral clustering algorithms. Regarding the algorithms as attempting to bi-partition a weighted graph with N vertices, our derivation indicates that they are inherently tuned to tolerate all partitions into two non-empty sets, independently of the cardinality of the two sets. This approach also helps to explain the difference in behaviour observed between methods based on the unnormalized and normalized graph Laplacian. We also give a direct explanation of why Laplacian eigenvectors beyond the Fiedler vector may contain fine-detail information of relevance to clustering. We show numerical results on synthetic data to support the analysis. Further, we provide examples where normalized and unnormalized spectral clustering is applied to microarray data; here the graph summarizes similarity of gene activity across different tissue samples, and accurate clustering of samples is a key task in bioinformatics.

19.
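Using gene expression data, this paper proposes a network model that is new to this setting, the Bayesian network, for discovering interactions between genes. A Bayesian network is a directed graphical model of a multivariate joint probability distribution that represents the conditional independence properties among the variables. We first explain how a Bayesian network represents interactions between genes, and then describe methods for learning Bayesian networks from gene chip (microarray) data.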
