Similar articles
20 similar articles found (search time: 15 ms)
1.
Diverse reduct subspaces based co-training for partially labeled data   Total citations: 1 (self-citations: 0, citations by others: 1)
Rough set theory is an effective supervised learning model for labeled data. However, practical problems often involve both labeled and unlabeled data, which is outside the realm of traditional rough set theory. In this paper, the problem of attribute reduction for partially labeled data is first studied. With a new definition of the discernibility matrix, a Markov blanket based heuristic algorithm is put forward to compute the optimal reduct of partially labeled data. A novel rough co-training model is then proposed, which capitalizes on the unlabeled data to improve the performance of a rough classifier learned from only a few labeled data. The model employs two diverse reducts of the partially labeled data to train its base classifiers on the labeled data, and then makes the base classifiers learn from each other on the unlabeled data iteratively. The classifiers constructed in different reduct subspaces benefit from their diversity on the unlabeled data and significantly improve the performance of the rough co-training model. Finally, the rough co-training model is theoretically analyzed, and the upper bound on its performance improvement is given. The experimental results show that the proposed model outperforms other representative models in terms of accuracy and even compares favorably with a rough classifier trained with all training data labeled.
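The co-training loop described in this abstract can be sketched generically. The following is a minimal illustration, not the authors' rough co-training model: a nearest-centroid learner stands in for the rough classifier, and two hand-picked column subsets (`view_a`, `view_b`) stand in for the two diverse reducts; all names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid base learner: one mean vector per class."""
    cls = np.unique(y)
    return cls, np.vstack([X[y == c].mean(axis=0) for c in cls])

def predict(model, X):
    cls, mu = model
    d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    return cls[d.argmin(1)], -d.min(1)            # labels, confidence

def co_train(X, y, view_a, view_b, rounds=5, k=1):
    """Co-training sketch. y holds class labels, -1 marks unlabeled
    points; view_a / view_b are column index lists standing in for
    the two diverse reducts."""
    y = y.copy()
    for _ in range(rounds):
        for view in (view_a, view_b):
            unl = np.where(y == -1)[0]
            if len(unl) == 0:
                return y
            lab = np.where(y != -1)[0]
            model = fit_centroids(X[np.ix_(lab, view)], y[lab])
            pred, conf = predict(model, X[np.ix_(unl, view)])
            # this view's classifier teaches its k most confident
            # pseudo-labels to the shared labeled pool
            for i in np.argsort(-conf)[:k]:
                y[unl[i]] = pred[i]
    return y
```

On toy data with two separable clusters, the two views iteratively label the whole pool from a single seed example per class.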

2.
Semi-supervised learning is an emerging computational paradigm for machine learning that aims to make better use of large amounts of inexpensive unlabeled data to improve the learning performance. While various methods have been proposed based on different intuitions, the crucial issue of generalization performance is still poorly understood. In this paper, we investigate the convergence property of the Laplacian regularized least squares regression, a semi-supervised learning algorithm based on manifold regularization. Moreover, the improvement of error bounds in terms of the number of labeled and unlabeled data is presented, to the best of our knowledge, for the first time. The convergence rate depends on the approximation property and the capacity of the reproducing kernel Hilbert space measured by covering numbers. Some new techniques are exploited for the analysis since an extra regularizer is introduced.
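For readers unfamiliar with the algorithm being analyzed, a small sketch of Laplacian regularized least squares may help. This is a generic manifold-regularization implementation under assumed conventions (the exact scaling of the Laplacian term varies across treatments), not the construction used in the paper's proofs:

```python
import numpy as np

def rbf(A, B, s=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d / (2 * s * s))

def laprls(X, y, n_labeled, gamma_A=1e-2, gamma_I=1e-2, s=1.0, knn=3):
    """Laplacian RLS sketch. X holds all points; only the first
    n_labeled rows have targets y."""
    n = X.shape[0]
    K = rbf(X, X, s)
    # kNN similarity graph -> unnormalized graph Laplacian L = D - W
    W = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    for i in range(n):
        for j in np.argsort(d2[i])[1:knn + 1]:
            W[i, j] = W[j, i] = np.exp(-d2[i, j] / (2 * s * s))
    L = np.diag(W.sum(1)) - W
    J = np.zeros((n, n)); J[:n_labeled, :n_labeled] = np.eye(n_labeled)
    Y = np.zeros(n); Y[:n_labeled] = y
    # expansion coefficients of f = sum_j alpha_j K(x_j, .)
    A = J @ K + gamma_A * n_labeled * np.eye(n) + gamma_I * n_labeled * L @ K
    alpha = np.linalg.solve(A, Y)
    return lambda Xt: rbf(Xt, X, s) @ alpha
```

The Laplacian term pulls the regression function toward smoothness along the data graph, which is how the unlabeled rows influence the fit.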

3.
Semi-supervised learning algorithms use both labeled and unlabeled samples. Extensive experiments have shown that exploiting unlabeled samples can improve the approximation performance of learning algorithms. However, quantitative analysis of the approximation performance as the number of samples increases is almost nonexistent. In this paper, we construct a semi-supervised learning algorithm based on a diffusion matrix and establish its approximation order. The results also show quantitatively that the use of unlabeled samples can reduce the approximation error.

4.
Semi-supervised learning has been of growing interest over the past few years, and many methods have been proposed. Although various algorithms are provided to implement semi-supervised learning, there are still gaps in our understanding of the dependence of generalization error on the numbers of labeled and unlabeled data. In this paper, we consider a graph-based semi-supervised classification algorithm and establish its generalization error bounds. Our results show the close relations between the generalization error and the numbers of labeled and unlabeled data.

5.
A classification method, which comprises the Fuzzy C-Means method, a modified form of the Huang-index function, and Variable Precision Rough Set (VPRS) theory, is proposed in this study for classifying labeled/unlabeled data sets. This proposed method, designated the MVPRS-index method, is used to partition the values of each conditional attribute within the data set and to achieve both the optimal number of clusters and the optimal accuracy of VPRS classification. The validity of the proposed approach is confirmed by comparing the classification results obtained from the MVPRS-index method on UCI data sets and a typical stock market data set with those obtained from a supervised neural network classification method. Overall, the results show that the MVPRS-index method can be applied to data sets not only with labeled information but also with unlabeled information, and therefore provides a more reliable basis for the extraction of decision-making rules from labeled/unlabeled data sets.
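The clustering component mentioned here is standard fuzzy C-means. A minimal sketch, without the Huang-index modification or the VPRS step, and with an assumed deterministic initialization that spreads the starting centers across the data:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100):
    """Plain fuzzy C-means; returns memberships U (rows sum to 1)
    and cluster centers."""
    # deterministic init: spread initial centers across the data order
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)].astype(float)
    for _ in range(iters):
        # squared distances, floored to avoid division by zero
        d = np.maximum(((X[:, None, :] - centers[None]) ** 2).sum(-1), 1e-12)
        w = 1.0 / d ** (1.0 / (m - 1.0))
        U = w / w.sum(1, keepdims=True)        # fuzzy memberships
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(0)[:, None]
    return U, centers
```

The fuzzifier m controls how soft the memberships are; m → 1 recovers hard k-means-like assignments.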

6.
When there are multiple trained predictors, one may want to integrate them into one predictor. However, this is challenging if the performances of the trained predictors are unknown and labeled data for evaluating their performances are not given. In this paper, a method is described that uses unlabeled data to estimate the weight parameters needed to build an ensemble predictor integrating multiple trained component predictors. It is readily derived from a mathematical model of ensemble learning based on a generalized mixture of probability density functions and corresponding information divergence measures. Numerical experiments demonstrated that the performance of our method is much better than that of simple average-based ensemble learning, even when the assumption placed on the performances of the component predictors does not hold exactly.

7.
It has been reported that using unlabeled data together with labeled data to construct a discriminant function works successfully in practice. However, theoretical studies have implied that unlabeled data can sometimes adversely affect the performance of discriminant functions. Therefore, it is important to know what situations call for the use of unlabeled data. In this paper, asymptotic relative efficiency is presented as the measure for comparing analyses with and without unlabeled data under the heteroscedastic normality assumption. The linear discriminant function maximizing the area under the receiver operating characteristic curve is considered. Asymptotic relative efficiency is evaluated to investigate when and how unlabeled data contribute to improving discriminant performance under several conditions. The results show that asymptotic relative efficiency depends mainly on the heteroscedasticity of the covariance matrices and the stochastic structure of observing the labels of the cases.

8.
Methods are developed for finding the number of unlabeled bridgeless or 2-line-connected graphs of any order. These methods are based on cycle index sums, but it is shown how to avoid explicit computation with cycle index sums by using suitable inversion techniques. Similar results are obtained for unlabeled bridgeless graphs by numbers of points and lines, and connected graphs by numbers of points and bridges. Corresponding results for labeled graphs are found as corollaries. When lines or bridges are required as enumeration parameters in the labeled case, it is also shown how to obtain improved recurrence relations. The latter appear to have no analog for unlabeled graphs.

9.
The elastic net (supervised enet henceforth) is a popular and computationally efficient approach for performing the simultaneous tasks of selecting variables, decorrelation, and shrinking the coefficient vector in the linear regression setting. Semisupervised regression, currently unrelated to the supervised enet, uses data with missing response values (unlabeled) along with labeled data to train the estimator. In this article, we propose the joint trained elastic net (jt-enet), which elegantly incorporates the benefits of semisupervised regression with the supervised enet. The supervised enet and other approaches like it rely on shrinking the linear estimator in a way that simultaneously performs variable selection and decorrelates the data. Both the variable selection and decorrelation components of the supervised enet inherently rely on the pairwise correlation structure in the feature data. In circumstances in which the number of variables is high, the feature data are relatively easy to obtain, and the response is expensive to generate, it seems reasonable that one would want to be able to use any existing unlabeled observations to more accurately define these correlations. However, the supervised enet is not able to incorporate this information and focuses only on the information within the labeled data. In this article, we propose the jt-enet, which allows the unlabeled data to influence the variable selection, decorrelation, and shrinkage capabilities of the linear estimator. In addition, we investigate the impact of unlabeled data on the risk and bias of the proposed estimator. The jt-enet is demonstrated on two applications with encouraging results. Online supplementary material is available for this article.
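The jt-enet itself is not reproduced here, but the core idea, letting unlabeled rows sharpen the correlation (Gram) structure used inside an elastic-net coordinate descent, can be sketched as follows. The update rule is the standard elastic-net coordinate step; `semi_enet` is a hypothetical name, not the authors' estimator:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator for the L1 penalty."""
    return np.sign(z) * np.maximum(abs(z) - t, 0.0)

def semi_enet(X_l, y, X_u, lam=0.1, alpha=0.5, iters=200):
    """Elastic net by coordinate descent, with the Gram matrix
    estimated from labeled AND unlabeled rows."""
    X_all = np.vstack([X_l, X_u])
    G = X_all.T @ X_all / len(X_all)   # unlabeled rows sharpen this
    c = X_l.T @ y / len(X_l)           # response info: labeled rows only
    b = np.zeros(X_l.shape[1])
    for _ in range(iters):
        for j in range(len(b)):
            # partial residual correlation for coordinate j
            r = c[j] - G[j] @ b + G[j, j] * b[j]
            b[j] = soft(r, lam * alpha) / (G[j, j] + lam * (1 - alpha))
    return b
```

Only the Gram matrix sees the unlabeled rows; the cross-term X'y necessarily uses labeled data alone.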

10.
In this paper we propose and analyse a choice of parameters in the multi-parameter regularization of Tikhonov type. A modified discrepancy principle is presented within the multi-parameter regularization framework. An order optimal error bound is obtained under the standard smoothness assumptions. We also propose a numerical realization of the multi-parameter discrepancy principle based on the model function approximation. Numerical experiments on a series of test problems support theoretical results. Finally we show how the proposed approach can be successfully implemented in Laplacian Regularized Least Squares for learning from labeled and unlabeled examples.
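The discrepancy principle in its basic single-parameter form (not the paper's multi-parameter, model-function variant) can be illustrated by searching for the regularization parameter whose residual matches the noise level:

```python
import numpy as np

def tikhonov(A, y, lam):
    """Tikhonov-regularized least squares solution."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def discrepancy_lambda(A, y, delta, tau=1.1, lo=1e-12, hi=1e6, iters=60):
    """Bisect (geometrically, since lambda spans decades) for the
    lambda with ||A x_lam - y|| = tau * delta; the residual norm is
    monotone increasing in lambda."""
    f = lambda lam: np.linalg.norm(A @ tikhonov(A, y, lam) - y) - tau * delta
    if f(lo) > 0:          # even tiny lambda over-smooths
        return lo
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)
```

With a known noise norm delta, the chosen lambda makes the data misfit comparable to the noise, neither over- nor under-fitting.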

11.
Summary. We define directed rooted labeled and unlabeled trees and find measures on the space of directed rooted unlabeled trees which are invariant with respect to transition probabilities corresponding to a biased random walk on a directed rooted labeled tree. We use these to calculate the speed of a biased random walk on directed rooted labeled trees. The results are mainly applied to directed trees with recurrent subtrees, where the random walker cannot escape. Received: 12 March 1997 / In revised form: 11 December 1997

12.
The correct values for the number of all unlabeled lattices on n elements are known for . We present a fast orderly algorithm generating all unlabeled lattices up to a given size n. Using this algorithm, we have computed the number of all unlabeled lattices as well as that of all labeled lattices on an n-element set for each . Received April 4, 2000; accepted in final form November 2, 2001. Presented by R. Freese.

13.
This paper presents a method that creates instructionally sound learning experiences by means of learning objects. The method uses a mathematical model, distinguishes two kinds of Learning Objects Properties and proceeds in two major steps: first, the Course Creation is transformed into Set Covering under specific requirements derived from Learning Theories and practice; second, the Alternative Learning Sources are selected by using a similarity measure specially defined for this purpose.
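The set-covering formulation of course creation invites the classical greedy approximation. A sketch with hypothetical learning-objective and learning-object names (this is the textbook ln(n)-approximation, not the paper's similarity-based selection step):

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements; a ln(n)-approximation."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda s: len(subsets[s] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe not coverable")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# hypothetical learning objectives covered by candidate learning objects
objectives = {"vectors", "matrices", "eigenvalues", "svd"}
objects = {
    "LO1": {"vectors", "matrices"},
    "LO2": {"matrices", "eigenvalues"},
    "LO3": {"eigenvalues", "svd"},
    "LO4": {"svd"},
}
course = greedy_set_cover(objectives, objects)
```

Here each learning object is a set of objectives it teaches, and the chosen course is a small collection of objects that covers every objective.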

14.
A difference graph is a bipartite graph G = (X, Y; E) such that all the neighborhoods of the vertices of X are comparable by inclusion. We enumerate labeled and unlabeled difference graphs with or without a bipartition of the vertices into two stable sets. The labeled enumerations are expressed in terms of combinatorial numbers related to the Stirling numbers of the second kind.

15.
For labeled trees, Rényi showed that the probability that an arbitrary point of a random tree has degree k approaches 1/(e(k−1)!). For unlabeled trees, the answer is different because the number of ways to label a given tree depends on the order of its automorphism group. Using arguments involving combinatorial enumeration and asymptotics, we evaluate the corresponding probabilities for large unlabeled trees.
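Rényi's limit for labeled trees follows from the Prüfer correspondence; a one-line derivation (standard, not taken from this paper):

```latex
% Via the Prüfer correspondence, the degree of a fixed vertex v in a
% uniform labeled tree on n vertices is 1 plus the number of times v
% appears in a uniform sequence of length n-2 over n symbols:
\Pr[\deg(v) = k]
  = \binom{n-2}{k-1}\Big(\frac{1}{n}\Big)^{k-1}\Big(1-\frac{1}{n}\Big)^{n-k-1}
  \;\xrightarrow[n\to\infty]{}\; \frac{e^{-1}}{(k-1)!}
```

The binomial count converges to a Poisson(1) variable, giving the stated limit.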

16.
Foundations of Computational Mathematics - Learning mappings of data on manifolds is an important topic in contemporary machine learning, with applications in astrophysics, geophysics, statistical...

17.
In this paper we answer the comments provided by Fabio Cozman, Marco Zaffalon, Giorgio Corani, and Didier Dubois on our paper ‘Imprecise Probability Models for Learning Multinomial Distributions from Data. Applications to Learning Credal Networks’. The main topics we have considered are: regularity, the learning principle, the trade-off between prior imprecision and learning, strong symmetry, and the properties of ISSDM for learning graphical conditional independence models.

18.
It is shown that unlabeled planar graphs can be encoded using 12n bits, and an asymptotically optimal representation is given for labeled planar graphs.

19.
We develop a supervised dimension reduction method that integrates the idea of localization from manifold learning with the sliced inverse regression framework. We call our method localized sliced inverse regression (LSIR) since it takes into account the local structure of the explanatory variables. The resulting projection from LSIR is a linear subspace of the explanatory variables that captures the nonlinear structure relevant to predicting the response. LSIR applies to both classification and regression problems and can be easily extended to incorporate the ancillary unlabeled data in semi-supervised learning. We illustrate the utility of LSIR on real and simulated data. Computer codes and datasets from simulations are available online.
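Plain (global) sliced inverse regression, the starting point that LSIR localizes, can be sketched as follows; this omits the nearest-neighbour localization step that defines LSIR itself:

```python
import numpy as np

def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Global sliced inverse regression: slice on y, compute slice
    means of standardized X, and take top eigenvectors of their
    weighted covariance. LSIR would replace each slice mean with
    localized means over nearest neighbours."""
    n, p = X.shape
    Z = (X - X.mean(0)) / X.std(0)             # standardize features
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)   # slice on sorted response
    M = np.zeros((p, p))
    for s in slices:
        m = Z[s].mean(0)
        M += (len(s) / n) * np.outer(m, m)     # weighted slice-mean cov
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :n_dirs]           # top eigenvectors
```

When the response depends on a single linear combination of the predictors, the leading eigenvector recovers that direction (up to sign).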

20.
An ordered set-partition (or preferential arrangement) of n labeled elements represents a single “hierarchy”; these are enumerated by the ordered Bell numbers. In this note we determine the number of “hierarchical orderings” or “societies”, where the n elements are first partitioned into m ≤ n subsets and a hierarchy is specified for each subset. We also consider the unlabeled case, where the ordered Bell numbers are replaced by the composition numbers. If there is only a single hierarchy, we show that the average rank of an element is asymptotic to n/(4 log 2) in the labeled case and to n/4 in the unlabeled case. This revised version was published online in September 2006 with corrections to the Cover Date.
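The ordered Bell (Fubini) numbers mentioned here satisfy the recurrence a(n) = Σ_{k=1..n} C(n,k)·a(n−k), obtained by choosing the top block of the ordered partition; a short implementation:

```python
from math import comb

def ordered_bell(n):
    """Ordered Bell (Fubini) numbers: the number of ordered set
    partitions ("hierarchies") of n labeled elements."""
    a = [1] * (n + 1)                      # a(0) = 1, empty arrangement
    for m in range(1, n + 1):
        # pick the k elements of the first block, then arrange the rest
        a[m] = sum(comb(m, k) * a[m - k] for k in range(1, m + 1))
    return a[n]
```

The first few values are 1, 1, 3, 13, 75, 541.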


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号