Similar Documents
20 similar documents found (search time: 15 ms)
1.
With high-dimensional data, the number of covariates is considerably larger than the sample size. We propose a sound method for analyzing such data that performs clustering and variable selection simultaneously. The method is inspired by the plaid model and may be seen as a multiplicative mixture model that allows for overlapping clustering: unlike in conventional clustering, an observation may be explained by several clusters. This characteristic makes it especially suitable for gene expression data. Parameter estimation is performed with the Monte Carlo expectation-maximization algorithm and importance sampling. Using extensive simulations and comparisons with competing methods, we show the advantages of our methodology in terms of both variable selection and clustering. An application of our approach to gene expression data for kidney renal cell carcinoma taken from The Cancer Genome Atlas validates some previously identified cancer biomarkers.

2.
The main challenge in working with gene expression microarrays is that the sample size is small compared to the large number of variables (genes). In many studies, the main focus is on finding a small subset of genes that are the most important for differentiating between types of cancer, enabling simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in the probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector in this paper is able to distinguish models of the same size. The performance of the proposed method is demonstrated on a simulated data set and two well-known real data sets, and the results show that our method is comparable with existing methods in variable selection and classification.
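A minimal sketch of the central device in the abstract above: a binary vector indexes which covariates enter the probit model, and candidate subsets are compared by the fit they achieve. The ridge-style one-shot fit here is a crude stand-in for the paper's Bayesian prior and sampler; variable names and data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(X, y, beta):
    """Log-likelihood of a probit model: P(y=1 | x) = Phi(x' beta)."""
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def score_subset(X, y, gamma, ridge=1.0):
    """Score the submodel picked out by binary vector gamma, using a
    ridge-regularized one-shot fit (a crude stand-in for the prior)."""
    Xg = X[:, gamma.astype(bool)]
    beta = np.linalg.solve(Xg.T @ Xg + ridge * np.eye(Xg.shape[1]),
                           Xg.T @ (2 * y - 1))
    return probit_loglik(Xg, y, beta)

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = (X[:, 0] - X[:, 1] + 0.3 * rng.standard_normal(n) > 0).astype(float)

good = np.zeros(p); good[[0, 1]] = 1      # the true support
bad = np.zeros(p); bad[[5, 6]] = 1        # irrelevant covariates
print(score_subset(X, y, good) > score_subset(X, y, bad))
```

The subset containing the truly informative genes scores a higher likelihood than an irrelevant subset of the same size, which is the comparison the binary-vector prior formalizes.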

3.
We present a Ritz-Galerkin discretization on sparse grids using prewavelets, which allows us to solve elliptic differential equations with variable coefficients in dimensions d ≥ 2. The method applies multilinear finite elements. We introduce an efficient algorithm for matrix-vector multiplication using the Ritz-Galerkin discretization and semi-orthogonality. This algorithm is based on standard one-dimensional restrictions and prolongations, a simple prewavelet stencil, and the classical operator-dependent stencil for multilinear finite elements. Numerical results are presented for a three-dimensional problem on a curvilinear bounded domain and for a six-dimensional problem with variable coefficients. The simulations show convergence of the discretization in accordance with the approximation properties of the finite element space. The condition number of the stiffness matrix can be kept below 10 using a standard diagonal preconditioner.

4.
5.
Variable selection has consistently been a hot topic in linear regression models, especially when facing high-dimensional data. Variable ranking, an advanced form of selection, is actually more fundamental, since selection can be realized by thresholding once the variables are suitably ranked. In recent years, ensemble learning has gained significant interest in the context of variable selection due to its great potential to improve selection accuracy and to reduce the risk of falsely including unimportant variables. Motivated by the widespread success of boosting algorithms, a novel ensemble method, PBoostGA, is developed in this paper to implement variable ranking and selection in linear regression models. In PBoostGA, a weight distribution is maintained over the training set and a genetic algorithm is adopted as the base learner. Initially, equal weight is assigned to each instance. Following a weight-updating and ensemble-member-generating mechanism similar to AdaBoost.RT, a series of slightly different importance measures is sequentially produced for each variable. Finally, the candidate variables are ordered according to their average importance measure, and significant variables are selected by a thresholding rule. Both simulation results and a real data illustration show the effectiveness of PBoostGA in comparison with existing counterparts. In particular, PBoostGA has a stronger ability to exclude redundant variables.
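The final aggregation step described above can be sketched directly: average the per-member importance measures, rank the variables, then threshold. The `importances` matrix stands in for the scores produced by the AdaBoost.RT-style loop; the particular thresholding rule shown (a fraction of the top average score) is an assumption, not necessarily the paper's rule.

```python
import numpy as np

def rank_and_select(importances, threshold=0.5):
    """importances: (n_members, n_vars) array of per-member scores.
    Returns variables ordered by average importance, plus those whose
    average exceeds `threshold` times the top average score."""
    avg = importances.mean(axis=0)
    order = np.argsort(avg)[::-1]              # best first
    selected = [j for j in order if avg[j] >= threshold * avg[order[0]]]
    return order, selected

# three ensemble members scoring four candidate variables
imp = np.array([[0.90, 0.10, 0.70, 0.05],
                [0.80, 0.20, 0.60, 0.10],
                [0.95, 0.15, 0.65, 0.00]])
order, sel = rank_and_select(imp)
print(order.tolist(), sel)   # variables 0 and 2 dominate and are kept
```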

6.
Advances in Data Analysis and Classification - In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise...

7.
We consider the following sparse representation problem: represent a given matrix X ∈ ℝ^(m×N) as a product X = AS of two matrices A ∈ ℝ^(m×n) (m ≤ n < N) and S ∈ ℝ^(n×N), under the requirements that all m×m submatrices of A are nonsingular and that S is sparse in the sense that each column of S has at least n−m+1 zero elements. It is known that, under some mild additional assumptions, such a representation is unique up to scaling and permutation of the rows of S. We show that finding A (the most difficult part of such a representation) can be reduced to a hyperplane clustering problem. We present a bilinear algorithm for this clustering, which is robust to outliers. A computer simulation example shows the robustness of our algorithm.
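A toy illustration of the reduction described above, in the simplest case m = 2, n = 3: each column of S then has at least n−m+1 = 2 zeros, so every data column x = As is a scalar multiple of a single column of A, and clustering the directions of the columns of X recovers A up to scaling and permutation. This is a sketch of the geometric idea only, not the paper's bilinear algorithm; the matrices are made-up examples.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])        # every 2x2 submatrix is nonsingular
n_cols = 12
S = np.zeros((3, n_cols))
for j in range(n_cols):
    S[j % 3, j] = rng.uniform(0.5, 2.0)   # exactly one nonzero per column

X = A @ S

def direction(v):
    """Normalize v and fix its sign so ±v map to the same representative."""
    v = v / np.linalg.norm(v)
    return v if v[np.argmax(np.abs(v))] > 0 else -v

dirs = {tuple(np.round(direction(X[:, j]), 6)) for j in range(n_cols)}
true_dirs = {tuple(np.round(direction(A[:, k]), 6)) for k in range(3)}
print(dirs == true_dirs)
```

In higher dimensions each column of X lies on one of the hyperplanes spanned by m−1 columns of A, which is what makes the hyperplane clustering formulation natural.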

8.
In this paper we consider the special case where a signal x ∈ ℂ^N is known to vanish outside a support interval of length m < N. If the support length m of x, or a good bound on it, is known a priori, we derive a deterministic sublinear algorithm to compute x from its discrete Fourier transform x̂ ∈ ℂ^N. In the case of exact Fourier measurements we require only O(m log m) arithmetical operations. For noisy measurements, we propose a stable O(m log N) algorithm.

9.
10.
Block clustering aims to reveal homogeneous block structures in a data table. Among the different approaches to block clustering, we consider here a model-based method: the Gaussian latent block model for continuous data, which is an extension of the Gaussian mixture model for one-way clustering. For a given data table, several candidate models are usually examined, differing for example in the number of clusters, so model selection becomes a critical issue. To this end, we develop a criterion based on an approximation of the integrated classification likelihood for the Gaussian latent block model, and propose a Bayesian information criterion-like variant following the same pattern. We also propose a non-asymptotic exact criterion, thus circumventing the controversial definition of the asymptotic regime arising from the dual nature of the rows and columns in co-clustering. The experimental results show steady performance of these criteria for medium to large data tables.
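The BIC-like variant mentioned above follows the familiar penalized-likelihood pattern, shown here only generically (the paper's exact criterion, which must account for the dual row/column structure, differs in its penalty):

```latex
\operatorname{crit}(K) \;=\; \log L\bigl(\hat{\theta}_K\bigr) \;-\; \frac{\nu_K}{2}\,\log n ,
```

where $\nu_K$ counts the free parameters of candidate model $K$ and $n$ is the sample size. In co-clustering it is unclear whether $n$ should be the number of rows, of columns, or of cells, which is precisely the ambiguity the non-asymptotic exact criterion avoids.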

11.
The existence of an optimal control is proved in a problem where the criterion functional and the equation of motion contain a control that is a Lebesgue-Stieltjes measure. Due to nonlinearities in the problem, it is necessary to postulate a condition implying that large atoms of the control measures are sparse and that their derivatives, wherever they exist, have a uniform bound. The author is grateful for comments received from referees, which made it possible to remove inaccuracies, improve the exposition, and enlarge the list of references. Stylistic improvements have also been suggested by Jon Strand and Knut Sydsaeter.

12.
In model-based cluster analysis, the expectation-maximization (EM) algorithm has a number of desirable properties, but in some situations it can be slow to converge. Several variants have been proposed to speed up EM by reducing the time spent in the E-step, in the case of Gaussian mixtures. The main aims of such methods are, first, to speed up the convergence of EM and, second, to yield the same results (or nearly so) as EM itself. In this paper, we compare these methods on categorical data, with the latent class model, and we propose a new variant that achieves better results on synthetic and real data sets, in terms of convergence speed-up and the number of misclassified objects.
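As a reference point for the E-step variants discussed above, here is a minimal EM for a latent class model on binary data (a mixture of products of Bernoullis). The class count, initialization, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def em_latent_class(X, K, n_iter=100, seed=0):
    """X: (n, d) binary matrix. Returns mixing weights pi (K,)
    and Bernoulli parameters theta (K, d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.3, 0.7, size=(K, d))   # random init breaks symmetry
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] ∝ pi_k * prod_j p(x_ij | theta_kj)
        log_r = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_r -= log_r.max(axis=1, keepdims=True)     # stabilize exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta

# two well-separated latent classes
rng = np.random.default_rng(2)
X = np.vstack([rng.random((50, 6)) < 0.9,     # class A: mostly ones
               rng.random((50, 6)) < 0.1])    # class B: mostly zeros
pi, theta = em_latent_class(X.astype(float), K=2)
print(np.round(np.sort(pi), 2))
```

The E-step here costs O(nKd) per iteration; the variants the paper compares aim to cut exactly this cost without changing the fixed points of the iteration.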

13.
Transfer algorithms are usually used to optimize an objective function that is defined on the set of partitions of a finite set X. In this paper we define an equivalence relation ∼ on the set of fuzzy equivalence relations on X and establish a bijection from the set of hierarchies on X to the set of equivalence classes with respect to ∼. Thus, hierarchies can be identified with fuzzy equivalence relations, and the transfer algorithm can be modified in order to optimize an objective function that is defined on the set of hierarchies on X.

14.
An efficient algorithm is derived for solving the quantile regression problem combined with a group-sparsity-promoting penalty. Group sparsity of the regression parameters is achieved by using an ℓ_{1,∞}-norm penalty (or constraint) on the regression parameters. The algorithm is efficient in the sense that it obtains the regression parameters for a wide range of penalty parameters, thus enabling easy application of a model selection criterion afterwards. A Matlab implementation of the proposed algorithm is provided and some applications of the method are studied.
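The objective being minimized combines the quantile (pinball) check loss with the ℓ_{1,∞} mixed norm over groups; a direct evaluation of that objective can be sketched as below. This only spells out the problem the paper's algorithm solves efficiently; the example data and grouping are assumptions.

```python
import numpy as np

def pinball(residual, tau):
    """Quantile-regression check loss at level tau."""
    return np.where(residual >= 0, tau * residual, (tau - 1) * residual)

def objective(beta, X, y, groups, tau=0.5, lam=0.1):
    """groups: list of index arrays. The l_{1,inf} penalty sums, over
    groups, the largest absolute coefficient within each group."""
    loss = pinball(y - X @ beta, tau).sum()
    penalty = sum(np.abs(beta[g]).max() for g in groups)
    return loss + lam * penalty

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])
groups = [np.array([0, 1])]
print(objective(beta, X, y, groups))   # perfect fit: only the penalty remains
```

Because the ℓ_{1,∞} penalty charges a group only for its largest coefficient, it drives whole groups to zero at once, which is what "group sparsity" means here.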

15.
This paper presents a conjugate gradient method for solving systems of linear inequalities. The method is of dual optimization type and consists of two phases which can be implemented in a common framework. Phase 1 either finds the minimum-norm solution of the system or detects the inconsistency of the system. In the latter event, the method proceeds to Phase 2, in which an approximate least-squares solution to the system is obtained. The method is particularly suitable for large-scale problems because it preserves the sparsity structure of the problem. Its efficiency is shown by computational comparisons with an SOR-type method.
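A sketch of why conjugate gradient methods preserve sparsity: they touch the matrix only through matrix-vector products. The generic least-squares variant below (CG on the normal equations, often called CGNR) illustrates this; it is not the paper's two-phase method for inequalities, and the example system is an assumption.

```python
import numpy as np

def cgnr(A, b, n_iter=50, tol=1e-20):
    """Minimize ||A x - b||_2 by running CG on A^T A x = A^T b,
    using only products with A and A^T (so sparsity is preserved)."""
    x = np.zeros(A.shape[1])
    r = b - A @ x
    p = s = A.T @ r
    gamma = s @ s
    for _ in range(n_iter):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        if gamma_new < tol:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.1])          # slightly inconsistent system
x = cgnr(A, b)
print(np.allclose(A.T @ (A @ x - b), 0, atol=1e-8))   # normal equations hold
```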

16.
If A is the (sparse) coefficient matrix of linear equality constraints, for what nonsingular T is Â = TA as sparse as possible, and how can it be computed efficiently? An efficient algorithm for this Sparsity Problem (SP) would be a valuable pre-processor for linearly constrained optimization problems. In this paper we develop a two-pass approach to solve SP. Pass 1 builds a combinatorial structure on the rows of A which hierarchically decomposes them into blocks; this determines the structure of the optimal transformation matrix T. In Pass 2, we use the information about T as a road map to perform block-wise partial Gauss-Jordan elimination on A. Two block-aggregation strategies are also suggested that could further reduce the time spent in Pass 2. Computational results indicate that this approach to increasing sparsity produces significant net reductions in simplex solution time.

17.
This paper considers an algorithm for finding a perfect matching, if one exists, in a bipartite graph G. It is shown that the search for a perfect matching in G may be carried out separately in the strongly connected components of appropriate directed graphs. The algorithm may be particularly useful for block triangularization of very large, sparse, nonsingular matrices.
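For reference, the classic augmenting-path search for a perfect matching in a bipartite graph is sketched below; the paper's contribution — decomposing this search over strongly connected components — is not shown. The example graph is a made-up three-by-three instance.

```python
def perfect_matching(adj, n_left, n_right):
    """adj[u]: list of right-vertices adjacent to left-vertex u.
    Returns a perfect matching as {left: right}, or None if none exists."""
    if n_left != n_right:
        return None
    match_r = [-1] * n_right            # right vertex -> matched left vertex

    def try_augment(u, seen):
        """Search for an augmenting path from unmatched left vertex u."""
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_r[v] == -1 or try_augment(match_r[v], seen):
                    match_r[v] = u
                    return True
        return False

    for u in range(n_left):
        if not try_augment(u, set()):
            return None                 # some left vertex cannot be matched
    return {match_r[v]: v for v in range(n_right)}

adj = {0: [0, 1], 1: [0], 2: [2]}
print(perfect_matching(adj, 3, 3))      # {1: 0, 0: 1, 2: 2}
```

In the matrix view, a perfect matching of rows to columns picks a zero-free diagonal, the starting point for block triangularization.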

18.
For the sparse signal reconstruction problem in compressive sensing, we propose a projection-type algorithm without any backtracking line search, based on a new formulation of the problem. Under suitable conditions, global convergence of the designed algorithm and its linear rate of convergence are established. The efficiency of the algorithm is illustrated through numerical experiments on sparse signal reconstruction problems.
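A representative projection-type iteration from the same family — not the authors' algorithm — is iterative hard thresholding, which alternates a fixed-step gradient step with a projection onto the set of s-sparse vectors. The measurement matrix and signal below are illustrative assumptions (rows of A are made orthonormal so the unit step is safe).

```python
import numpy as np

def iht(A, y, s, n_iter=300, step=1.0):
    """Iterative hard thresholding for y ≈ A x with x s-sparse.
    Assumes ||A||_2 <= 1 so the unit step size is safe."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * (A.T @ (y - A @ x))      # gradient step
        small = np.argsort(np.abs(x))[:-s]      # projection: keep s largest
        x[small] = 0.0
    return x

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((30, 15)))
A = Q.T                                  # 15 x 30 with orthonormal rows
x_true = np.zeros(30)
x_true[[4, 20]] = [1.5, -2.0]            # 2-sparse ground truth
y = A @ x_true                           # 15 measurements of a length-30 signal
x_hat = iht(A, y, s=2)
print(np.linalg.norm(x_hat - x_true) < 1e-3)
```

Avoiding any line search, as here, is exactly the property the abstract highlights; the paper's reformulation yields a different projection step with provable linear convergence.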

19.
One of the most effective numerical techniques for solving nonlinear programming problems is the sequential quadratic programming approach. Many large nonlinear programming problems arise naturally in data fitting and when discretization techniques are applied to systems described by ordinary or partial differential equations. Problems of this type are characterized by matrices which are large and sparse. This paper describes a nonlinear programming algorithm which exploits the matrix sparsity produced by these applications. Numerical experience is reported for a collection of trajectory optimization problems with nonlinear equality and inequality constraints. The authors wish to acknowledge the insightful contributions of Dr. William Huffman.

20.
Fuzzy Optimization and Decision Making - The Evidential C-Means algorithm provides a global treatment of ambiguity and uncertainty in memberships when partitioning attribute data, but still...
