Similar Literature
20 similar articles found (search time: 31 ms)
1.
We discuss the problem of estimating the number of principal components in principal components analysis (PCA). Despite the importance of the problem and the multitude of solutions proposed in the literature, it is surprising that no coherent asymptotic framework exists to justify different approaches depending on the actual size of the dataset. In this article, we address this issue by presenting an approximate Bayesian approach based on the Laplace approximation and by introducing a general method for developing model selection criteria, called PEnalized SEmi-integrated Likelihood (PESEL). Our general framework encompasses a variety of existing approaches based on probabilistic models, such as the Bayesian Information Criterion for Probabilistic PCA (PPCA), and enables the construction of new criteria depending on the size of the dataset at hand and on additional prior information. Specifically, we apply PESEL to derive two new criteria for datasets where the number of variables substantially exceeds the number of observations, a setting outside the scope of existing approaches. We also report results of extensive simulation studies and real data analysis, which illustrate the desirable properties of our proposed criteria compared with state-of-the-art methods and very recent proposals. In particular, these simulations show that PESEL-based criteria can be quite robust against deviations from the assumptions of a probabilistic model. Selected PESEL-based criteria for estimating the number of principal components are implemented in the R package pesel, which is available on GitHub (https://github.com/psobczyk/pesel). Supplementary material for this article, with additional simulation results, is available online. The code to reproduce all simulations is available at https://github.com/psobczyk/pesel_simulations.
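As a point of reference for the criteria discussed above, the sketch below implements the classical BIC for probabilistic PCA, one of the existing approaches the abstract says PESEL encompasses; it is not one of the new PESEL criteria. It assumes the input is the list of sample-covariance eigenvalues, uses the maximized PPCA log-likelihood in the Tipping–Bishop form, and counts pk − k(k−1)/2 + 1 free covariance parameters for k components (the mean is omitted because it is the same for every k). The function names are hypothetical and unrelated to the interface of the R package pesel.

```python
import math


def ppca_max_loglik(eigenvalues, n, k):
    """Maximized PPCA log-likelihood with k components (requires 0 <= k < p).

    eigenvalues: the p sample-covariance eigenvalues, in any order.
    n: number of observations.
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    sigma2 = sum(lam[k:]) / (p - k)  # ML noise variance: mean of the discarded eigenvalues
    log_det = sum(math.log(v) for v in lam[:k]) + (p - k) * math.log(sigma2)
    return -0.5 * n * (p * math.log(2 * math.pi) + log_det + p)


def ppca_num_params(p, k):
    """Free covariance parameters: p x k loadings up to rotation, plus sigma^2."""
    return p * k - k * (k - 1) // 2 + 1


def ppca_bic_choose_k(eigenvalues, n):
    """Return the k in {0, ..., p-1} maximizing loglik - (#params / 2) * log(n)."""
    p = len(eigenvalues)
    scores = {
        k: ppca_max_loglik(eigenvalues, n, k) - 0.5 * ppca_num_params(p, k) * math.log(n)
        for k in range(p)
    }
    return max(scores, key=scores.get)
```

For example, `ppca_bic_choose_k([10, 5, 1, 1, 1, 1], n=1000)` returns 2: two eigenvalues stand clearly above a flat noise floor, and adding a third component raises the penalty without improving the likelihood.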

2.
We develop a new estimator of the inverse covariance matrix for high-dimensional multivariate normal data using the horseshoe prior. The proposed graphical horseshoe estimator has attractive properties compared to other popular estimators, such as the graphical lasso and the graphical smoothly clipped absolute deviation. The most prominent benefit is that when the true inverse covariance matrix is sparse, the graphical horseshoe provides estimates with small information divergence from the sampling model. The posterior mean under the graphical horseshoe prior can also be almost unbiased under certain conditions. In addition to these theoretical results, we also provide a full Gibbs sampler for implementing our estimator. MATLAB code is available for download from GitHub at http://github.com/liyf1988/GHS. The graphical horseshoe estimator compares favorably to existing techniques in simulations and in a human gene network data analysis. Supplementary materials for this article are available online.

3.
A computationally simple approach to inference in state space models is proposed, using approximate Bayesian computation (ABC). ABC avoids evaluation of an intractable likelihood by matching summary statistics for the observed data with statistics computed from data simulated from the true process, based on parameter draws from the prior. Draws that produce a “match” between observed and simulated summaries are retained and used to estimate the inaccessible posterior. Since no reduction to a low-dimensional set of sufficient statistics is possible in the state space setting, we define the summaries as the maximizer of an auxiliary likelihood function, and thereby exploit the asymptotic sufficiency of this estimator for the auxiliary parameter vector. We derive conditions under which this approach, including a computationally efficient version based on the auxiliary score, achieves Bayesian consistency. To reduce the well-documented inaccuracy of ABC in multiparameter settings, we propose treating each parameter dimension separately using an integrated likelihood technique. Three stochastic volatility models for which exact Bayesian inference is either computationally challenging or infeasible are used for illustration. We demonstrate that our approach compares favorably against an extensive set of approximate and exact comparators. An empirical illustration completes the article. Supplementary materials for this article are available online.
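The accept/reject mechanism of ABC (draw parameters from the prior, simulate data, keep draws whose summaries "match" the observed ones) can be illustrated with a toy model. The sketch below is plain rejection ABC with a single hand-picked summary, the sample mean of i.i.d. Gaussian data with unknown mean; it does not implement the article's auxiliary-likelihood summaries, its score-based variant, or any state space model. The model, uniform prior, tolerance, and function names are illustrative assumptions.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy data-generating process (assumed): n i.i.d. draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    s_obs = statistics.fmean(observed)  # summary statistic of the observed data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from an assumed uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) < tolerance:  # the "match" between summaries
            accepted.append(theta)
    return accepted  # a sample from the approximate posterior


rng = random.Random(0)
observed = simulate(2.0, 50, rng)
posterior = abc_rejection(observed, n_draws=10_000, tolerance=0.2, rng=rng)
```

Shrinking the tolerance makes the retained draws closer to the exact posterior but lowers the acceptance rate, which is the accuracy/cost trade-off that motivates better summaries in higher-dimensional problems.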

4.
5.
Genetic variation forms the basis for diversity but can also be harmful and cause diseases, such as tumors. Structural variants (SVs) are an example of complex genetic variations that comprise many nucleotides, ranging up to several megabases. Recent developments in sequencing technology have made it feasible to elucidate the genetic state of a person’s genes (i.e., the exome) or even the complete genome. Here, a machine learning approach is presented for finding small disease-related SVs from sequencing data. The method uses differences in the characteristics of mapping patterns between tumor and normal samples at a genomic locus. In this way, the method aims to be directly applicable to exome sequencing data and to improve SV detection, for which dedicated methods are currently lacking. The method has been evaluated in a simulation study and on exome data from patients with acute myeloid leukemia. An implementation of the algorithm is available at https://github.com/lenz99-/svmod.

6.
I propose a framework for the linear prediction of a multiway array (i.e., a tensor) from another multiway array of arbitrary dimension, using the contracted tensor product. This framework generalizes several existing approaches, including methods to predict a scalar outcome from a tensor, a matrix from a matrix, or a tensor from a scalar. I describe an approach that exploits the multiway structure of both the predictors and the outcomes by restricting the coefficients to have reduced PARAFAC/CANDECOMP rank. I propose a general and efficient algorithm for penalized least-squares estimation, which allows for a ridge (L2) penalty on the coefficients. The objective is shown to give the mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for inference. I illustrate the approach with an application to facial image data. An R package is available at https://github.com/lockEF/MultiwayRegression.
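The contracted tensor product and the reduced-rank restriction can be made concrete in the simplest nontrivial case: a matrix-valued predictor mapped to a vector outcome through a 3-way coefficient tensor. This pure-Python sketch assumes the coefficient tensor is built as a sum of rank-1 PARAFAC/CANDECOMP terms; it does not perform the penalized least-squares fit or the Gibbs sampling described above, and the function names are hypothetical, not the API of the MultiwayRegression package.

```python
from itertools import product


def cp_rank1(u, v, w):
    """Rank-1 (PARAFAC/CANDECOMP) tensor B[a][b][c] = u[a] * v[b] * w[c]."""
    return [[[ua * vb * wc for wc in w] for vb in v] for ua in u]


def add_tensors(s, t):
    """Elementwise sum of two 3-way tensors of equal shape (used to build rank-R tensors)."""
    return [[[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)] for m1, m2 in zip(s, t)]


def contract(x, b):
    """Contracted tensor product over the predictor modes: Y[c] = sum_{i,j} X[i][j] * B[i][j][c].

    x: one p1 x p2 predictor matrix; b: a p1 x p2 x q coefficient tensor.
    """
    p1, p2, q = len(b), len(b[0]), len(b[0][0])
    return [sum(x[i][j] * b[i][j][c] for i, j in product(range(p1), range(p2))) for c in range(q)]


# A rank-2 coefficient tensor for 2 x 2 predictors and a scalar (q = 1) outcome.
B = add_tensors(cp_rank1([1, 2], [1, 0], [3]), cp_rank1([0, 1], [0, 1], [1]))
y = contract([[2, 0], [0, 5]], B)
```

Here `B` holds p1·p2·q = 4 entries but is described by two sets of short factor vectors; for large images this factorization is what keeps the number of free parameters small.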

7.
We compare alternative computing strategies for solving the constrained lasso problem. As its name suggests, the constrained lasso extends the widely used lasso to handle linear constraints, which allow the user to incorporate prior information into the model. In addition to quadratic programming, we employ the alternating direction method of multipliers (ADMM) and also derive an efficient solution path algorithm. Through both simulations and benchmark data examples, we compare the different algorithms and provide practical recommendations in terms of efficiency and accuracy for various sizes of data. We also show that, for an arbitrary penalty matrix, the generalized lasso can be transformed to a constrained lasso, while the converse is not true. Thus, our methods can also be used for estimating a generalized lasso, which has wide-ranging applications. Code for implementing the algorithms is freely available in both the Matlab toolbox SparseReg and the Julia package ConstrainedLasso. Supplementary materials for this article are available online.
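For reference, the constrained lasso is commonly written as follows, with the generalized lasso alongside. The notation is assumed here and may differ from the article's: response $y$, design matrix $X$, tuning parameter $\rho \ge 0$, linear equality constraints $A\beta = b$, linear inequality constraints $C\beta \le d$, and penalty matrix $D$.

```latex
\begin{aligned}
\text{constrained lasso:}\quad & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \rho\lVert\beta\rVert_1
  \quad\text{subject to}\quad A\beta = b,\ \ C\beta \le d,\\
\text{generalized lasso:}\quad & \min_{\beta}\ \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \rho\lVert D\beta\rVert_1 .
\end{aligned}
```

Dropping both constraint sets recovers the ordinary lasso. Both problems are convex quadratic programs once $\lVert\beta\rVert_1$ is split into positive and negative parts, which is why quadratic programming serves as a baseline solver.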

8.
We propose and analyze an asynchronously parallel optimization algorithm for finding multiple high-quality minima of nonlinear optimization problems. Our multistart algorithm considers all previously evaluated points when determining where to start or continue a local optimization run. Theoretical results show that when there are finitely many minima, the algorithm almost surely starts a finite number of local optimization runs and identifies every minimum. The algorithm is applicable to general optimization settings, but our numerical results focus on the case when derivatives are unavailable. In numerical tests, a Python implementation of the algorithm is shown to yield good approximations of many minima (including a global minimum), and this ability is shown to scale well with additional resources. Our implementation’s time to solution is also shown to scale well even when the time to perform the function evaluation is highly variable. An implementation of the algorithm is available in the libEnsemble library at https://github.com/Libensemble/libensemble.
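Using all previously evaluated points to decide where to start a local run can be illustrated in one dimension. The sketch below is a serial, simplified start rule in the spirit of multilevel single linkage (MLSL): a sampled point becomes a start only if no sampled point within a given radius has a lower value. It omits the asynchronous parallelism and the continuation of existing runs that the article analyzes, uses plain gradient descent as a stand-in local solver (the article's experiments focus on derivative-free settings), and is not libEnsemble's API; the radius, step size, and sample points are arbitrary assumptions.

```python
def local_descent(grad, x, step=0.05, iters=500):
    """Fixed-step gradient descent, standing in for a real local optimizer."""
    for _ in range(iters):
        x -= step * grad(x)
    return x


def choose_starts(points, values, radius):
    """Start at a sampled point only if no evaluated point within `radius` is better."""
    return [
        x for x, fx in zip(points, values)
        if not any(abs(x - y) <= radius and fy < fx for y, fy in zip(points, values))
    ]


def multistart(f, grad, points, radius, tol=1e-4):
    """Evaluate f at all samples, descend from the chosen starts, return distinct minimizers."""
    values = [f(x) for x in points]
    minima = []
    for x0 in choose_starts(points, values, radius):
        x_star = local_descent(grad, x0)
        if all(abs(x_star - m) > tol for m in minima):  # merge runs that found the same minimum
            minima.append(x_star)
    return sorted(minima)


def f(x):
    return (x * x - 1.0) ** 2  # two minima, at x = -1 and x = +1


def grad(x):
    return 4.0 * x * (x * x - 1.0)


samples = [-1.9, -1.3, -0.7, -0.1, 0.5, 1.1, 1.7]
print([round(m, 6) for m in multistart(f, grad, samples, radius=0.7)])  # prints [-1.0, 1.0]
```

Only two of the seven samples (-0.7 and 1.1) trigger local runs, one per basin, which is the behavior the start rule is designed to encourage: avoid redundant runs while still reaching every minimum.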

9.
We propose a novel Markov chain Monte Carlo (MCMC) method for reverse engineering the topological structure of stochastic reaction networks, a notoriously challenging problem that is relevant in many modern areas of research, such as discovering gene regulatory networks or analyzing epidemic spread. The method relies on projecting the original time-series trajectories of the stochastic data-generating process onto information-rich summary statistics and constructing the appropriate synthetic likelihood function to estimate reaction rates. The resulting estimates are consistent in the large-volume limit and are obtained without the complicated tuning strategies and expensive resampling typically used by likelihood-free MCMC and approximate Bayesian methods. To illustrate the method, we apply it to two real data examples: molecular pathway analysis with RNA-seq data and the well-known incidence data from the 1665 plague outbreak in Eyam, England.

10.
With new treatments and novel technology available, precision medicine has become a key topic in the new era of healthcare. Traditional statistical methods for precision medicine focus on subgroup discovery through identifying interactions between a few markers and treatment regimes. However, given the large scale and high dimensionality of modern datasets, it is difficult to detect the interactions between treatment and high-dimensional covariates. Recently, novel approaches have emerged that seek to directly estimate individualized treatment rules (ITR) by maximizing the expected clinical reward using, for example, support vector machines (SVM) or decision trees. Decision trees enjoy great popularity in clinical practice due to their interpretability. In this article, we propose a new reward function and a novel decision tree algorithm to directly maximize rewards. We further improve on a single-tree decision rule with an ensemble decision tree algorithm, ITR random forests. Our final decision rule averages over individual decision trees and yields a soft probability rather than a hard choice. Depending on how strong the treatment recommendation is, physicians can make decisions based on our model along with their own judgment and experience. The performance of the ITR forest and tree methods is assessed through simulations, along with applications to a randomized controlled trial (RCT) of 1385 patients with diabetes and an electronic medical record (EMR) cohort of 5177 patients with diabetes. The ITR forest and tree methods are implemented in the statistical software R (https://github.com/kdoub5ha/ITR.Forest). Supplementary materials for this article are available online.
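How averaging hard tree decisions produces a soft recommendation can be shown with a few stand-in trees. In this sketch the three "trees" are hypothetical hand-written rules on assumed covariates (HbA1c, age, BMI), not fitted ITR trees and not the ITR.Forest package; each returns 1 (recommend treatment) or 0, and the forest reports the fraction of trees recommending treatment.

```python
from statistics import fmean


# Hypothetical stand-ins for fitted trees; real ITR trees are learned from data.
def tree_a(x):
    return 1 if x["hba1c"] > 7.5 else 0


def tree_b(x):
    return 1 if x["age"] < 65 else 0


def tree_c(x):
    return 1 if x["hba1c"] > 8.0 and x["bmi"] > 30 else 0


def forest_recommendation(trees, x):
    """Average of the trees' hard recommendations: a soft probability of recommending treatment."""
    return fmean(tree(x) for tree in trees)


patient = {"hba1c": 7.9, "age": 70, "bmi": 32}
p_treat = forest_recommendation([tree_a, tree_b, tree_c], patient)  # 1/3 of the trees say "treat"
```

A value near 0 or 1 signals a strong recommendation, while a value near 0.5 flags a patient for whom the physician's own judgment should weigh more heavily.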

11.
The authors define strongly Gauduchon spaces and an associated class of complex spaces, which generalize strongly Gauduchon manifolds to the setting of complex spaces. By comparison with the Kähler case, strongly Gauduchon spaces and this class are analogous to Kähler spaces and the Fujiki class $\mathcal{C}$, respectively. Some properties of these complex spaces are obtained. Moreover, the relations between strongly Gauduchon spaces and this class are studied.

12.
In this work we show that if $A(x,D)$ is a linear differential operator of order $\nu$ with smooth complex coefficients in $\Omega\subseteq\mathbb{R}^N$ from a complex vector space $E$ to a complex vector space $F$, the Sobolev a priori estimate $\|u\|_{W^{\nu-1,N/(N-1)}}\le C\|A(x,D)u\|_{L^1}$ holds locally at any point $x_0\in\Omega$ if and only if $A(x,D)$ is elliptic and the constant-coefficient homogeneous operator $A_\nu(x_0,D)$ is canceling in the sense of Van Schaftingen for every $x_0\in\Omega$, which means that $\bigcap_{\xi\in\mathbb{R}^N\setminus\{0\}} a_\nu(x_0,\xi)[E]=\{0\}$. Here $A_\nu(x,D)$ is the homogeneous part of order $\nu$ of $A(x,D)$ and $a_\nu(x,\xi)$ is the principal symbol of $A(x,D)$. This result implies and unifies the proofs of several estimates for complexes and pseudo-complexes of operators of order one or higher recently proved by other methods, and it also extends, in the local setting, Van Schaftingen's characterization to operators with variable coefficients.

13.
Interactive decision making arose as a means to overcome the uncertainty concerning the decision maker's (DM's) value function. So far, the search has been confined to nondominated alternatives, which assumes that a win–lose strategy is adopted. The purpose of this paper is to suggest a complementary interactive algorithm that uses an interior point method to solve multiple objective linear programming problems. As the algorithm proceeds, the DM has access to intermediate solutions. The sequence of intermediate solutions has a very interesting characteristic: all of the criteria are improved, that is, each solution in the sequence has values of all objectives greater than those of the solution it follows. This WIN-WIN feature allows the DM to reach a nondominated solution without making any trade-off among the objective functions. Nevertheless, nothing prevents the DM from proceeding with traditional multiobjective methods.
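The WIN-WIN property of the intermediate solutions can be stated as a simple check on their objective vectors. This sketch assumes all objectives are maximized and that each intermediate solution is represented only by its tuple of objective values (all of equal length); it verifies the property for a given sequence rather than implementing the interior point algorithm.

```python
def strictly_improves(new, old):
    """True if every objective value in `new` exceeds the corresponding one in `old`."""
    return all(a > b for a, b in zip(new, old))


def is_win_win_sequence(objective_values):
    """WIN-WIN check: each solution improves all criteria relative to the one it follows."""
    return all(
        strictly_improves(curr, prev)
        for prev, curr in zip(objective_values, objective_values[1:])
    )
```

A sequence such as `[(1, 1), (2, 3), (4, 5)]` passes, whereas `[(1, 1), (2, 0.5)]` fails because the second objective gets worse: the kind of trade-off the interior path lets the DM avoid.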

14.
Bayesian networks are graphical tools used to represent high-dimensional probability distributions. They are widely used in machine learning and in many applications such as medical science. This paper studies whether the concept classes induced by a Bayesian network can be embedded into a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. For full Bayesian networks and almost full Bayesian networks with $n$ variables, we show that the VC dimension and the minimum dimension of the inner product space induced by them are both $2^n-1$. Also, for each Bayesian network, we show that if the network constructed from it by removing $X_n$ satisfies either (i) it is a full Bayesian network with $n-1$ variables, $i$ is the number of parents of $X_n$, and $i<n-1$, or (ii) it is an almost full Bayesian network, the set of all parents of $X_n$ is $PA_n=\{X_1,X_2,X_{n_3},\dots,X_{n_i}\}$, and $2i<n-1$. The results in this paper are useful in evaluating the VC dimension and the minimum dimension of the inner product space of concept classes induced by other Bayesian networks.

15.
In this article we introduce a new random mapping model, which maps the set {1,2,…,n} into itself. The random mapping is constructed using a collection of exchangeable random variables $\hat D_1,\dots,\hat D_n$ which satisfy $\sum_{i=1}^n \hat D_i = n$. In the random digraph representing the mapping, the in-degree sequence of the vertices is given by the variables $\hat D_1,\dots,\hat D_n$, and, in some sense, this digraph can be viewed as an analogue of the general independent degree models from random graph theory. We show that the distribution of the number of cyclic points, the number of components, and the size of a typical component can be expressed in terms of expectations of various functions of $\hat D_1,\dots,\hat D_n$. We also consider two special examples of the model, which correspond to random mappings with preferential and anti-preferential attachment, respectively, and determine, for these examples, exact and asymptotic distributions for the statistics mentioned above. © 2007 Wiley Periodicals, Inc. Random Struct. Alg., 2008
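A random mapping with a prescribed in-degree sequence, and the statistics studied above (cyclic points and components), can be simulated directly. This sketch assumes that, given the in-degree sequence, the mapping is uniform among all mappings with that sequence; shuffling the multiset of images achieves this because every such mapping arises from the same number of permutations. Vertices are labeled 0..n−1 here, and the function names are hypothetical. In-degrees must sum to n because every point has exactly one image.

```python
import random


def mapping_with_indegrees(degrees, rng):
    """Uniform random mapping f on {0..n-1} whose in-degree sequence is `degrees`."""
    n = len(degrees)
    if sum(degrees) != n:
        raise ValueError("in-degrees of a mapping on n points must sum to n")
    images = [j for j, d in enumerate(degrees) for _ in range(d)]  # vertex j appears d_j times
    rng.shuffle(images)
    return images  # images[i] is f(i)


def cyclic_points_and_components(f):
    """Count cyclic points and connected components of the functional digraph of f."""
    n = len(f)
    state = [0] * n  # 0 = unvisited, 1 = on the current path, 2 = finished
    cyclic, components = 0, 0
    for start in range(n):
        path, x = [], start
        while state[x] == 0:
            state[x] = 1
            path.append(x)
            x = f[x]
        if state[x] == 1:  # walked back into the current path: a new cycle
            components += 1  # each component of a mapping digraph has exactly one cycle
            cyclic += len(path) - path.index(x)
        for y in path:
            state[y] = 2
    return cyclic, components
```

Repeating `cyclic_points_and_components(mapping_with_indegrees(d, rng))` over many draws of `d` gives Monte Carlo estimates of the distributions the article expresses analytically.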

16.
It is known that applying an $\mathcal{H}$-homothetic deformation to a complex contact manifold whose vertical space is annihilated by the curvature yields a condition which is invariant under $\mathcal{H}$-homothetic deformations. A complex contact manifold satisfying this condition is said to be a complex $(\kappa,\mu)$-space. In this paper, we address the questions of Bochner, conformal and conharmonic flatness of complex $(\kappa,\mu)$-spaces when …, and prove that such spaces cannot be Bochner flat, conformally flat or conharmonically flat.

17.
Consider a graph $G$ of minimum degree $\delta$ and order $n$. Its total vertex irregularity strength is the smallest integer $k$ for which one can find a weighting $w: E(G)\cup V(G)\to\{1,2,\dots,k\}$ such that $w(u)+\sum_{e\ni u} w(e)\neq w(v)+\sum_{e\ni v} w(e)$ for every pair of distinct vertices $u,v$ of $G$. We prove that the total vertex irregularity strength of graphs with … is bounded from above by …. One of the cornerstones of the proof is a random ordering of the vertices generated by order statistics.
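A total weighting assigns weights from {1, …, k} to both vertices and edges, and it is vertex-irregular when the weighted degrees w(v) + Σ_{e∋v} w(e) are pairwise distinct; the smallest such k is the total vertex irregularity strength. The brute-force sketch below makes this definition concrete on tiny graphs. It assumes vertices labeled 0..n−1 and edges given as pairs, is exponential in the graph size, and has nothing to do with the probabilistic argument of the paper.

```python
from itertools import product


def is_vertex_irregular(n, edges, vertex_w, edge_w):
    """True if the weighted degrees w(v) + sum of incident edge weights are pairwise distinct."""
    wt = list(vertex_w)
    for (u, v), w in zip(edges, edge_w):
        wt[u] += w
        wt[v] += w
    return len(set(wt)) == n


def total_vertex_irregularity_strength(n, edges):
    """Smallest k admitting a vertex-irregular total weighting with weights in {1..k}.

    Exhaustive search over all k^(n + |E|) weightings, so only usable for tiny graphs.
    """
    k = 1
    while True:
        for weights in product(range(1, k + 1), repeat=n + len(edges)):
            if is_vertex_irregular(n, edges, weights[:n], weights[n:]):
                return k
        k += 1
```

For the star with four leaves, `total_vertex_irregularity_strength(5, [(0, 1), (0, 2), (0, 3), (0, 4)])` is 3: with k = 2 each leaf's weighted degree lies in {2, 3, 4}, which cannot separate four leaves.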

18.
Consider the focusing $\dot H^{1/2}$-critical semilinear Schrödinger equation in $\mathbb{R}^3$: $i\partial_t\psi+\Delta\psi+|\psi|^2\psi=0$ (0.1). It admits an eight-dimensional manifold of special solutions called ground state solitons. We exhibit a codimension-1 critical real analytic manifold ${\cal N}$ of asymptotically stable solutions of (0.1) in a neighborhood of the soliton manifold. We then show that ${\cal N}$ is center-stable, in the dynamical systems sense of Bates and Jones, and globally-in-time invariant. Solutions in ${\cal N}$ are asymptotically stable and separate into two asymptotically free parts that decouple in the limit: a soliton and radiation. Conversely, in a general setting, any solution that stays $\dot H^{1/2}$-close to the soliton manifold for all time is in ${\cal N}$. The proof uses the method of modulation. New elements include a different linearization and an endpoint Strichartz estimate for the time-dependent linearized equation. The proof also uses the fact that the linearized Hamiltonian has no nonzero real eigenvalues or resonances. For the case treated here, the focusing cubic NLS in $\mathbb{R}^3$, this has recently been established by the work of Marzuola and Simpson and of Costin, Huang, and Schlag. © 2012 Wiley Periodicals, Inc.
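Why the cubic equation in $\mathbb{R}^3$ is called $\dot H^{1/2}$-critical can be seen from its scaling symmetry. The short derivation below assumes the focusing cubic NLS in the form $i\partial_t\psi+\Delta\psi+|\psi|^2\psi=0$ and uses the standard homogeneity of the homogeneous Sobolev norm, $\|f(\lambda\,\cdot)\|_{\dot H^s(\mathbb{R}^d)}=\lambda^{s-d/2}\|f\|_{\dot H^s(\mathbb{R}^d)}$.

```latex
\begin{aligned}
&\psi_\lambda(t,x) := \lambda\,\psi(\lambda^2 t,\lambda x)
\;\Longrightarrow\;
i\partial_t\psi_\lambda+\Delta\psi_\lambda+|\psi_\lambda|^2\psi_\lambda
  = \lambda^3\big(i\partial_t\psi+\Delta\psi+|\psi|^2\psi\big)(\lambda^2 t,\lambda x),\\
&\|\psi_\lambda(0)\|_{\dot H^{s}(\mathbb{R}^3)}
  = \lambda^{1+s-3/2}\,\|\psi(0)\|_{\dot H^{s}(\mathbb{R}^3)},
\qquad\text{which is independent of }\lambda\text{ exactly when } s=\tfrac12 .
\end{aligned}
```

So rescaled solutions remain solutions, and $\dot H^{1/2}$ is the only homogeneous Sobolev space whose norm of the initial data is unchanged by this rescaling.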

19.
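Let $\mathcal{F}_{2k,k^2}$ consist of all simple graphs on $2k$ vertices and $k^2$ edges. For a simple graph $G$ and a positive integer $\lambda$, let $P_G(\lambda)$ denote the number of proper vertex colorings of $G$ in at most $\lambda$ colors, and let $f(2k,k^2,\lambda)=\max\{P_G(\lambda): G\in\mathcal{F}_{2k,k^2}\}$. We prove that $f(2k,k^2,3)=P_{K_{k,k}}(3)$ and that $K_{k,k}$ is the only extremal graph. We also prove that $f(2k,k^2,4)=(6+o(1))4^k$ as $k\to\infty$. © 2007 Wiley Periodicals, Inc. J Graph Theory 56: 135–148, 2007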
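For the smallest case, the extremal statement can be checked exhaustively. This sketch assumes, as in the standard formulation of this problem, graphs with $2k$ vertices and $k^2$ edges, with the complete bipartite graph $K_{k,k}$ as the extremal graph for 3 colors. For $k=2$ it enumerates all 15 labeled graphs on 4 vertices with 4 edges and counts proper colorings by brute force, which is feasible only for tiny graphs; the function names are hypothetical.

```python
from itertools import combinations, product


def count_colorings(n, edges, lam):
    """P_G(lam): proper colorings of vertices 0..n-1 using colors from a set of size lam."""
    return sum(
        all(colors[u] != colors[v] for u, v in edges)
        for colors in product(range(lam), repeat=n)
    )


def max_colorings(n, m, lam):
    """Brute-force maximum of P_G(lam) over all labeled simple graphs with n vertices, m edges."""
    best, maximizers = -1, []
    for edges in combinations(list(combinations(range(n), 2)), m):
        value = count_colorings(n, edges, lam)
        if value > best:
            best, maximizers = value, [edges]
        elif value == best:
            maximizers.append(edges)
    return best, maximizers


k22 = [(0, 2), (0, 3), (1, 2), (1, 3)]  # K_{2,2}, i.e. the 4-cycle
best, maximizers = max_colorings(4, 4, 3)
```

Only two graphs on 4 vertices have 4 edges up to isomorphism: the 4-cycle $K_{2,2}$ with $P(3)=18$ and the triangle with a pendant edge with $P(3)=12$. The search therefore finds the maximum 18, attained exactly by the 3 labeled copies of $K_{2,2}$.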

20.
Given a joint probability density function of $N$ real random variables, $\{x_j\}_{j=1}^N$, obtained from the eigenvector–eigenvalue decomposition of $N\times N$ random matrices, one constructs a random variable, the linear statistic, defined as the sum of a smooth function evaluated at the eigenvalues or singular values of the random matrix, namely $\sum_{j=1}^N F(x_j)$. For the joint PDFs obtained from the Gaussian and Laguerre ensembles, we compute in this paper the moment-generating function of this linear statistic, with the expectation taken over the orthogonal (β = 1) and symplectic (β = 4) ensembles, in the form of one plus a Schwartz function, nonvanishing over $\mathbb{R}$ for the Gaussian ensembles and over $(0,\infty)$ for the Laguerre ensembles. These are ultimately expressed as determinants of the identity plus a scalar operator, from which we obtain the large-$N$ asymptotics of the linear statistics for suitably scaled $F(\cdot)$. Copyright © 2016 John Wiley & Sons, Ltd.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417