Similar literature
 20 similar documents found
1.
It is desirable to combine the expressive power of deep learning with Gaussian processes (GPs) in one expressive Bayesian learning model. Deep kernel learning showed success by using a deep network for feature extraction and a GP as the function model. Recently, it was suggested that, despite training with the marginal likelihood, the deterministic nature of the feature extractor might lead to overfitting, and that replacing it with a Bayesian network seemed to cure this. Here, we propose the conditional deep Gaussian process (DGP), in which the intermediate GPs in the hierarchical composition are supported by the hyperdata and the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but are hyperparameters rather than random variables. We follow our previous moment-matching approach to approximate the marginal prior for the conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood, which implicitly depends on the hyperdata via the kernel. We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate expressive power from the depth of the hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference and deep kernel learning. We also address the non-Gaussian aspect of our model as well as a way of upgrading to full Bayesian inference.

2.
In the context of Markov processes, we show a new scheme to derive dual processes and a duality function based on a boson representation. This scheme is applicable when the generator is expressed in terms of boson creation and annihilation operators. For some stochastic processes, duality relations are known that connect continuous-time Markov processes with discrete state space to those with continuous state space. We clarify that, using a generating function approach and the Doi-Peliti method, a birth-death process (or discrete random walk model) is naturally connected to a differential equation with continuous variables, which can be interpreted as a dual Markov process. The key point in the derivation is to use bosonic coherent states as the bra state, instead of a conventional projection state. As examples, we apply the scheme to a simple birth-coagulation process and a Brownian momentum process. The generator of the Brownian momentum process is written in terms of elements of the SU(1,1) algebra, and using a boson realization of SU(1,1) we show that the same scheme applies.

3.
Maximum entropy network ensembles have been very successful in modelling sparse network topologies and in solving challenging inference problems. However, the sparse maximum entropy network models proposed so far have a fixed number of nodes and are typically not exchangeable. Here we consider hierarchical models for exchangeable networks in the sparse limit, i.e., with the total number of links scaling linearly with the total number of nodes. The approach is grand canonical, i.e., the number of nodes of the network is not fixed a priori: it is finite but can be arbitrarily large. In this way the grand canonical network ensembles circumvent the difficulties in treating infinite sparse exchangeable networks which according to the Aldous-Hoover theorem must vanish. The approach can treat networks with a given degree distribution or networks with a given distribution of latent variables. When only a subgraph induced by a subset of nodes is known, this model allows a Bayesian estimation of the network size and the degree sequence (or the sequence of latent variables) of the entire network, which can be used for network reconstruction.

4.
This work describes and validates an approach for autonomously bifurcating turbulent combustion manifolds to divide regression tasks amongst specialized artificial neural networks (ANNs). This approach relies on the mixture of experts (MoE) framework, where each neural network is trained to be specialized in a given portion of the input space. The assignment of different input regions to the experts is determined by a gating network, which is a neural network classifier. Previous studies [1], [2], [3], [4] have demonstrated that bifurcating a complex combustion manifold and fitting a different ANN to each part leads to better fits or faster inference. However, the manner of bifurcation in these studies was based on heuristic approaches or clustering techniques. In contrast, the proposed technique enables automatic bifurcation using non-linear planes in high-dimensional turbulent combustion manifolds that are often associated with complex behavior due to different dominating physics in various zones. The proposed concept is validated using 4-dimensional (4D) and 5D flamelet tables, showing that the errors obtained with a given network size, or conversely the network size required to achieve a given accuracy, are considerably reduced. The effect of the number of experts on inference speed is also investigated, showing that increasing the number of experts from 1 to 8 reduces the inference time by approximately a factor of two. Moreover, it is shown that the MoE approach divides the input manifold in a physically intuitive manner, suggesting that the MoE framework can elucidate high-dimensional datasets in a physically meaningful way.
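As a rough illustration of the gating idea in this abstract, the sketch below blends the outputs of several experts with softmax weights produced by a gate. It is a minimal sketch only: the scalar input, the linear gate, and the names `softmax` and `moe_predict` are assumptions made for illustration, whereas the paper uses trained neural networks for both the gate and the experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(x, gate_params, experts):
    """Blend expert outputs with gate weights.

    gate_params: one hypothetical (w, b) pair per expert; the gate score is w*x + b.
    experts: one callable per expert, each mapping x to a prediction.
    Returns the blended prediction and the gate weights.
    """
    scores = [w * x + b for w, b in gate_params]
    weights = softmax(scores)
    y = sum(g * f(x) for g, f in zip(weights, experts))
    return y, weights

# Two toy experts: one used for negative x, one for positive x.
experts = [lambda x: 2.0 * x, lambda x: x * x]
gate_params = [(-20.0, 0.0), (20.0, 0.0)]
y_pos, w_pos = moe_predict(1.0, gate_params, experts)
y_neg, w_neg = moe_predict(-1.0, gate_params, experts)
```

With a steep gate such as this one, each input is effectively routed to a single expert, which is the regime in which only one small network has to be evaluated at inference time.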

5.
Recent advances in statistical inference have significantly expanded the toolbox of probabilistic modeling. Historically, probabilistic modeling has been constrained to very restricted model classes, where exact or approximate probabilistic inference is feasible. However, developments in variational inference, a general form of approximate probabilistic inference that originated in statistical physics, have enabled probabilistic modeling to overcome these limitations: (i) Approximate probabilistic inference is now possible over a broad class of probabilistic models containing a large number of parameters, and (ii) scalable inference methods based on stochastic gradient descent and distributed computing engines allow probabilistic modeling to be applied to massive data sets. One important practical consequence of these advances is the possibility to include deep neural networks within probabilistic models, thereby capturing complex non-linear stochastic relationships between the random variables. These advances, in conjunction with the release of novel probabilistic modeling toolboxes, have greatly expanded the scope of applications of probabilistic models, and allowed the models to take advantage of the recent strides made by the deep learning community. In this paper, we provide an overview of the main concepts, methods, and tools needed to use deep neural networks within a probabilistic modeling framework.

6.
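To address the nonlinear characteristics of flue gas spectral data from thermal power plants, a nonlinear partial least squares (PLS) quantitative analysis method based on a neural network inner model is adopted. After PLS regression, the latent variables of the independent and dependent variables are used as the inputs and outputs for training a neural network, which yields the nonlinear inner model. PLS, nonlinear PLS with a back-propagation neural network inner model (BP-NPLS), nonlinear PLS with a radial basis function neural network inner model (RBF-NPLS), and nonlinear PLS with an adaptive fuzzy inference system inner model (ANFIS-NPLS) were used to determine multiple components of thermal power plant flue gas and were then compared. Relative to PLS, BP-NPLS, RBF-NPLS and ANFIS-NPLS reduced the root mean square error of prediction (RMSEP) of the sulfur dioxide model by 16.96%, 16.60% and 19.55%, respectively; of the nitric oxide model by 8.60%, 8.47% and 10.09%, respectively; and of the nitrogen dioxide model by 2.11%, 3.91% and 3.97%, respectively. The experiments show that nonlinear PLS is better suited than PLS to quantitative analysis of thermal power plant flue gas. Owing to the ability of neural networks to closely approximate nonlinear functions, the nonlinear PLS methods based on the inner models discussed here have good predictive ability and robustness, and to some extent overcome the inherent limitations of nonlinear PLS methods based on other inner models such as polynomials and spline functions. Among them, ANFIS-NPLS performs best: the learning ability of the adaptive fuzzy inference system effectively reduces the residuals and gives the model good generalization, making it a fairly accurate and practical method for quantitative analysis of thermal power plant flue gas.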

7.
A method is described for the automatic recognition of transient animal sounds. Automatic recognition can be used in wild animal research, including studies of behavior, population, and impact of anthropogenic noise. The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps. For a sound type of interest, a two-dimensional synthetic kernel is constructed and cross-correlated with a spectrogram of a recording, producing a recognition function--the likelihood at each point in time that the sound type was present. A threshold is applied to this function to obtain discrete detection events, instants at which the sound type of interest was likely to be present. An extension of this method handles the temporal variation commonly present in animal sounds. Spectrogram correlation was compared to three other methods that have been used for automatic call recognition: matched filters, neural networks, and hidden Markov models. The test data set consisted of bowhead whale (Balaena mysticetus) end notes from songs recorded in Alaska in 1986 and 1988. The method had a success rate of about 97.5% on this problem, and the comparison indicated that it could be especially useful for detecting a call type when relatively few (5-200) instances of the call type are known.
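The two steps named in this abstract, cross-correlating a kernel with a spectrogram and thresholding the result, can be sketched as follows. This is an illustrative sketch under stated assumptions: the spectrogram is a plain list of frequency-bin rows, the kernel spans all frequency bins, the score is an unnormalized sum of products, and the function names and the toy upsweep kernel are hypothetical rather than taken from the paper.

```python
def recognition_function(spec, kernel):
    """Slide a 2-D kernel along the time axis of a spectrogram.

    spec and kernel are lists of rows, one row per frequency bin,
    each row a list of values over time frames. Both must have the
    same number of frequency bins. Returns one score per time offset.
    """
    n_freq, n_time = len(spec), len(spec[0])
    k_time = len(kernel[0])
    if len(kernel) != n_freq:
        raise ValueError("kernel and spectrogram need the same number of bins")
    scores = []
    for t in range(n_time - k_time + 1):
        s = 0.0
        for f in range(n_freq):
            for dt in range(k_time):
                s += kernel[f][dt] * spec[f][t + dt]
        scores.append(s)
    return scores

def detect(scores, threshold):
    """Return time offsets that exceed the threshold and are local peaks."""
    events = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t + 1 < len(scores) else float("-inf")
        if s > threshold and s >= left and s > right:
            events.append(t)
    return events

# Toy spectrogram: 3 frequency bins, 8 frames, one upsweep starting at frame 2.
spec = [[0.0] * 8 for _ in range(3)]
spec[0][2] = spec[1][3] = spec[2][4] = 1.0
# Toy kernel: positive along the sweep, negative beside it.
kernel = [[1.0, -0.5, -0.5],
          [-0.5, 1.0, -0.5],
          [-0.5, -0.5, 1.0]]
scores = recognition_function(spec, kernel)
events = detect(scores, threshold=2.0)
```

The negative flanks in the kernel penalize energy that is near the sweep but not on it, so the score peaks only where the kernel lines up with the sweep.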

8.
In this paper, we present an efficient opinion control strategy for complex networks, in particular, for social networks. The proposed adaptive bridge control (ABC) strategy calls for controlling a special kind of node called a bridge and requires no knowledge of the node degrees or any other global or local knowledge, which are necessary for some other immunization strategies including targeted immunization and acquaintance immunization. We study the efficiency of the proposed ABC strategy on random networks, small-world networks, scale-free networks, and random networks adjusted by the edge exchanging method. Our results show that the proposed ABC strategy is efficient for all four kinds of networks. By adjusting the clustering coefficient with the edge exchanging method, we find that the efficiency of our ABC strategy is closely related to the clustering coefficient. The main contributions of this paper are as follows: (1) A new high-order social network is proposed to describe opinion dynamics. (2) An algorithm that does not require knowledge of the node degrees or other global/local network structure information is proposed to control the "bridges" more accurately and thereby control the opinion dynamics of social networks. The efficiency of our ABC strategy is illustrated by numerical examples. (3) The numerical results indicate that our ABC strategy is more efficient for networks with a higher clustering coefficient.

9.
In this paper, we propose a model adaptation algorithm based on maximum likelihood subband polynomial regression (MLSPR) for robust speech recognition. In this algorithm, the cepstral mean vectors of prior trained hidden Markov models (HMMs) are converted to the log-spectral domain by the inverse discrete cosine transform (DCT) and each log-spectral mean vector is divided into several subband vectors. The relationship between the training and testing subband vectors is approximated by a polynomial function. The polynomial coefficients are estimated from adaptation data using the expectation–maximization (EM) algorithm under the maximum likelihood (ML) criterion. The experimental results show that the proposed MLSPR algorithm is superior to both the maximum likelihood linear regression (MLLR) adaptation and maximum likelihood subband weighting (MLSW) approach. In the MLSPR adaptation, only a very small amount of adaptation data is required and therefore it is more useful for fast model adaptation.

10.
A load-sharing system is defined as a parallel system whose load is redistributed to its surviving components as each component fails. Our focus is on statistical inference for the parameters associated with the lifetime distribution of each component in the system. In this paper, we introduce a methodology which integrates the conventional procedure under the assumption of the load-sharing system being made up of fundamental hypothetical latent random variables. We then develop an expectation maximization algorithm for performing the maximum likelihood estimation of the system with Lindley-distributed component lifetimes. We adopt several standard simulation techniques to compare the performance of the proposed methodology with the Newton–Raphson-type algorithm for the maximum likelihood estimate of the parameter. Numerical results indicate that the proposed method is more effective by consistently reaching a global maximum.

11.
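Rapid identification of coix seed varieties by near-infrared spectroscopy   Cited by: 1 (self-citations: 0, citations by others: 1)
Coix seed is a resource used as both medicine and food, and demand for rapid identification of its quality is growing; near-infrared spectroscopy (NIRS), a fast, non-destructive and environmentally friendly method, is well suited to this need. Based on the near-infrared spectra of coix seeds of different origins and varieties, chemometric methods were used to identify coix seed types. The raw spectra were subjected to qualitative discriminant analysis using an unsupervised learning algorithm, principal component analysis (PCA), and two supervised learning algorithms, the learning vector quantization (LVQ) neural network and the support vector machine (SVM). Because the nutrients of coix seeds from different regions and of different varieties are complex in composition and similar in content, the characteristic variables of the two selected classes of coix seed are very similar, so the PCA score plots overlap heavily and the classes are difficult to separate. Both the LVQ neural network and the SVM gave satisfactory results: the prediction accuracy of the LVQ neural network was 90.91%, and the SVM reached a classification accuracy of 100% after optimization of the penalty parameter and the kernel function parameter. The results show that NIRS combined with chemometric methods can serve as a fast, non-destructive and reliable method for identifying coix seed types and provides a technical reference for market regulation.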

12.
Recent theoretical work on the modeling of network structure has focused primarily on networks that are static and unchanging, but many real-world networks change their structure over time. There exist natural generalizations to the dynamic case of many static network models, including the classic random graph, the configuration model, and the stochastic block model, where one assumes that the appearance and disappearance of edges are governed by continuous-time Markov processes with rate parameters that can depend on properties of the nodes. Here we give an introduction to this class of models, showing for instance how one can compute their equilibrium properties. We also demonstrate their use in data analysis and statistical inference, giving efficient algorithms for fitting them to observed network data using the method of maximum likelihood. This allows us, for example, to estimate the time constants of network evolution or infer community structure from temporal network data using cues embedded both in the probabilities over time that node pairs are connected by edges and in the characteristic dynamics of edge appearance and disappearance. We illustrate these methods with a selection of applications, both to computer-generated test networks and real-world examples.

13.
The maximum correntropy Kalman filter (MCKF) is an effective algorithm that was proposed to solve the non-Gaussian filtering problem for linear systems. Compared with the original Kalman filter (KF), the MCKF is a sub-optimal filter with a Gaussian correntropy objective function, which has been demonstrated to have excellent robustness to non-Gaussian noise. However, the performance of the MCKF is affected by its kernel bandwidth parameter, and a constant kernel bandwidth may lead to severe accuracy degradation in non-stationary noise. To solve this problem, the mixture correntropy method is further explored in this work, and an improved maximum mixture correntropy KF (IMMCKF) is proposed. In the derivation, random variables that obey a Beta-Bernoulli distribution are taken as intermediate parameters, and a new hierarchical Gaussian state-space model is established. Finally, the unknown mixing probability and state estimation vector at each moment are inferred via a variational Bayesian approach, which provides an effective solution to improve the applicability of MCKFs in non-stationary noise. Performance evaluations demonstrate that the proposed filter significantly improves on existing MCKFs in non-stationary noise.

14.
A novel hard decision decoding scheme based on a hybrid intelligent algorithm combining a genetic algorithm and a neural network, named genetic neural-network decoding (GND), is proposed. GND offsets the reliability loss caused by channel transmission errors and hard decision quantization by making full use of the genetic algorithm's optimization capacity and the neural network's pattern classification function to optimize the hard decision outputs of the receiver's matched filter and restore a more likely codeword as the input of the hard decision decoder. Theoretical analysis and computer simulation show that the GND scheme is close to traditional soft decision decoding in error correction performance, while its complexity is greatly reduced compared with traditional soft decision decoding because its decoding process does not need the channel statistical information.

15.
We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory. We discuss various means of explicitly parameterizing the map and computing it efficiently through solution of an optimization problem, exploiting gradient information from the forward model when possible. The resulting algorithm overcomes many of the computational bottlenecks associated with Markov chain Monte Carlo. Advantages of a map-based representation of the posterior include analytical expressions for posterior moments and the ability to generate arbitrary numbers of independent posterior samples without additional likelihood evaluations or forward solves. The optimization approach also provides clear convergence criteria for posterior approximation and facilitates model selection through automatic evaluation of the marginal likelihood. We demonstrate the accuracy and efficiency of the approach on nonlinear inverse problems of varying dimension, involving the inference of parameters appearing in ordinary and partial differential equations.

16.
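A method for drug identification from near-infrared spectra that combines a sparse denoising autoencoder with a Gaussian process is proposed. First, a wavelet transform is applied to the near-infrared spectral data to remove baseline drift; then a sparse denoising autoencoder (SDAE) network extracts spectral features and produces a reduced-dimension representation; finally, a Gaussian process (GP) performs binary classification, with the spectral mixture (SM) kernel chosen as the GP covariance function. This classification network is denoted wSDAGSM. Autoencoder networks have strong model representation ability, and Gaussian process classifiers have advantages when handling small-sample data. The wSDAGSM network uses the sparse denoising autoencoder to learn lower-dimensional but more informative features to represent the input data, and uses the highly expressive spectral mixture kernel as the GP covariance function, which favors more accurate classification of spectral data. Using near-infrared spectra of erythromycin ethylsuccinate and other drugs as experimental data, the method is compared with a BP neural network (wBP) and a support vector machine (wSVM) that use the Mexican hat wavelet transform, SDAE combined with logistic binary classification (wSDAL), SDAE combined with GP binary classification using the squared exponential (SE) covariance kernel (wSDAGSE), and the SDAGSM network without the wavelet transform. The experimental results show that preprocessing the spectral data with the Mexican hat wavelet transform effectively improves the classification accuracy and stability of the SDAGSM network. The wSDAGSM method outperforms the other classifiers in both classification accuracy and stability of the classification results.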

17.
Li Jun-hua  Li Ming 《Optik》2013,124(24):6780-6785
Random noise perturbs objective functions in many practical problems, and genetic algorithms (GAs) have been widely proposed as an effective optimization tool for dealing with noisy objective functions. However, few papers on the convergence and convergence speed of genetic algorithms in noisy environments (GA-NE) have been published. In this paper, a Markov chain that models elitist genetic algorithms in noisy environments (EGA-NE) is constructed for the case in which the objective function is perturbed only by additive random noise, and it is proved to be an absorbing-state Markov chain. The convergence of EGA-NE is proved on the basis of the properties of the absorbing-state Markov chain, its convergence rate is analyzed, and upper and lower bounds are derived for the expected number of iterations until EGA-NE first reaches a globally optimal solution.

18.
When applied to classification problems, Bayesian networks are often used to infer a class variable when given feature variables. Earlier reports have described that the classification accuracy of Bayesian network structures achieved by maximizing the marginal likelihood (ML) is lower than that achieved by maximizing the conditional log likelihood (CLL) of a class variable given the feature variables. Nevertheless, because ML has asymptotic consistency, the performance of Bayesian network structures achieved by maximizing ML is not necessarily worse than that achieved by maximizing CLL for large data. However, the error of learning structures by maximizing the ML becomes much larger for small sample sizes. That large error degrades the classification accuracy. As a method to resolve this shortcoming, model averaging has been proposed to marginalize the class variable posterior over all structures. However, the posterior standard error of each structure in the model averaging becomes large as the sample size becomes small; it subsequently degrades the classification accuracy. The main idea of this study is to improve the classification accuracy using subbagging, which is modified bagging using random sampling without replacement, to reduce the posterior standard error of each structure in model averaging. Moreover, to guarantee asymptotic consistency, we use the K-best method with the ML score. The experimentally obtained results demonstrate that our proposed method provides more accurate classification than earlier BNC methods and the other state-of-the-art ensemble methods do.
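The subbagging step described in this abstract, bagging with random sampling without replacement, can be sketched as below. This is a minimal sketch under loud assumptions: the base learner is a hypothetical nearest-class-mean classifier on a single feature, standing in for the Bayesian network classifiers of the paper, and predictions are combined by majority vote rather than by the model averaging the paper describes.

```python
import random
from collections import Counter

def fit_centroids(sample):
    """Hypothetical base learner: one mean per class for a 1-D feature."""
    sums, counts = {}, Counter()
    for x, label in sample:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_centroids(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def subbagging_fit(data, n_models, subsample_size, seed=0):
    """Train each model on a subsample drawn WITHOUT replacement."""
    rng = random.Random(seed)
    return [fit_centroids(rng.sample(data, subsample_size))
            for _ in range(n_models)]

def subbagging_predict(models, x):
    """Combine the base predictions by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (0.1, "a"),
        (2.0, "b"), (2.2, "b"), (1.9, "b"), (2.1, "b")]
models = subbagging_fit(data, n_models=11, subsample_size=5, seed=1)
```

Because `random.Random.sample` never repeats an element, each subsample holds five distinct records, which is what separates subbagging from ordinary bootstrap bagging.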

19.
Graphical models for statistical inference and data assimilation   Cited by: 1 (self-citations: 0, citations by others: 1)
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as well as any future predictions about it. From a statistical point of view, this process can be regarded as estimating many random variables which are related both spatially and temporally: given observations of some of these variables, typically corresponding to times past, we require estimates of several others, typically corresponding to future times.

Graphical models have emerged as an effective formalism for assisting in these types of inference tasks, particularly for large numbers of random variables. Graphical models provide a means of representing dependency structure among the variables, and can provide both intuition and efficiency in estimation and other inference computations. We provide an overview and introduction to graphical models, and describe how they can be used to represent statistical dependency and how the resulting structure can be used to organize computation. The relation between statistical inference using graphical models and optimal sequential estimation algorithms such as Kalman filtering is discussed. We then give several additional examples of how graphical models can be applied to climate dynamics, specifically estimation using multi-resolution models of large-scale data sets such as satellite imagery, and learning hidden Markov models to capture rainfall patterns in space and time.

20.
Deep learning has proven to be an important element of modern data processing technology and has found application in many areas such as multimodal sensor data processing and understanding, data generation and anomaly detection. While the use of deep learning is booming in many real-world tasks, the internal processes by which it produces results are still unclear. Understanding the data processing pathways within a deep neural network is important for transparency and better resource utilisation. In this paper, a method utilising information theoretic measures is used to reveal the typical learning patterns of convolutional neural networks, which are commonly used for image processing tasks. For this purpose, training samples, true labels and estimated labels are considered to be random variables. The mutual information and conditional entropy between these variables are then studied using information theoretic measures. This paper shows that adding convolutional layers improves the learning of the network up to a point, beyond which further layers bring no improvement. The number of convolutional layers that need to be added to a neural network to reach the desired learning level can be determined with the help of information theoretic quantities including entropy, inequality and mutual information among the inputs to the network. The kernel size of the convolutional layers only affects the learning speed of the network. This study also shows that the position of the dropout layer has no significant effect on the learning of networks with a lower dropout rate, and that with higher dropout rates the dropout layer is best placed immediately after the last convolutional layer.
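The two quantities this abstract relies on, mutual information and conditional entropy between label variables, can be estimated from paired samples as sketched below. This is a minimal sketch assuming discrete labels and simple plug-in (frequency-count) estimates; the function names are hypothetical and the paper's own estimation procedure may differ.

```python
import math
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy in bits of a list of hashable values."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def conditional_entropy(ys, xs):
    """H(Y | X) = H(X, Y) - H(X), estimated from paired samples."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

# True labels versus estimated labels of a perfect classifier
# and of a classifier whose output is unrelated to the truth.
true_labels = [0, 1, 0, 1]
perfect = [0, 1, 0, 1]
unrelated = [0, 0, 1, 1]
mi_perfect = mutual_information(true_labels, perfect)
mi_unrelated = mutual_information(true_labels, unrelated)
```

A classifier that has learned the labels drives the mutual information towards the label entropy and the conditional entropy towards zero, which is the kind of trend that can be tracked over training.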
