Similar Documents
A total of 20 similar documents were retrieved (search time: 33 ms).
1.
We prove a theorem concerning the approximation of generalized bandlimited multivariate functions by deep ReLU networks for which the curse of dimensionality is overcome. Our theorem is based on a result by Maurey and on the ability of deep ReLU networks to approximate Chebyshev polynomials and analytic functions efficiently.
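As a point of reference (an editorial sketch, not taken from the paper; the target function and polynomial degree are arbitrary choices), the kind of Chebyshev approximation that deep ReLU networks are shown to emulate efficiently can be computed as follows:

```python
import numpy as np

# Chebyshev approximation of a smooth, "bandlimited-like" target on [-1, 1].
# The target and the degree are illustrative assumptions only.
target = lambda x: np.sin(3 * np.pi * x) * np.exp(-x**2)

deg = 20
# Chebyshev nodes of the first kind avoid Runge-type oscillations.
k = np.arange(deg + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))

coeffs = np.polynomial.chebyshev.chebfit(nodes, target(nodes), deg)

xs = np.linspace(-1, 1, 1000)
err = np.max(np.abs(np.polynomial.chebyshev.chebval(xs, coeffs) - target(xs)))
print(f"max error of degree-{deg} Chebyshev approximation: {err:.2e}")
```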

2.
Deep neural networks with rectified linear units (ReLU) have become increasingly popular. However, the derivatives of the function represented by a ReLU network are not continuous, which limits the use of ReLU networks to settings where smoothness is not required. In this paper, we construct deep neural networks with rectified power units (RePU), which give better approximations of smooth functions. Optimal algorithms are proposed to explicitly build neural networks with sparsely connected RePUs, which we call PowerNets, that represent polynomials with no approximation error. For general smooth functions, we first project the function onto its polynomial approximation and then use the proposed algorithms to construct the corresponding PowerNet. Thus, the error of the best polynomial approximation provides an upper bound on the best RePU network approximation error. For smooth functions in higher-dimensional Sobolev spaces, we use fast spectral transforms for tensor-product grid and sparse grid discretizations to obtain polynomial approximations. Our constructive algorithms show a close connection between spectral methods and deep neural networks: PowerNets with $n$ hidden layers can exactly represent polynomials up to degree $s^n$, where $s$ is the power of the RePUs. The proposed PowerNets have potential applications in situations where high accuracy or smoothness is required.
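A minimal illustration (mine, not the paper's construction algorithm) of why rectified power units can represent polynomials exactly: with the squared ReLU $\sigma(x)=\max(0,x)^2$ one has $x^2=\sigma(x)+\sigma(-x)$, and products follow from the polarization identity $xy=\tfrac14\big((x+y)^2-(x-y)^2\big)$.

```python
import numpy as np

def repu(x, s=2):
    """Rectified power unit: max(0, x) ** s (s=2 is the squared ReLU)."""
    return np.maximum(0.0, x) ** s

def square(x):
    """Exact x**2 from two RePU(s=2) neurons: sigma(x) + sigma(-x)."""
    return repu(x) + repu(-x)

def product(x, y):
    """Exact x*y via the polarization identity, using only 'square'."""
    return 0.25 * (square(x + y) - square(x - y))

x = np.linspace(-2, 2, 7)
y = np.linspace(-1, 3, 7)
assert np.allclose(square(x), x**2)
assert np.allclose(product(x, y), x * y)
print("x**2 and x*y reproduced exactly by RePU combinations")
```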

3.
In this paper, we investigate the relationship between deep neural networks (DNNs) with the rectified linear unit (ReLU) as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions arising from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL functions in FEM. We theoretically establish that at least $2$ hidden layers are needed in a ReLU DNN to represent any linear finite element function in $\Omega \subseteq \mathbb{R}^d$ when $d\ge2$. Consequently, for $d=2,3$, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. We then give a detailed account of how a general CPWL function in $\mathbb{R}^d$ can be represented by a ReLU DNN with at most $\lceil\log_2(d+1)\rceil$ hidden layers, together with an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNNs and FEM, we argue theoretically that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present numerical results for using ReLU DNNs to solve a two-point boundary value problem, demonstrating the potential of applying DNNs to the numerical solution of partial differential equations.
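As an illustration of the FEM–ReLU connection (a sketch under my own choice of mesh, not the paper's construction), the one-dimensional piecewise-linear nodal basis ("hat") function can be written with three ReLU neurons, so any 1D linear finite element function is a shallow ReLU network:

```python
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def hat(x, a, m, b):
    """Nodal basis function: 0 at a and b, 1 at m, piecewise linear in between,
    written as a combination of three ReLU neurons."""
    return (relu(x - a) / (m - a)
            - relu(x - m) * (1.0 / (m - a) + 1.0 / (b - m))
            + relu(x - b) / (b - m))

# Check against direct piecewise-linear interpolation on the mesh 0 < 0.4 < 1.
xs = np.linspace(-0.5, 1.5, 1001)
direct = np.interp(xs, [0.0, 0.4, 1.0], [0.0, 1.0, 0.0])
assert np.allclose(hat(xs, 0.0, 0.4, 1.0), direct)
print("hat basis function reproduced exactly by 3 ReLU neurons")
```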

4.

We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to \(L^p\)-norms, \(0< p < \infty \), for all practically used activation functions, and also not closed with respect to the \(L^\infty \)-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if \(f_1, f_2\) are two functions realized by neural networks and if \(f_1, f_2\) are close in the sense that \(\Vert f_1 - f_2\Vert _{L^\infty } \le \varepsilon \) for \(\varepsilon > 0\), it is, regardless of the size of \(\varepsilon \), usually not possible to find weights \(w_1, w_2\) close together such that each \(f_i\) is realized by a neural network with weights \(w_i\). Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.


5.
Constructing neural networks for function approximation is a classical and longstanding topic in approximation theory. In this paper, we aim at constructing deep neural networks with three hidden layers and a sigmoidal activation function to approximate smooth and sparse functions. Specifically, we prove that the constructed deep nets, with a controllable magnitude of free parameters, reach the optimal approximation rate for both smooth and sparse functions. In particular, we prove that neural networks with three hidden layers avoid the phenomenon of saturation, i.e., the phenomenon that for some neural network architectures the approximation rate stops improving for functions of very high smoothness.

6.
Deep neural networks have been trained successfully in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation of why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough.

7.
8.
Motivated by the wide application of piecewise linear functions, this paper studies the approximation theory of shallow and deep piecewise linear neural networks. The authors extend the universal approximation theorem for three-layer perceptron models to piecewise linear neural networks and give approximation error estimates in terms of the number of hidden neurons. Using an explicit construction of sawtooth functions from piecewise linear functions, they prove that analytic functions can be approximated at an exponential rate by stacking the depth of piecewise linear neural networks, and the results are supported by corresponding numerical experiments.
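A short sketch of the classical sawtooth construction alluded to above (an editorial illustration): composing the piecewise-linear tent map with itself $n$ times produces a sawtooth with $2^{n-1}$ teeth, i.e. $2^n$ linear pieces, so depth yields exponentially many pieces from linearly many parameters.

```python
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def tent(x):
    """Tent map on [0, 1]: 2x for x <= 1/2, 2(1 - x) for x >= 1/2,
    written with two ReLU neurons."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, n):
    """n-fold composition of the tent map: a sawtooth with 2**(n-1) teeth."""
    for _ in range(n):
        x = tent(x)
    return x

xs = np.linspace(0.0, 1.0, 2**12 + 1)
for n in (1, 2, 3, 4):
    # Count linear pieces by counting slope sign changes of the sampled curve.
    pieces = np.count_nonzero(np.abs(np.diff(np.sign(np.diff(sawtooth(xs, n))))) > 0) + 1
    print(f"depth {n}: {pieces} linear pieces")
```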

9.
We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe compression through an autoencoder and compare against sequential feed-forward networks under several metrics.
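For reference, a minimal heavy-ball momentum update on a toy objective (one common form of gradient descent with momentum; the paper's specific adaptive rule may differ):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """One heavy-ball update: the velocity accumulates past gradients,
    and the weights move along the velocity."""
    v = mu * v - lr * grad(w)
    return w + v, v

# Toy quadratic objective 0.5 * ||w||^2, whose gradient is w.
grad = lambda w: w
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    w, v = momentum_step(w, v, grad)
print("weights after 200 momentum steps:", w)
```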

10.

Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the functional tensor-train approximation of the inverse Rosenblatt transport recently developed by Dolgov et al. (Stat Comput 30:603–625, 2020) to a wide class of high-dimensional non-negative functions, such as unnormalised probability density functions. First, we extend the inverse Rosenblatt transform to enable the transport to general reference measures other than the uniform measure. We develop an efficient procedure to compute this transport from a squared tensor-train decomposition which preserves the monotonicity. More crucially, we integrate the proposed order-preserving functional tensor-train transport into a nested variable transformation framework inspired by the layered structure of deep neural networks. The resulting deep inverse Rosenblatt transport significantly expands the capability of tensor approximations and transport maps to random variables with complicated nonlinear interactions and concentrated density functions. We demonstrate the efficiency of the proposed approach on a range of applications in statistical learning and uncertainty quantification, including parameter estimation for dynamical systems and inverse problems constrained by partial differential equations.
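To fix ideas, a minimal one-dimensional sketch (an editorial illustration, not the paper's tensor-train machinery) of an inverse Rosenblatt-type transport: build the CDF of an unnormalised density on a grid and push uniform reference samples through its inverse.

```python
import numpy as np

# Unnormalised, concentrated target density on a grid (illustrative choice).
xs = np.linspace(-4.0, 4.0, 2001)
unnorm = np.exp(-50.0 * (np.abs(xs) - 1.5) ** 2)   # two sharp bumps at +-1.5

# Cumulative distribution by trapezoidal integration, then normalisation.
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(xs))))
cdf /= cdf[-1]

# Inverse Rosenblatt map: uniform reference samples -> target samples.
rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)
samples = np.interp(u, cdf, xs)   # inverts the monotone CDF by interpolation

print("sample mean/std:", samples.mean(), samples.std())
```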


11.
12.
The goal of this paper is to characterize function distributions that general neural networks trained by descent algorithms (GD/SGD) can or cannot learn in polynomial time. The results are: (1) The paradigm of general neural networks trained by SGD is polynomial-time universal: any function distribution that can be learned from samples in polynomial time can also be learned by a polynomial-size neural net trained by SGD with polynomial parameters. In particular, this can be achieved despite polynomial noise on the gradients, implying a separation result between SGD-based deep learning and statistical query algorithms, as the latter are not comparably universal due to cases like parities. This also shows that deep learning does not suffer from the limitations of shallow networks. (2) The paper further gives a lower bound on the generalization error of descent algorithms, which relies on two quantities: the cross-predictability, an average-case quantity related to the statistical dimension, and the null-flow, a quantity specific to descent algorithms. The lower bound implies in particular that for functions of low enough cross-predictability, the above robust universality breaks down once the gradients are averaged over too many samples (as in perfect GD) rather than fewer (as in SGD). (3) Finally, it is shown that if larger amounts of noise are added to the initialization and to the gradients, then SGD is no longer comparably universal, due again to distributions having low enough cross-predictability.

13.
Ever larger amounts of data in applications demand more capable learning models for data processing. Data that we encounter often have certain embedded sparsity structures; that is, when represented in an appropriate basis, their energy concentrates on a small number of basis functions. This paper is devoted to a numerical study of the adaptive approximation of solutions of nonlinear partial differential equations, whose solutions may have singularities, by deep neural networks (DNNs) with a sparse regularization involving multiple parameters. Noting that DNNs have an intrinsic multi-scale structure favorable for the adaptive representation of functions, we employ a penalty with multiple parameters to develop DNNs with a multi-scale sparse regularization (SDNN) for effectively representing functions having certain singularities. We then apply the proposed SDNN to the numerical solution of the Burgers equation and the Schrödinger equation. Numerical examples confirm that solutions generated by the proposed SDNN are sparse and accurate.
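A rough sketch (mine, with hypothetical layer sizes and penalty weights; the paper's SDNN penalty may differ in form) of a multi-parameter sparse regularization: each layer gets its own $\ell_1$ weight, so different scales can be penalized differently.

```python
import torch
import torch.nn as nn

# Small fully connected network; the widths are illustrative only.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

# One sparsity parameter per layer (multi-parameter regularization); values are arbitrary.
lambdas = [1e-4, 1e-5, 1e-6]

def sparse_penalty(model, lams):
    linears = [m for m in model if isinstance(m, nn.Linear)]
    return sum(lam * layer.weight.abs().sum() for lam, layer in zip(lams, linears))

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sign(x)                      # a target with a singularity at 0
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean() + sparse_penalty(net, lambdas)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```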

14.
Four-layer feedforward regular fuzzy neural networks are constructed. Universal approximation of some continuous fuzzy functions defined on $\mathbb{R}^n$ by the four-layer fuzzy neural networks is shown. First, multivariate Bernstein polynomials associated with fuzzy-valued functions are employed to approximate continuous fuzzy-valued functions defined on each compact set of $\mathbb{R}^n$. Secondly, by introducing cut-preserving fuzzy mappings, equivalent conditions for continuous fuzzy functions to be approximated arbitrarily closely by regular fuzzy neural networks are shown. Finally, several sufficient and necessary conditions characterizing the approximation capabilities of regular fuzzy neural networks are obtained, and some concrete fuzzy functions illustrate the conclusions.
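For the Bernstein-polynomial step mentioned above, a small self-contained sketch (univariate and crisp-valued for simplicity; the paper works with multivariate fuzzy-valued functions):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at the points x."""
    x = np.asarray(x, dtype=float)
    k = np.arange(n + 1)
    binom = np.array([comb(n, i) for i in k], dtype=float)
    basis = binom * x[:, None] ** k * (1.0 - x[:, None]) ** (n - k)
    return basis @ f(k / n)

f = lambda t: np.sin(2 * np.pi * t)
xs = np.linspace(0.0, 1.0, 501)
for n in (10, 50, 200):
    err = np.max(np.abs(bernstein(f, n, xs) - f(xs)))
    print(f"n={n:4d}  max error = {err:.3e}")
```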

15.
Recent years have witnessed growing interest in solving partial differential equations with deep neural networks, especially in the high-dimensional case. Unlike classical numerical methods, such as the finite difference method and the finite element method, the enforcement of boundary conditions in deep neural networks is highly nontrivial. One general strategy is to use the penalty method. In this work, we conduct a comparison study for elliptic problems with four different boundary conditions, i.e., Dirichlet, Neumann, Robin, and periodic boundary conditions, using two representative methods: the deep Galerkin method and the deep Ritz method. In the former, the PDE residual is minimized in the least-squares sense, while in the latter the corresponding variational problem is minimized. It is therefore reasonable to expect the deep Galerkin method to work better for smooth solutions and the deep Ritz method to work better for low-regularity solutions. However, in a number of examples we observe that the deep Ritz method can outperform the deep Galerkin method, with a clear dependence on dimensionality, even for smooth solutions, and the deep Galerkin method can also outperform the deep Ritz method for low-regularity solutions. Besides, in some cases, when the boundary condition can be implemented exactly, we find that such a strategy not only provides a better approximate solution but also facilitates the training process.
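A compact sketch (mine, for the 1D Poisson problem $-u''=f$ on $(0,1)$ with homogeneous Dirichlet data; the network size and penalty weight are arbitrary) contrasting the two losses: the deep Galerkin method penalizes the squared PDE residual, the deep Ritz method minimizes the energy functional, and both add a boundary penalty.

```python
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
f = lambda x: (math.pi ** 2) * torch.sin(math.pi * x)   # exact solution u(x) = sin(pi*x)
beta = 100.0                                            # boundary penalty weight (arbitrary)

def derivatives(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return u, du, d2u

def boundary_penalty():
    xb = torch.tensor([[0.0], [1.0]])
    return beta * (net(xb) ** 2).mean()

def deep_galerkin_loss(x):            # least-squares residual of  -u'' = f
    _, _, d2u = derivatives(x)
    return ((-d2u - f(x)) ** 2).mean() + boundary_penalty()

def deep_ritz_loss(x):                # energy functional  int( 0.5*u'^2 - f*u ) dx
    u, du, _ = derivatives(x)
    return (0.5 * du ** 2 - f(x) * u).mean() + boundary_penalty()

x = torch.rand(1024, 1)
print("Galerkin loss:", deep_galerkin_loss(x).item(),
      "  Ritz loss:", deep_ritz_loss(x).item())
```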

16.
朱石焕  吴曦 《数学季刊》2002,17(4):94-98
The wavelet neural network is a new type of artificial neural network developed in recent years for approximating nonlinear functions. In particular, wavelet neural networks in which the orthogonal scaling function is taken to be a certain function are better suited to function approximation. On this basis, this paper discusses the approximation of nonlinear AR(p) processes by wavelet neural networks.

17.
In this paper, we consider the problem of approximating continuous multivariate functions by neural networks with a bounded number of neurons in their hidden layers. We prove the existence of single-hidden-layer networks with a bounded number of neurons whose approximation capabilities are no worse than those of networks with arbitrarily many neurons. Our analysis is based on the properties of ridge functions.

18.
In this paper, we introduce a type of approximation operator built from neural networks with sigmoidal activation functions on compact intervals, and obtain pointwise and uniform estimates of the approximation. To improve the approximation rate, we further introduce a type of combination of such neural networks. Moreover, we show that the derivatives of functions can also be simultaneously approximated by the derivatives of the combinations. We also apply our method to construct approximation operators of neural networks with sigmoidal functions on infinite intervals.
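One standard construction of such an operator (a generic sketch, not necessarily the operator used in the paper): differences of shifted sigmoids form bump functions that sum to one, and sampling $f$ on a uniform grid gives a quasi-interpolant.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def bump(t):
    """Bell-shaped bump from two shifted sigmoids; sum_k bump(t - k) == 1."""
    return sigmoid(t + 0.5) - sigmoid(t - 0.5)

def sigmoidal_operator(f, n, x, pad=10):
    """Quasi-interpolant  G_n f(x) = sum_k f(k/n) * bump(n*x - k)  on [0, 1]."""
    k = np.arange(-pad, n + pad + 1)
    samples = f(np.clip(k / n, 0.0, 1.0))        # extend f by its boundary values
    return bump(n * x[:, None] - k) @ samples

f = lambda t: np.exp(t) * np.cos(4 * t)
xs = np.linspace(0.0, 1.0, 401)
for n in (16, 64, 256):
    err = np.max(np.abs(sigmoidal_operator(f, n, xs) - f(xs)))
    print(f"n={n:4d}  max error = {err:.3e}")
```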

19.
Functional optimization problems can be solved analytically only if special assumptions are verified; otherwise, approximations are needed. The approximate method that we propose is based on two steps. First, the decision functions are constrained to take on the structure of linear combinations of basis functions containing free parameters to be optimized (hence, this step can be considered as an extension to the Ritz method, for which fixed basis functions are used). Then, the functional optimization problem can be approximated by nonlinear programming problems. Linear combinations of basis functions are called approximating networks when they benefit from suitable density properties. We term such networks nonlinear (linear) approximating networks if their basis functions contain (do not contain) free parameters. For certain classes of d-variable functions to be approximated, nonlinear approximating networks may require a number of parameters increasing moderately with d, whereas linear approximating networks may be ruled out by the curse of dimensionality. Since the cost functions of the resulting nonlinear programming problems include complex averaging operations, we minimize such functions by stochastic approximation algorithms. As important special cases, we consider stochastic optimal control and estimation problems. Numerical examples show the effectiveness of the method in solving optimization problems stated in high-dimensional settings, involving for instance several tens of state variables.
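A toy sketch (mine; the cost functional, basis family, and dimensions are illustrative assumptions) of the two-step scheme: constrain the decision function to a linear combination of parametrized Gaussian basis functions, then minimize a sampled average cost by stochastic gradient steps.

```python
import torch

# Decision function: u(x) = sum_i c_i * exp(-||x - t_i||^2 / s_i^2), with
# centres t_i, widths s_i, and coefficients c_i all free (a "nonlinear"
# approximating network in the paper's terminology).
d, m = 3, 30                                     # input dimension, number of basis functions
c = torch.zeros(m, requires_grad=True)
t = torch.randn(m, d, requires_grad=True)
log_s = torch.zeros(m, requires_grad=True)

def u(x):
    dist2 = ((x[:, None, :] - t[None, :, :]) ** 2).sum(-1)
    return (c * torch.exp(-dist2 * torch.exp(-2 * log_s))).sum(-1)

# Illustrative functional cost: E_x[(u(x) - g(x))^2] for a known target g.
g = lambda x: torch.sin(x.sum(-1))

opt = torch.optim.SGD([c, t, log_s], lr=1e-2)
for step in range(2000):
    x = torch.randn(128, d)                      # fresh samples = stochastic approximation
    loss = ((u(x) - g(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final sampled cost:", loss.item())
```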

20.
Periodicity, anti-periodicity, and almost periodicity are important dynamic behaviours of time-varying neural networks. Without decomposing the studied neural networks into real-valued systems, this paper investigates the dynamics of anti-periodic solutions for a class of inertial quaternion-valued Hopfield neural networks with time-varying delays, based on the continuation theorem of coincidence degree theory and inequality techniques, and by constructing Lyapunov functions different from those used in existing studies of equilibrium-point stability. A new criterion for the existence of anti-periodic solutions of these networks is given, and the exponential stability of the anti-periodic solutions is proved by constructing a Lyapunov function.

