Similar Documents
A total of 20 similar documents were retrieved (search time: 33 ms).
1.
We prove a theorem concerning the approximation of generalized bandlimited multivariate functions by deep ReLU networks for which the curse of dimensionality is overcome. Our theorem is based on a result by Maurey and on the ability of deep ReLU networks to approximate Chebyshev polynomials and analytic functions efficiently.
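As a point of reference (an editorial sketch, not taken from the paper; the target function and polynomial degree are arbitrary choices), the kind of Chebyshev approximation that deep ReLU networks are shown to emulate efficiently can be computed as follows:

```python
import numpy as np

# Chebyshev approximation of a smooth, "bandlimited-like" target on [-1, 1].
# The target and the degree are illustrative assumptions only.
target = lambda x: np.sin(3 * np.pi * x) * np.exp(-x**2)

deg = 20
# Chebyshev nodes of the first kind avoid Runge-type oscillations.
k = np.arange(deg + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))

coeffs = np.polynomial.chebyshev.chebfit(nodes, target(nodes), deg)

xs = np.linspace(-1, 1, 1000)
err = np.max(np.abs(np.polynomial.chebyshev.chebval(xs, coeffs) - target(xs)))
print(f"max error of degree-{deg} Chebyshev approximation: {err:.2e}")
```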

2.
Deep neural networks with rectified linear units (ReLU) have become increasingly popular. However, the derivatives of the function represented by a ReLU network are not continuous, which limits the use of ReLU networks to settings where smoothness is not required. In this paper, we construct deep neural networks with rectified power units (RePU), which give better approximations of smooth functions. Optimal algorithms are proposed to explicitly build neural networks with sparsely connected RePUs, which we call PowerNets, that represent polynomials with no approximation error. For general smooth functions, we first project the function onto its polynomial approximation and then use the proposed algorithms to construct the corresponding PowerNet. Thus, the error of the best polynomial approximation provides an upper bound on the best RePU network approximation error. For smooth functions in higher-dimensional Sobolev spaces, we use fast spectral transforms for tensor-product grid and sparse grid discretizations to obtain polynomial approximations. Our constructive algorithms show a close connection between spectral methods and deep neural networks: PowerNets with $n$ hidden layers can exactly represent polynomials up to degree $s^n$, where $s$ is the power of the RePUs. The proposed PowerNets have potential applications in situations where high accuracy or smoothness is required.
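A minimal illustration (mine, not the paper's construction algorithm) of why rectified power units can represent polynomials exactly: with the squared ReLU $\sigma(x)=\max(0,x)^2$ one has $x^2=\sigma(x)+\sigma(-x)$, and products follow from the polarization identity $xy=\tfrac14\big((x+y)^2-(x-y)^2\big)$.

```python
import numpy as np

def repu(x, s=2):
    """Rectified power unit: max(0, x) ** s (s=2 is the squared ReLU)."""
    return np.maximum(0.0, x) ** s

def square(x):
    """Exact x**2 from two RePU(s=2) neurons: sigma(x) + sigma(-x)."""
    return repu(x) + repu(-x)

def product(x, y):
    """Exact x*y via the polarization identity, using only 'square'."""
    return 0.25 * (square(x + y) - square(x - y))

x = np.linspace(-2, 2, 7)
y = np.linspace(-1, 3, 7)
assert np.allclose(square(x), x**2)
assert np.allclose(product(x, y), x * y)
print("x**2 and x*y reproduced exactly by RePU combinations")
```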

3.
In this paper, we investigate the relationship between deep neural networks (DNNs) with the rectified linear unit (ReLU) as the activation function and continuous piecewise linear (CPWL) functions, especially CPWL functions arising from the simplicial linear finite element method (FEM). We first consider the special case of FEM. By exploring the DNN representation of its nodal basis functions, we present a ReLU DNN representation of CPWL functions in FEM. We theoretically establish that at least $2$ hidden layers are needed in a ReLU DNN to represent any linear finite element function in $\Omega \subseteq \mathbb{R}^d$ when $d\ge2$. Consequently, for $d=2,3$, which are often encountered in scientific and engineering computing, two hidden layers are necessary and sufficient for any CPWL function to be represented by a ReLU DNN. We then give a detailed account of how a general CPWL function in $\mathbb{R}^d$ can be represented by a ReLU DNN with at most $\lceil\log_2(d+1)\rceil$ hidden layers, together with an estimate of the number of neurons needed in such a representation. Furthermore, using the relationship between DNNs and FEM, we argue theoretically that a special class of DNN models with low bit-width is still expected to have adequate representation power in applications. Finally, as a proof of concept, we present numerical results for using ReLU DNNs to solve a two-point boundary value problem, demonstrating the potential of applying DNNs to the numerical solution of partial differential equations.
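As an illustration of the FEM–ReLU connection (a sketch under my own choice of mesh, not the paper's construction), the one-dimensional piecewise-linear nodal basis ("hat") function can be written with three ReLU neurons, so any 1D linear finite element function is a shallow ReLU network:

```python
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def hat(x, a, m, b):
    """Nodal basis function: 0 at a and b, 1 at m, piecewise linear in between,
    written as a combination of three ReLU neurons."""
    return (relu(x - a) / (m - a)
            - relu(x - m) * (1.0 / (m - a) + 1.0 / (b - m))
            + relu(x - b) / (b - m))

# Check against direct piecewise-linear interpolation on the mesh 0 < 0.4 < 1.
xs = np.linspace(-0.5, 1.5, 1001)
direct = np.interp(xs, [0.0, 0.4, 1.0], [0.0, 1.0, 0.0])
assert np.allclose(hat(xs, 0.0, 0.4, 1.0), direct)
print("hat basis function reproduced exactly by 3 ReLU neurons")
```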

4.

We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to \(L^p\)-norms, \(0< p < \infty \), for all practically used activation functions, and also not closed with respect to the \(L^\infty \)-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if \(f_1, f_2\) are two functions realized by neural networks and if \(f_1, f_2\) are close in the sense that \(\Vert f_1 - f_2\Vert _{L^\infty } \le \varepsilon \) for \(\varepsilon > 0\), it is, regardless of the size of \(\varepsilon \), usually not possible to find weights \(w_1, w_2\) close together such that each \(f_i\) is realized by a neural network with weights \(w_i\). Overall, our findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.


5.
Constructing neural networks for function approximation is a classical and longstanding topic in approximation theory. In this paper, we aim at constructing deep neural networks with three hidden layers and a sigmoidal activation function to approximate smooth and sparse functions. Specifically, we prove that the constructed deep nets, with a controllable magnitude of free parameters, reach the optimal approximation rate for both smooth and sparse functions. In particular, we prove that neural networks with three hidden layers avoid the phenomenon of saturation, i.e., the phenomenon that for some neural network architectures the approximation rate stops improving for functions of very high smoothness.

6.
Deep neural networks have been trained successfully in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation of why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough.

7.
8.
Motivated by the wide application of piecewise linear functions, this paper studies the approximation theory of shallow and deep piecewise linear neural networks. The authors extend the universal approximation theorem for three-layer perceptron models to piecewise linear neural networks and give approximation error estimates in terms of the number of hidden neurons. Using an explicit construction of sawtooth functions from piecewise linear functions, they prove that analytic functions can be approximated at an exponential rate by stacking the depth of piecewise linear neural networks, and the results are supported by corresponding numerical experiments.
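A short sketch of the classical sawtooth construction alluded to above (an editorial illustration): composing the piecewise-linear tent map with itself $n$ times produces a sawtooth with $2^{n-1}$ teeth, i.e. $2^n$ linear pieces, so depth yields exponentially many pieces from linearly many parameters.

```python
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def tent(x):
    """Tent map on [0, 1]: 2x for x <= 1/2, 2(1 - x) for x >= 1/2,
    written with two ReLU neurons."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, n):
    """n-fold composition of the tent map: a sawtooth with 2**(n-1) teeth."""
    for _ in range(n):
        x = tent(x)
    return x

xs = np.linspace(0.0, 1.0, 2**12 + 1)
for n in (1, 2, 3, 4):
    # Count linear pieces by counting slope sign changes of the sampled curve.
    pieces = np.count_nonzero(np.abs(np.diff(np.sign(np.diff(sawtooth(xs, n))))) > 0) + 1
    print(f"depth {n}: {pieces} linear pieces")
```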

9.
We study a class of deep neural networks whose architectures form a directed acyclic graph (DAG). For backpropagation defined by gradient descent with adaptive momentum, we show that the weights converge for a large class of nonlinear activation functions. The proof generalizes the results of Wu et al. (2008), who showed convergence for a feed-forward network with one hidden layer. To illustrate the effectiveness of DAG architectures, we describe compression through an autoencoder and compare against sequential feed-forward networks under several metrics.
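For reference, a minimal heavy-ball momentum update on a toy objective (one common form of gradient descent with momentum; the paper's specific adaptive rule may differ):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """One heavy-ball update: the velocity accumulates past gradients,
    and the weights move along the velocity."""
    v = mu * v - lr * grad(w)
    return w + v, v

# Toy quadratic objective 0.5 * ||w||^2, whose gradient is w.
grad = lambda w: w
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    w, v = momentum_step(w, v, grad)
print("weights after 200 momentum steps:", w)
```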

10.

Characterising intractable high-dimensional random variables is one of the fundamental challenges in stochastic computation. The recent surge of transport maps offers a mathematical foundation and new insights for tackling this challenge by coupling intractable random variables with tractable reference random variables. This paper generalises the functional tensor-train approximation of the inverse Rosenblatt transport recently developed by Dolgov et al. (Stat Comput 30:603–625, 2020) to a wide class of high-dimensional non-negative functions, such as unnormalised probability density functions. First, we extend the inverse Rosenblatt transform to enable the transport to general reference measures other than the uniform measure. We develop an efficient procedure to compute this transport from a squared tensor-train decomposition which preserves the monotonicity. More crucially, we integrate the proposed order-preserving functional tensor-train transport into a nested variable transformation framework inspired by the layered structure of deep neural networks. The resulting deep inverse Rosenblatt transport significantly expands the capability of tensor approximations and transport maps to random variables with complicated nonlinear interactions and concentrated density functions. We demonstrate the efficiency of the proposed approach on a range of applications in statistical learning and uncertainty quantification, including parameter estimation for dynamical systems and inverse problems constrained by partial differential equations.
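To fix ideas, a minimal one-dimensional sketch (an editorial illustration, not the paper's tensor-train machinery) of an inverse Rosenblatt-type transport: build the CDF of an unnormalised density on a grid and push uniform reference samples through its inverse.

```python
import numpy as np

# Unnormalised, concentrated target density on a grid (illustrative choice).
xs = np.linspace(-4.0, 4.0, 2001)
unnorm = np.exp(-50.0 * (np.abs(xs) - 1.5) ** 2)   # two sharp bumps at +-1.5

# Cumulative distribution by trapezoidal integration, then normalisation.
cdf = np.concatenate(([0.0], np.cumsum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(xs))))
cdf /= cdf[-1]

# Inverse Rosenblatt map: uniform reference samples -> target samples.
rng = np.random.default_rng(0)
u = rng.uniform(size=10_000)
samples = np.interp(u, cdf, xs)   # inverts the monotone CDF by interpolation

print("sample mean/std:", samples.mean(), samples.std())
```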


11.
12.
The goal of this paper is to characterize function distributions that general neural networks trained by descent algorithms (GD/SGD) can or cannot learn in polynomial time. The results are: (1) The paradigm of general neural networks trained by SGD is polynomial-time universal: any function distribution that can be learned from samples in polynomial time can also be learned by a polynomial-size neural net trained by SGD with polynomial parameters. In particular, this can be achieved despite polynomial noise on the gradients, implying a separation result between SGD-based deep learning and statistical query algorithms, as the latter are not comparably universal due to cases like parities. This also shows that deep learning does not suffer from the limitations of shallow networks. (2) The paper further gives a lower bound on the generalization error of descent algorithms, which relies on two quantities: the cross-predictability, an average-case quantity related to the statistical dimension, and the null-flow, a quantity specific to descent algorithms. The lower bound implies in particular that for functions of low enough cross-predictability, the above robust universality breaks down once the gradients are averaged over too many samples (as in perfect GD) rather than fewer (as in SGD). (3) Finally, it is shown that if larger amounts of noise are added to the initialization and to the gradients, then SGD is no longer comparably universal, due again to distributions having low enough cross-predictability.

13.
Ever larger amounts of data in applications demand more capable learning models for data processing. Data that we encounter often have certain embedded sparsity structures; that is, when represented in an appropriate basis, their energy concentrates on a small number of basis functions. This paper is devoted to a numerical study of the adaptive approximation of solutions of nonlinear partial differential equations, whose solutions may have singularities, by deep neural networks (DNNs) with a sparse regularization involving multiple parameters. Noting that DNNs have an intrinsic multi-scale structure favorable for the adaptive representation of functions, we employ a penalty with multiple parameters to develop DNNs with a multi-scale sparse regularization (SDNN) for effectively representing functions having certain singularities. We then apply the proposed SDNN to the numerical solution of the Burgers equation and the Schrödinger equation. Numerical examples confirm that solutions generated by the proposed SDNN are sparse and accurate.
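A rough sketch (mine, with hypothetical layer sizes and penalty weights; the paper's SDNN penalty may differ in form) of a multi-parameter sparse regularization: each layer gets its own $\ell_1$ weight, so different scales can be penalized differently.

```python
import torch
import torch.nn as nn

# Small fully connected network; the widths are illustrative only.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

# One sparsity parameter per layer (multi-parameter regularization); values are arbitrary.
lambdas = [1e-4, 1e-5, 1e-6]

def sparse_penalty(model, lams):
    linears = [m for m in model if isinstance(m, nn.Linear)]
    return sum(lam * layer.weight.abs().sum() for lam, layer in zip(lams, linears))

x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sign(x)                      # a target with a singularity at 0
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean() + sparse_penalty(net, lambdas)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```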

14.
Four-layer feedforward regular fuzzy neural networks are constructed. Universal approximation of some continuous fuzzy functions defined on $\mathbb{R}^n$ by the four-layer fuzzy neural networks is shown. First, multivariate Bernstein polynomials associated with fuzzy-valued functions are employed to approximate continuous fuzzy-valued functions defined on each compact set of $\mathbb{R}^n$. Secondly, by introducing cut-preserving fuzzy mappings, equivalent conditions for continuous fuzzy functions to be approximated arbitrarily closely by regular fuzzy neural networks are shown. Finally, several sufficient and necessary conditions characterizing the approximation capabilities of regular fuzzy neural networks are obtained, and some concrete fuzzy functions illustrate the conclusions.
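For the Bernstein-polynomial step mentioned above, a small self-contained sketch (univariate and crisp-valued for simplicity; the paper works with multivariate fuzzy-valued functions):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at the points x."""
    x = np.asarray(x, dtype=float)
    k = np.arange(n + 1)
    binom = np.array([comb(n, i) for i in k], dtype=float)
    basis = binom * x[:, None] ** k * (1.0 - x[:, None]) ** (n - k)
    return basis @ f(k / n)

f = lambda t: np.sin(2 * np.pi * t)
xs = np.linspace(0.0, 1.0, 501)
for n in (10, 50, 200):
    err = np.max(np.abs(bernstein(f, n, xs) - f(xs)))
    print(f"n={n:4d}  max error = {err:.3e}")
```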

15.
Recent years have witnessed growing interest in solving partial differential equations with deep neural networks, especially in the high-dimensional case. Unlike classical numerical methods, such as the finite difference method and the finite element method, the enforcement of boundary conditions in deep neural networks is highly nontrivial. One general strategy is to use the penalty method. In this work, we conduct a comparison study for elliptic problems with four different boundary conditions, i.e., Dirichlet, Neumann, Robin, and periodic boundary conditions, using two representative methods: the deep Galerkin method and the deep Ritz method. In the former, the PDE residual is minimized in the least-squares sense, while in the latter the corresponding variational problem is minimized. It is therefore reasonable to expect the deep Galerkin method to work better for smooth solutions and the deep Ritz method to work better for low-regularity solutions. However, in a number of examples we observe that the deep Ritz method can outperform the deep Galerkin method, with a clear dependence on dimensionality, even for smooth solutions, and the deep Galerkin method can also outperform the deep Ritz method for low-regularity solutions. Besides, in some cases, when the boundary condition can be implemented exactly, we find that such a strategy not only provides a better approximate solution but also facilitates the training process.
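A compact sketch (mine, for the 1D Poisson problem $-u''=f$ on $(0,1)$ with homogeneous Dirichlet data; the network size and penalty weight are arbitrary) contrasting the two losses: the deep Galerkin method penalizes the squared PDE residual, the deep Ritz method minimizes the energy functional, and both add a boundary penalty.

```python
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
f = lambda x: (math.pi ** 2) * torch.sin(math.pi * x)   # exact solution u(x) = sin(pi*x)
beta = 100.0                                            # boundary penalty weight (arbitrary)

def derivatives(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return u, du, d2u

def boundary_penalty():
    xb = torch.tensor([[0.0], [1.0]])
    return beta * (net(xb) ** 2).mean()

def deep_galerkin_loss(x):            # least-squares residual of  -u'' = f
    _, _, d2u = derivatives(x)
    return ((-d2u - f(x)) ** 2).mean() + boundary_penalty()

def deep_ritz_loss(x):                # energy functional  int( 0.5*u'^2 - f*u ) dx
    u, du, _ = derivatives(x)
    return (0.5 * du ** 2 - f(x) * u).mean() + boundary_penalty()

x = torch.rand(1024, 1)
print("Galerkin loss:", deep_galerkin_loss(x).item(),
      "  Ritz loss:", deep_ritz_loss(x).item())
```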

16.
朱石焕  吴曦 《数学季刊》2002,17(4):94-98
The wavelet neural network is a new type of artificial neural network developed in recent years for approximating nonlinear functions. In particular, wavelet neural networks in which the orthogonal scaling function is taken to be a certain function are better suited to function approximation. On this basis, this paper discusses the approximation of nonlinear AR(p) processes by wavelet neural networks.

17.
In this paper, we consider the problem of approximating continuous multivariate functions by neural networks with a bounded number of neurons in their hidden layers. We prove the existence of single-hidden-layer networks with a bounded number of neurons whose approximation capabilities are no worse than those of networks with arbitrarily many neurons. Our analysis is based on the properties of ridge functions.

18.
In this paper, we introduce a type of approximation operator built from neural networks with sigmoidal activation functions on compact intervals, and obtain pointwise and uniform estimates of the approximation. To improve the approximation rate, we further introduce a type of combination of such neural networks. Moreover, we show that the derivatives of functions can also be simultaneously approximated by the derivatives of the combinations. We also apply our method to construct approximation operators of neural networks with sigmoidal functions on infinite intervals.
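One standard construction of such an operator (a generic sketch, not necessarily the operator used in the paper): differences of shifted sigmoids form bump functions that sum to one, and sampling $f$ on a uniform grid gives a quasi-interpolant.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def bump(t):
    """Bell-shaped bump from two shifted sigmoids; sum_k bump(t - k) == 1."""
    return sigmoid(t + 0.5) - sigmoid(t - 0.5)

def sigmoidal_operator(f, n, x, pad=10):
    """Quasi-interpolant  G_n f(x) = sum_k f(k/n) * bump(n*x - k)  on [0, 1]."""
    k = np.arange(-pad, n + pad + 1)
    samples = f(np.clip(k / n, 0.0, 1.0))        # extend f by its boundary values
    return bump(n * x[:, None] - k) @ samples

f = lambda t: np.exp(t) * np.cos(4 * t)
xs = np.linspace(0.0, 1.0, 401)
for n in (16, 64, 256):
    err = np.max(np.abs(sigmoidal_operator(f, n, xs) - f(xs)))
    print(f"n={n:4d}  max error = {err:.3e}")
```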

19.
Functional optimization problems can be solved analytically only if special assumptions are verified; otherwise, approximations are needed. The approximate method that we propose is based on two steps. First, the decision functions are constrained to take on the structure of linear combinations of basis functions containing free parameters to be optimized (hence, this step can be considered as an extension to the Ritz method, for which fixed basis functions are used). Then, the functional optimization problem can be approximated by nonlinear programming problems. Linear combinations of basis functions are called approximating networks when they benefit from suitable density properties. We term such networks nonlinear (linear) approximating networks if their basis functions contain (do not contain) free parameters. For certain classes of d-variable functions to be approximated, nonlinear approximating networks may require a number of parameters increasing moderately with d, whereas linear approximating networks may be ruled out by the curse of dimensionality. Since the cost functions of the resulting nonlinear programming problems include complex averaging operations, we minimize such functions by stochastic approximation algorithms. As important special cases, we consider stochastic optimal control and estimation problems. Numerical examples show the effectiveness of the method in solving optimization problems stated in high-dimensional settings, involving for instance several tens of state variables.
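A toy sketch (mine; the cost functional, basis family, and dimensions are illustrative assumptions) of the two-step scheme: constrain the decision function to a linear combination of parametrized Gaussian basis functions, then minimize a sampled average cost by stochastic gradient steps.

```python
import torch

# Decision function: u(x) = sum_i c_i * exp(-||x - t_i||^2 / s_i^2), with
# centres t_i, widths s_i, and coefficients c_i all free (a "nonlinear"
# approximating network in the paper's terminology).
d, m = 3, 30                                     # input dimension, number of basis functions
c = torch.zeros(m, requires_grad=True)
t = torch.randn(m, d, requires_grad=True)
log_s = torch.zeros(m, requires_grad=True)

def u(x):
    dist2 = ((x[:, None, :] - t[None, :, :]) ** 2).sum(-1)
    return (c * torch.exp(-dist2 * torch.exp(-2 * log_s))).sum(-1)

# Illustrative functional cost: E_x[(u(x) - g(x))^2] for a known target g.
g = lambda x: torch.sin(x.sum(-1))

opt = torch.optim.SGD([c, t, log_s], lr=1e-2)
for step in range(2000):
    x = torch.randn(128, d)                      # fresh samples = stochastic approximation
    loss = ((u(x) - g(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final sampled cost:", loss.item())
```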

20.
Periodicity, anti-periodicity, and almost periodicity are important dynamic behaviours of time-varying neural networks. Without decomposing the studied neural networks into real-valued systems, this paper investigates the dynamics of anti-periodic solutions for a class of inertial quaternion-valued Hopfield neural networks with time-varying delays, based on the continuation theorem of coincidence degree theory and inequality techniques, and by constructing Lyapunov functions different from those used in existing studies of equilibrium-point stability. A new criterion for the existence of anti-periodic solutions of these networks is given, and the exponential stability of the anti-periodic solutions is proved by constructing a Lyapunov function.

