Convergence of Stochastic Gradient Descent in Deep Neural Network
Authors: Bai-cun ZHOU, Cong-ying HAN, Tian-de GUO
Affiliations: 1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; 2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
Funding: Supported by the National Natural Science Foundation of China (Nos. 11731013, U19B2040, 11991022) and the Leading Project of the Chinese Academy of Sciences (Nos. XDA27010102, XDA27010302).
Abstract: Stochastic gradient descent (SGD) is one of the most common optimization algorithms used in pattern recognition and machine learning. This algorithm and its variants are the preferred methods for optimizing the parameters of deep neural networks because of their low storage requirements and fast computation. Previous convergence analyses of these algorithms relied on traditional assumptions from optimization theory. However, deep neural networks have unique properties, and some of these assumptions do not hold in the actual optimization of such models. In this paper, we modify the assumptions to make them more consistent with the actual optimization process of deep neural networks. Based on the new assumptions, we study the convergence and the convergence rate of SGD and two of its common variants. In addition, we carry out numerical experiments with LeNet-5, a common network framework, on the MNIST data set to verify the rationality of our assumptions.
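The abstract does not name the two SGD variants that are analyzed; momentum SGD is one widely used variant, so the following is only a minimal illustrative sketch of the kind of stochastic update rules involved, not the paper's algorithms. All names and hyperparameters here (lr, beta, the toy quadratic objective) are hypothetical.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: w <- w - lr * g, where g is a stochastic gradient
    estimated on a sampled mini-batch."""
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """Momentum variant: accumulate an exponentially weighted average
    of past gradients to damp the noise in individual estimates."""
    v = beta * v + grad
    return w - lr * v, v

# Toy usage: minimize f(w) = 0.5 * ||w||^2 (gradient: w), with Gaussian
# noise standing in for mini-batch sampling noise.
rng = np.random.default_rng(0)
w, v = np.ones(5), np.zeros(5)
for k in range(1000):
    g = w + 0.1 * rng.standard_normal(5)  # noisy gradient estimate
    w, v = momentum_step(w, v, g, lr=0.05)
print(float(np.linalg.norm(w)))  # close to 0, up to a noise floor
```

With a suitably decaying step size the iterates of both rules converge toward the minimizer; the constant step size used in this toy run instead leaves a residual noise floor proportional to the gradient noise.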

Keywords: stochastic gradient descent; deep neural network; convergence

Citation: Bai-cun ZHOU, Cong-ying HAN, Tian-de GUO. Convergence of Stochastic Gradient Descent in Deep Neural Network [J]. Acta Mathematicae Applicatae Sinica, 2021, 37(1): 126-136.
Indexed in: VIP, SpringerLink, and other databases.