Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

Authors: 熊霞, 陈永聪, 石春晓, 敖平

Affiliations: 1. Shanghai Center for Quantitative Life Sciences and Physics Department, Shanghai University; 2. College of Biomedical Engineering, Sichuan University

Funding: Supported in part by the National Natural Science Foundation of China (Grant No. 16Z103060007 (PA))

Abstract: Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the flatness of the loss-function landscape driven under SGD [Feng Y and Tu Y 2021 Proc. Natl. Acad. Sci. USA 118 e2015617118]. To investigate this seeming violation of statistical physics principles, the properties of...
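As background for the algorithm the abstract discusses, here is a minimal sketch of plain minibatch SGD on a toy least-squares problem. This is a generic illustration only, not the authors' experimental setup; the data, learning rate, and batch size are all assumptions chosen for the example.

```python
# Minimal illustration of stochastic gradient descent (SGD) on a toy
# least-squares problem.  Generic sketch, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = X @ w_true + small noise (assumed for the demo)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ w_true + 0.01 * rng.normal(size=200)

def sgd(X, y, lr=0.1, batch_size=16, epochs=50):
    """Plain minibatch SGD on the mean-squared-error loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            # Gradient of 0.5 * mean((X w - y)^2) over the minibatch
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

w_hat = sgd(X, y)
print(w_hat)  # should land close to w_true = [2, -1]
```

The minibatch sampling is the source of the gradient noise whose statistics, relative to the flatness of the loss landscape, are the subject of the anomalous relation studied in the paper.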
|
|
|