Stochastic gradient descent (SGD), a widely used algorithm for training deep neural networks, has attracted continued research interest into the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of the neural weights and the flatness of the loss landscape under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this seeming violation of statistical physics principles, the properties of...
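To make the quantities in this abstract concrete, here is a minimal sketch (hypothetical, not taken from the cited paper): noisy SGD on a one-dimensional quadratic loss L(w) = 0.5·k·w², where the curvature k parameterizes landscape flatness (flatter direction = smaller k) and the stationary variance of the weight w is measured empirically. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_weight_variance(k, lr=0.01, noise=1.0, steps=200_000, burn_in=50_000):
    """Run SGD with additive gradient noise on L(w) = 0.5*k*w**2 and
    return the empirical variance of w after a burn-in period."""
    w, samples = 0.0, []
    for t in range(steps):
        grad = k * w + noise * rng.standard_normal()  # stochastic gradient
        w -= lr * grad                                # SGD update
        if t >= burn_in:
            samples.append(w)
    return np.var(samples)

# Sweep curvatures: with isotropic noise, Var(w) ~ lr*noise^2/(2k),
# i.e. variance grows with flatness (1/k), the conventional relation.
for k in (0.5, 1.0, 2.0, 4.0):
    print(f"k={k:>4}: flatness 1/k={1/k:.2f}, Var(w)={sgd_weight_variance(k):.4f}")
```

Note that this isotropic-noise toy model reproduces the conventional equipartition-like behavior (larger weight variance along flatter directions); the inverse relation reported in the cited paper is attributed to the anisotropic, loss-dependent noise of minibatch SGD, which this sketch deliberately does not model.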