[Question] What do you think of Google's LAMB optimizer? (Q&A)

Also, what do you think of the AdamW optimizer?

Also, what do you think of the GELU activation function?

And what about the "new GELU" activation function?
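For reference on the last two questions, here is a minimal sketch of the two GELU variants being compared. It assumes "new GELU" means the tanh approximation of GELU (the variant Hugging Face Transformers labels `gelu_new`); the function names `gelu_exact` and `gelu_tanh` are hypothetical, chosen only for this illustration.

```python
import math


def gelu_exact(x: float) -> float:
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU (assumed here to be what "new GELU" refers to).
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Compare the two curves on a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

The two curves are very close; the tanh form was historically used because it avoids `erf`, which was not always available or fast on every backend.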

September 21, 2020
thphd (2047 site admin)

On LAMB

At first, they thought all gradients were equal, so plain SGD should work for everything.

Then they realized that some gradients are different and that some sort of adaptation was needed, as in RMSprop/AdaGrad/Adam.
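To make the contrast between plain SGD and per-coordinate adaptation concrete, here is a minimal pure-Python sketch. It follows the textbook Adam update with the usual hyperparameter names (`lr`, `beta1`, `beta2`, `eps`); the `weight_decay` switch illustrates AdamW-style decoupled weight decay, which the question also asks about. The names `sgd_step`, `adam_step`, and `AdamState` are hypothetical, and this is not any library's implementation.

```python
import math
from dataclasses import dataclass
from typing import List


def sgd_step(w: List[float], g: List[float], lr: float) -> List[float]:
    # Every coordinate moves by the same multiple of its own gradient.
    return [wi - lr * gi for wi, gi in zip(w, g)]


@dataclass
class AdamState:
    m: List[float]  # first-moment (mean) estimates, one per coordinate
    v: List[float]  # second-moment (uncentered variance) estimates
    t: int = 0      # step counter used for bias correction


def adam_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0):
    """One Adam step; weight_decay > 0 adds AdamW-style decoupled decay."""
    state.t += 1
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, g)):
        state.m[i] = beta1 * state.m[i] + (1 - beta1) * gi
        state.v[i] = beta2 * state.v[i] + (1 - beta2) * gi * gi
        m_hat = state.m[i] / (1 - beta1 ** state.t)
        v_hat = state.v[i] / (1 - beta2 ** state.t)
        # Each coordinate is rescaled by its own gradient statistics.
        update = m_hat / (math.sqrt(v_hat) + eps)
        # AdamW: decay acts on the weight directly instead of being folded
        # into the gradient, so it is not rescaled by v_hat.
        new_w.append(wi - lr * (update + weight_decay * wi))
    return new_w


if __name__ == "__main__":
    g = [100.0, 0.01]  # gradients differing by four orders of magnitude
    print("SGD: ", sgd_step([0.0, 0.0], g, lr=1e-3))
    print("Adam:", adam_step([0.0, 0.0], g, AdamState([0.0, 0.0], [0.0, 0.0])))
```

With SGD the two coordinates move by amounts proportional to their very different gradients; on Adam's first step both move by roughly `lr`, because each gradient is divided by its own running magnitude.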

After some more time, they realized that the variation in gradients cannot simply be characterized by a scalar or a bunch of scalars: the degree of adaptation needs to keep up with the degree of variation. More sophisticated adaptation schemes were developed: normalization, feedback control, and so on.
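LAMB can be read as one instance of the "normalization" idea in the paragraph above: it takes an Adam-style update with decoupled weight decay and rescales it per layer by a trust ratio ||w|| / ||r||. The sketch below follows that published description for a single layer, under a few assumptions: the scaling function on ||w|| is the identity, the ratio falls back to 1.0 when either norm is zero (a common convention, not necessarily Google's exact code), and the defaults are illustrative only. `lamb_layer_step` and `l2_norm` are hypothetical names.

```python
import math
from typing import List, Tuple


def l2_norm(xs: List[float]) -> float:
    return math.sqrt(sum(x * x for x in xs))


def lamb_layer_step(w: List[float], g: List[float], m: List[float],
                    v: List[float], t: int, lr: float = 1e-3,
                    beta1: float = 0.9, beta2: float = 0.999,
                    eps: float = 1e-6,
                    weight_decay: float = 0.01) -> Tuple[List[float], float]:
    """One LAMB step for one layer. Updates m and v in place and returns
    (new weights, trust ratio). t is the 1-based step count."""
    r = []
    for i, (wi, gi) in enumerate(zip(w, g)):
        m[i] = beta1 * m[i] + (1 - beta1) * gi
        v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Adam-style direction plus decoupled weight decay.
        r.append(m_hat / (math.sqrt(v_hat) + eps) + weight_decay * wi)
    w_norm, r_norm = l2_norm(w), l2_norm(r)
    # Layer-wise normalization: scale the whole update to match ||w||.
    trust_ratio = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    new_w = [wi - lr * trust_ratio * ri for wi, ri in zip(w, r)]
    return new_w, trust_ratio


if __name__ == "__main__":
    layers = {"small": [0.03, 0.04], "large": [30.0, 40.0]}
    for name, w in layers.items():
        m, v = [0.0, 0.0], [0.0, 0.0]
        new_w, ratio = lamb_layer_step(w, [1.0, 1.0], m, v, t=1,
                                       weight_decay=0.0)
        step = l2_norm([a - b for a, b in zip(new_w, w)])
        print(f"{name}: trust_ratio={ratio:.4f}, step_norm={step:.2e}")
```

Both layers receive the same gradient, but with `weight_decay=0.0` each layer's step length comes out as `lr * ||w||`, i.e. proportional to the size of that layer's weights rather than to the raw gradient scale.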

If we go down this path, we're likely to end up with network topologies where feedback/normalization mechanisms are distributed among the massive number of weights, each taking care of the few weights around it, much like a mammalian brain.

(Edited by the author on September 22, 2020)
霏艺Faye (Librarian)

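@thphd #15838589 I'm doing SEO.

That way, when people search for these keywords, Google will show 2047.

Chinese speakers who search for these keywords tend to be fairly well educated...

Also, I need a place where the level of education is fairly high to discuss academic topics.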

