PhD student at the Chinese University of Hong Kong, Shenzhen, China
https://zyushun.github.io/
Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
hessian-spectrum
Code for the paper: Why Transformers Need Adam: A Hessian Perspective
iclr-blog-track.github.io
Forked from iclr-blog-track/iclr-blog-track.github.io
ICLR 2022 Blog-Track: Does Adam Converge and When?
Language: HTML · License: Other · Updated Apr 15, 2022