Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization
Martin Burger, Samira Kabri, Yury Korolev, Tim Roith, Lukas Weigand
Research output: Working paper / Preprint