Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization

Martin Burger, Samira Kabri, Yury Korolev, Tim Roith, Lukas Weigand

Research output: Working paper / Preprint

Fingerprint

Mathematics