Pay more attention: Recap of the last week

justynasty@lemmy.kya.moe · edit-2 1 year ago

Pay more attention: Recap of the last week

justynasty@lemmy.kya.moe · 1 year ago

I meant smaller models profit more from the stable perplexity in a long prompt with the recently released code changes. Because the paper(s) mention that some of these changes do not require further fine-tuning, we can use small models in a text that is longer than their context size.

Pay more attention: Recap of the last week

Pay more attention: Recap of the last week

🕳️ Attention Sinks in LLMs for endless fluency