Pay more attention: Recap of the last week

justynasty@lemmy.kya.moe · 1 year ago

rufus@discuss.tchncs.de · 1 year ago

It’s practically the same, just faster. As far as I understand, it rolls the window forward without recomputing the whole context; it only has to process the new tokens. If you truncate the context the way we used to, you have to recalculate the whole thing as soon as the first sentence changes.

The end result is the same.
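To make the cost difference concrete, here is a minimal, hypothetical sketch (plain Python, no real model). `SinkCache`, `streaming_cost` and `truncation_cost` are made-up names, and "tokens" are just integers standing in for cached keys/values. The sink variant pins the first few entries and keeps a rolling window of recent ones, so each new token costs one unit of work. The truncation variant has to reprocess every kept token once the start of the context shifts. This only counts work; it ignores details such as how real implementations assign positions inside the cache.

```python
from __future__ import annotations

from collections import deque


class SinkCache:
    """Hypothetical cache: keep the first `num_sinks` entries forever,
    plus a rolling window of the `window` most recent entries."""

    def __init__(self, num_sinks: int, window: int) -> None:
        self.num_sinks = num_sinks
        self.sinks: list[int] = []
        self.recent: deque = deque(maxlen=window)  # evicts oldest non-sink entry
        self.computed = 0  # number of tokens whose keys/values were computed

    def append(self, token: int) -> None:
        # Only the new token is processed; everything already cached is reused.
        self.computed += 1
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token)
        else:
            self.recent.append(token)

    def context(self) -> list[int]:
        return self.sinks + list(self.recent)


def streaming_cost(n_tokens: int, num_sinks: int, window: int) -> tuple[list[int], int]:
    cache = SinkCache(num_sinks, window)
    for t in range(n_tokens):
        cache.append(t)
    return cache.context(), cache.computed


def truncation_cost(n_tokens: int, window: int) -> tuple[list[int], int]:
    tokens: list[int] = []
    computed = 0
    for t in range(n_tokens):
        tokens.append(t)
        if len(tokens) > window:
            # The start of the context changed, so every kept token is redone.
            tokens = tokens[-window:]
            computed += len(tokens)
        else:
            # Nothing was dropped: reuse the cache, process only the new token.
            computed += 1
    return tokens, computed


print(streaming_cost(10, num_sinks=2, window=4))  # ([0, 1, 6, 7, 8, 9], 10)
print(truncation_cost(10, window=6))              # ([4, 5, 6, 7, 8, 9], 30)
```

Both variants end up with a six-token context, but the rolling version did 10 units of work against 30 for truncate-and-recompute, and the gap grows with every further token.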

🕳️ Attention Sinks in LLMs for endless fluency