Yes, this is a recipe for extremely slow inference: I’m running a 2013 Mac Pro with 128 GB of RAM. I’m not optimizing for speed, I’m optimizing for aesthetics and intelligence :)

Anyway, what model would you recommend? I’m looking for something general-purpose but with solid programming skills. Ideally abliterated as well; since I’m running this locally, I might as well have all the freedoms. Thanks for the tips!

  • trave@lemmy.sdf.org (OP) · 4 days ago

    Some coding, yeah, but I also want one that’s just good ‘general purpose’ chat.

    Not sure how much context… from what I’ve heard, models kinda break down at super large contexts anyway? Though I’d love to have as large a functional context as possible. I guess it’s somewhat of a tradeoff in RAM usage, as the context all gets loaded into memory?
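
    For a rough sense of that tradeoff, here’s a back-of-the-envelope sketch (the layer/head numbers are placeholders; the real ones come from a model’s config.json):

    ```python
    # KV cache grows linearly with context length:
    # 2 (K and V) * layers * kv_heads * head_dim * bytes per element * tokens
    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len

    # Hypothetical 30B-class model with grouped-query attention:
    size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=131072)
    print(f"{size / 2**30:.1f} GiB")  # ~24 GiB at fp16 for a 128k context
    ```

    So a big context window costs real RAM on top of the weights, which is why most runtimes let you cap it.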

    • Womble@piefed.world · 4 days ago

      If you really don’t care about speed (as in, ask a question and come back half an hour later), you could try a 3-bit quantization of Qwen3 Thinking. That’s around 100 GB, so you could fit it in memory and still have enough left over for the OS. But I’m not kidding about coming back an hour later for your response (or even longer); that’s a very big model for a decade-old computer.
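
      For a rough idea of where that ~100 GB figure comes from (a sketch; the parameter counts below are illustrative, with 235B standing in for Qwen3’s large thinking model):

      ```python
      # A quantized model file is roughly
      # parameter_count * bits_per_weight / 8 bytes, plus some overhead.
      def weights_gib(params_billions, bits_per_weight):
          return params_billions * 1e9 * bits_per_weight / 8 / 2**30

      for params, bits in [(30, 4), (30, 8), (235, 3)]:
          print(f"{params}B at {bits}-bit: ~{weights_gib(params, bits):.0f} GiB")
      # 30B at 4-bit: ~14 GiB | 30B at 8-bit: ~28 GiB | 235B at 3-bit: ~82 GiB
      ```

      Roughly 82 GiB of weights, plus KV cache, plus the OS is how you end up brushing against 128 GB.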

    • mierdabird@lemmy.dbzer0.com · 4 days ago

      Qwen3 Coder is the current top dog for coding AFAIK. There’s a 30B size and something bigger, but I can’t remember what because I have no hope of running it lol. But I think the larger models have up to a million-token context window.
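
      If you want to try the 30B locally, a minimal llama-cpp-python sketch looks like this (the GGUF filename is a placeholder, and n_ctx/n_threads should be tuned to your machine):

      ```python
      # pip install llama-cpp-python
      from llama_cpp import Llama

      llm = Llama(
          model_path="qwen3-coder-30b-Q4_K_M.gguf",  # placeholder filename
          n_ctx=16384,   # larger contexts cost more RAM for the KV cache
          n_threads=12,  # match your physical core count
      )

      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "Write a Python quicksort."}]
      )
      print(out["choices"][0]["message"]["content"])
      ```

      Even if a model advertises a million-token window, you’d still cap n_ctx at whatever your RAM (and patience) allows.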