

If you really don't care about speed (as in "ask a question and come back half an hour later" don't care), you could try a 3-bit quantization of Qwen3 Thinking. That's around 100GB, so you could fit it in memory and still have enough left over for the OS. But I'm not kidding about coming back an hour later (or even longer) for your response; that's a very big model for a decade-old computer.
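For a rough sense of where a figure like 100GB comes from, here is a back-of-the-envelope sketch. The 235B parameter count is my assumption (the comment doesn't say which Qwen3 size is meant), and `quantized_size_gb` is just an illustrative helper, not part of any library. It only counts the weights, so the KV cache and runtime overhead come on top.

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and any per-block scale metadata.
    """
    # params_billions * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8


# Assumption: a 235B-parameter model; swap in the real count if it differs.
for bits in (16, 8, 4, 3):
    print(f"{bits}-bit: ~{quantized_size_gb(235, bits):.0f} GB")
```

Real 3-bit formats usually store a bit more than 3 bits per weight once scales and the like are included, so the actual file tends to land somewhat above the pure 3-bit estimate.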
You can tell when posts are from Mastodon, as they are full of Twitterisms like hashtags and @User to reply to comments.