Yes, this is a recipe for extremely slow inference: I’m running a 2013 Mac Pro with 128 GB of RAM. I’m not optimizing for speed, I’m optimizing for aesthetics and intelligence :)

Anyway, what model would you recommend? I’m looking for something general-purpose but with solid programming skills. Ideally abliterated as well; since I’m running this locally, I might as well have all the freedoms. Thanks for the tips!

  • Womble@piefed.world · 5 days ago

    If you really don’t care about speed (as in, ask a question and come back half an hour later), you could try a 3-bit quantization of Qwen3 thinking. That’s around 100 GB, so you could fit it in memory and still have enough left over for the OS. But I’m not kidding about coming back an hour later (or even longer) for your response; that’s a very big model for a decade-old computer.
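
For anyone who wants to sanity-check the memory math, here is a rough sketch in Python. The 235-billion parameter count is an assumption for illustration (the thread doesn't say which Qwen3 variant is meant), and the 16 GB reserved for the OS is likewise a guess. The function names are made up for this example. Real quantized files tend to come out larger than the raw estimate because some tensors are kept at higher precision, which is consistent with the "around 100 GB" figure above.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores the KV cache, runtime buffers, and any tensors stored
    at higher precision than the nominal bit width.
    """
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb: float, ram_gb: float, reserve_gb: float) -> bool:
    """True if the model plus a reserve for the OS fits in total RAM."""
    return model_gb + reserve_gb <= ram_gb


# ASSUMPTION: illustrative parameter count, not stated in the thread.
N_PARAMS = 235e9
# ASSUMPTION: headroom left for the OS and other processes.
OS_RESERVE_GB = 16.0

raw_estimate = quantized_size_gb(N_PARAMS, 3.0)
print(f"raw 3-bit estimate: {raw_estimate:.1f} GB")

# Using the ~100 GB figure quoted in the reply rather than the raw estimate:
print("fits in 128 GB:", fits_in_ram(100.0, 128.0, OS_RESERVE_GB))
```

Whether it fits says nothing about speed, of course; it only confirms the model can load without swapping.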