Yes, this is a recipe for extremely slow inference: I’m running a 2013 Mac Pro with 128 GB of RAM. I’m not optimizing for speed, I’m optimizing for aesthetics and intelligence :)

Anyway, what model would you recommend? I’m looking for something general-purpose but with solid programming skills. Ideally abliterated as well; since I’m running this locally, I might as well have all the freedoms. Thanks for the tips!

  • trave@lemmy.sdf.org (OP) · 4 days ago

    Some coding, yeah, but I also want one that’s just good ‘general purpose’ chat.

    Not sure how much context… from what I’ve heard, models kinda break down at super large contexts anyway? Though I’d love to have as large a functional context as possible. I guess it’s somewhat of a tradeoff in RAM usage, as the context all gets loaded into memory?
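
    For a rough sense of that tradeoff, here’s a back-of-the-envelope sketch (the layer/head numbers are placeholders; the real ones come from a model’s config.json):

    ```python
    # KV cache grows linearly with context length:
    # 2 (K and V) * layers * kv_heads * head_dim * bytes per element * tokens
    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
        return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len

    # Hypothetical 30B-class model with grouped-query attention:
    size = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=131072)
    print(f"{size / 2**30:.1f} GiB")  # ~24 GiB at fp16 for a 128k context
    ```

    So a big context window costs real RAM on top of the weights, which is why most runtimes let you cap it.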

    • Womble@piefed.world · 4 days ago

      If you really don’t care about speed (as in, ask a question and come back half an hour later), you could try a 3-bit quantization of Qwen3 Thinking. That’s around 100 GB, so you could fit it in memory and still have enough left over for the OS. But I’m not kidding about coming back an hour later for your response (or even longer); that’s a very big model for a decade-old computer.
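
      For a rough idea of where that ~100 GB figure comes from (a sketch; the parameter counts below are illustrative, with 235B standing in for Qwen3’s large thinking model):

      ```python
      # A quantized model file is roughly
      # parameter_count * bits_per_weight / 8 bytes, plus some overhead.
      def weights_gib(params_billions, bits_per_weight):
          return params_billions * 1e9 * bits_per_weight / 8 / 2**30

      for params, bits in [(30, 4), (30, 8), (235, 3)]:
          print(f"{params}B at {bits}-bit: ~{weights_gib(params, bits):.0f} GiB")
      # 30B at 4-bit: ~14 GiB | 30B at 8-bit: ~28 GiB | 235B at 3-bit: ~82 GiB
      ```

      Roughly 82 GiB of weights, plus KV cache, plus the OS is how you end up brushing against 128 GB.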

    • mierdabird@lemmy.dbzer0.com · 4 days ago

      Qwen3 Coder is the current top dog for coding AFAIK. There’s a 30B size and something bigger, but I can’t remember what because I have no hope of running it lol. But I think the larger models have up to a million-token context window.
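
      If you want to try the 30B locally, a minimal llama-cpp-python sketch looks like this (the GGUF filename is a placeholder, and n_ctx/n_threads should be tuned to your machine):

      ```python
      # pip install llama-cpp-python
      from llama_cpp import Llama

      llm = Llama(
          model_path="qwen3-coder-30b-Q4_K_M.gguf",  # placeholder filename
          n_ctx=16384,   # larger contexts cost more RAM for the KV cache
          n_threads=12,  # match your physical core count
      )

      out = llm.create_chat_completion(
          messages=[{"role": "user", "content": "Write a Python quicksort."}]
      )
      print(out["choices"][0]["message"]["content"])
      ```

      Even if a model advertises a million-token window, you’d still cap n_ctx at whatever your RAM (and patience) allows.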