Yes, this is a recipe for extremely slow inference: I’m running a 2013 Mac Pro with 128GB of RAM. I’m not optimizing for speed, I’m optimizing for aesthetics and intelligence :)
Anyway, what model would you recommend? I’m looking for something general-purpose but with solid programming skills. Ideally abliterated as well; I’m running this locally, so I might as well have all the freedoms. Thanks for the tips!
Qwen3 Coder is the current top dog for coding afaik. There’s a 30B size and something bigger, but I can’t remember what because I have no hope of running it lol. I think the larger models support up to a million-token context window, though.