When I first got into local LLMs nearly 3 years ago, in mid-2023, the frontier closed models were of course impressively capable.
I then tried my hand at running 7b-size local models, primarily one called Zephyr-7b (what happened to these models?? Dolphin anyone??), on my gaming PC with an 8GB AMD RX 580 GPU. Fair to say it was just a curiosity exercise (in terms of model performance).
Fast forward to this month, and I'm revisiting local LLMs. (Although I no longer have the gaming PC, cost-of-living crisis, anyone? 😫)
And the ~32b-size models look more than sufficient. #Qwen has taken the helm in this class. They're still very expensive to set up locally, although within grasp.
I’m rooting for the edge-computing models now - the ~2b-size models. Due to their low footprint, they’re practical for many people to run on an SBC 24/7 at home.
But these edge models are in the ‘curiosity category’ now.
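For anyone wondering what that looks like in practice, here's a minimal sketch, assuming Ollama is serving a small model on the SBC at its default port (the model name here is just an illustrative example):

```python
# Minimal sketch: querying a small local model served by Ollama on an SBC.
# Assumptions: Ollama is running on the default port 11434 and has pulled
# a ~2b-class model; "qwen2.5:1.5b" is just an example choice.
import json
import urllib.request

payload = json.dumps({
    "model": "qwen2.5:1.5b",
    "prompt": "Say hello from the edge.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's generate endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```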


This weekend I had an LLM walk me through setting up some home server stuff and networking. I tried using Proton’s Lumo and Qwen 3.6 locally, and I have to say Qwen was the more impressive of the two models. When I first tried running local models like Llama 4, I remember thinking to myself that this was a dead end and big servers would always have the advantage, but it seems like we’re hitting a turning point where many things can be done locally.
Cool, what was your hardware, and which Qwen size did you use? Thanks
I have a 24GB AMD 7900 XTX, and it’s a 35b-parameter model.
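Back-of-the-envelope, a ~4-bit quant is what makes that fit. Rough sketch (the bytes-per-parameter figure is an assumption for a Q4-class GGUF, not an exact number):

```python
# Rough VRAM estimate for a 35b-parameter model on a 24GB card.
# Assumption: ~0.55 bytes/param for a 4-bit-ish quant (incl. overhead),
# plus a few GB allowance for KV cache and buffers.
params = 35e9
bytes_per_param = 0.55                       # assumed Q4-class average
weights_gb = params * bytes_per_param / 1e9  # ~19 GB of weights
kv_and_buffers_gb = 3.0                      # rough, grows with context length
print(f"weights ~{weights_gb:.1f} GB, total ~{weights_gb + kv_and_buffers_gb:.1f} GB")
# -> weights ~19.2 GB, total ~22.2 GB: just fits in 24 GB
```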
Ooo… I’m running a 7900 XTX as well. Having 24GB without the Nvidia tax has been super nice for AI stuff. I have a 16GB 6900 XT running in another computer, and a lot of my AI model selection is still sized for it. I may need to stop procrastinating and copy your setup sooner rather than later.
Before I forget, can I ask you what GPU driver version you’re running? I recently encountered some stability issues after a driver update (trying to support gaming and AI stuff at the same time), and the latest version I could find any stability claims for was 24.12.1.