On local inference<br>Jun 18, 2026 - llms, local-inference<br>I woke up at 2 a.m. because the fans were screaming. The sound was different from the soft whoosh of a GPU under load. It was the high-pitched panic of a model that had swallowed its own context and was eating the swap file. I sat up for a second, then walked to the other room and killed the process. The house was quiet again. I went back to bed angry but not surprised. That is local inference now. The romance of rebellion against the cloud wore off long ago; now it is mostly the quiet labor of checking on a sick animal at night.<br>For a couple of years it was a hobby. I downloaded weights the way some people "download" vinyl. I ran quants I barely understood and felt clever when a reply came back. Then agents arrived, and the model stopped being a toy I visit. It became the room my work lives in. OpenCode drafts here and Hermes listens here. When it breaks, the damage is immediate: my notes stop, my drafts stall, ideas do not validate themselves, and I have to decide whether to fix it now or in the morning.<br>I do not think about the tools when things are good. I think about llama.cpp or a container frontend when I am staring at the wrong quant at midnight, or when an update breaks two years of chat history. They are the broom and the bucket I use to clean up the mess.<br>Last week the electricity bill arrived and I winced before opening it. Sometimes I notice a silent break only when replies get dumb or when the bill is higher than expected.<br>I keep it alive because the alternative is renting intelligence that forgets where it lives. My rig is slow and loud. The answers are sometimes dumb enough to laugh at (though mostly they just make me angry). When I type at 3 a.m. and the answer comes back in my own room, no one else is there. That is local inference in 2026: a room kept warm by a machine thinking.
The work is invisible and the reward is smaller than I admit, but I keep the room warm anyway.