Time to first token is 39% faster
Agent wall-clock time decreases by 46%
No swaps

Tracks your resource usage in real time and adjusts how the model runs to fit the resources your device has available.

Implements KV cache sizing, prefix caching, live RAM pressure management, context trimming, KV quantization, and more.

Built a ton of features