200x kv Cache Compression

nico1 pts0 comments

Charlie O'Neill on X: "1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly.

At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model.

Here's how we did it 👇 https://t.co/He1ucvxGyf" / X<br>Post

Log inSign up

Post

Charlie O'Neill

@oneill_c

1/ You can shrink a language model's KV cache by 200×, in a single forward pass, and it still answers correctly.

At 256k context that's 36 GiB of cache down to ~360 MiB, with no change to the base model.

Here's how we did it 👇

span:not(:empty)~span:not(:empty)]:before:content-['·'] [&>span:not(:empty)~span:not(:empty)]:before:px-1 [&>span:not(:empty)~span:not(:empty)]:before:shrink-0">2:59 PM · Jun 10, 202646.5KViews

:host{display:inline-block;direction:ltr;white-space:nowrap;line-height:var(--number-flow-char-height, 1em) !important}span{display:inline-block}:host([data-will-change]) span{will-change:transform}.number,.digit{padding:round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) 0}.symbol{white-space:pre}16number-flow-react > span{font-kerning:none;display:inline-block;line-height:var(--number-flow-char-height, 1em) !important;padding:calc(round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) * 2) 0}16<br>:host{display:inline-block;direction:ltr;white-space:nowrap;line-height:var(--number-flow-char-height, 1em) !important}span{display:inline-block}:host([data-will-change]) span{will-change:transform}.number,.digit{padding:round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) 0}.symbol{white-space:pre}56number-flow-react > span{font-kerning:none;display:inline-block;line-height:var(--number-flow-char-height, 1em) !important;padding:calc(round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) * 2) 0}56<br>:host{display:inline-block;direction:ltr;white-space:nowrap;line-height:var(--number-flow-char-height, 1em) !important}span{display:inline-block}:host([data-will-change]) span{will-change:transform}.number,.digit{padding:round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) 0}.symbol{white-space:pre}589number-flow-react > span{font-kerning:none;display:inline-block;line-height:var(--number-flow-char-height, 1em) !important;padding:calc(round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) * 2) 0}589<br>:host{display:inline-block;direction:ltr;white-space:nowrap;line-height:var(--number-flow-char-height, 1em) !important}span{display:inline-block}:host([data-will-change]) span{will-change:transform}.number,.digit{padding:round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) 0}.symbol{white-space:pre}538number-flow-react > span{font-kerning:none;display:inline-block;line-height:var(--number-flow-char-height, 1em) !important;padding:calc(round(nearest, calc(var(--number-flow-mask-height, 0.25em) / 2), 1px) * 2) 0}538

Read 16 replies

*]:shrink-0">New to X?<br>Sign up now to get your own personalized timeline!<br>Sign up with GoogleSign up with AppleCreate account<br>By signing up, you agree to the Terms of Service and Privacy Policy, including Cookie Use.

Relevant people<br>Charlie O'Neill@oneill_cFollow

Trending now

Don't miss what's happening<br>People on X are the first to know.

Log inSign up

height number flow span display inline

Related Articles