Freddie on X: "https://t.co/7i6f2rdlXJ" / X
Freddie
@freddie_spirit
Article
Theoretical Bottlenecks for Scaling LLM Inference to Get Higher Token per Second
Let's establish the fundamental accounting identity that governs everything else. For any workload on any accelerator, execution time is bounded below by max(compute_time, memory_time, communication_time)....
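The max() identity in the teaser can be illustrated with a roofline-style calculation. This is a minimal sketch under stated assumptions, not the article's own method: the Accelerator fields, the time_lower_bound helper, and every number below are hypothetical. It assumes compute, memory traffic, and communication overlap perfectly, which is why the slowest of the three is only a lower bound on real execution time.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    # Hypothetical hardware description; values are illustrative only.
    peak_flops: float       # peak arithmetic throughput, FLOP/s
    mem_bandwidth: float    # memory (e.g. HBM) bandwidth, bytes/s
    link_bandwidth: float   # interconnect bandwidth, bytes/s


def time_lower_bound(flops, bytes_moved, bytes_communicated, acc):
    """Return (lower-bound seconds, name of the binding resource).

    Assumes the three activities overlap perfectly, so total time can be
    no smaller than the slowest of them.
    """
    times = {
        "compute": flops / acc.peak_flops,
        "memory": bytes_moved / acc.mem_bandwidth,
        "communication": bytes_communicated / acc.link_bandwidth,
    }
    bottleneck = max(times, key=times.get)
    return times[bottleneck], bottleneck


if __name__ == "__main__":
    # Hypothetical single-device decode step for a 7B-parameter FP16 model at
    # batch size 1: roughly 2 FLOPs per parameter per token, and every weight
    # (2 bytes each) read from memory once per token. No communication.
    acc = Accelerator(peak_flops=1e15, mem_bandwidth=2e12, link_bandwidth=1e11)
    params = 7e9
    t, resource = time_lower_bound(2 * params, 2 * params, 0.0, acc)
    print(f"bound: {t * 1e3:.1f} ms/token, limited by {resource}")
    # Upper bound on single-stream decode speed for this hypothetical setup.
    print(f"at most {1 / t:.0f} tokens/s")
```

With these assumed numbers the memory term dominates by several hundred times over the compute term, which is the usual reason small-batch decoding is described as memory-bound: raising tokens per second then depends on moving fewer bytes per token (or reusing them across a larger batch), not on adding FLOP/s.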
9:18 AM · Jul 2, 2026 · 11 Views