skorotkiewicz/llm-rt: Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket
Files: README.md, curl_local_proxy.sh, llm_proxy.rb, run_local_proxy.sh, test_llm_proxy.rb
LLM token bucket proxy
Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket.
It uses only Ruby standard libraries: no gems, no Rack, no WEBrick.
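Without Rack or WEBrick, the proxy presumably reads HTTP requests straight off a TCP socket. The sketch below shows one way to do that with core Ruby only. It is a hypothetical illustration, not the code in llm_proxy.rb: read_http_request is an invented name, and it handles only Content-Length bodies (no chunked transfer encoding). Because it works on any IO, the same function accepts a socket from TCPServer#accept (require "socket") or a StringIO in tests.

```ruby
require "stringio"

# Hypothetical helper (not taken from llm_proxy.rb): read one HTTP/1.1 request
# from any IO, e.g. a socket returned by TCPServer#accept, using core Ruby only.
# Only Content-Length bodies are supported; chunked encoding is not handled.
def read_http_request(io)
  request_line = io.gets("\r\n")
  return nil if request_line.nil? # client closed the connection

  http_method, path, _version = request_line.strip.split(" ", 3)

  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty? # a blank line ends the header section

    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip # header names are case-insensitive
  end

  length = headers.fetch("content-length", "0").to_i
  body = length.positive? ? io.read(length) : ""
  { method: http_method, path: path, headers: headers, body: body }
end

# Example with an in-memory request instead of a real socket.
raw = "POST /v1/chat/completions HTTP/1.1\r\n" \
      "Authorization: Bearer user-a\r\n" \
      "Content-Length: 2\r\n" \
      "\r\n" \
      "{}"
request = read_http_request(StringIO.new(raw))
puts request[:path] # prints /v1/chat/completions
```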
Run
BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb
The proxy listens on 0.0.0.0:8899 by default.
For your local LLM at 192.168.0.124:8888, run the saved local setup:
./run_local_proxy.sh
That starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards to http://192.168.0.124:8888/v1.
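The forwarding step could look roughly like the sketch below. It rests on assumptions: build_upstream_request is a hypothetical name, and overriding the client's model with BASE_MODEL is inferred from the environment variables (the test request in this README sends "model": "anything"), not confirmed from llm_proxy.rb. The function only builds the request with Net::HTTP, so it can be inspected without a running upstream.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch: turn a client's JSON body into a request for the upstream
# OpenAI-compatible server. Assumption: the proxy swaps in BASE_API_KEY and
# BASE_MODEL, so clients never see or choose the upstream credentials/model.
def build_upstream_request(base_url, api_key, model, client_body, path: "/chat/completions")
  # Ensure a trailing slash so URI.join appends instead of replacing "/v1".
  base = base_url.end_with?("/") ? base_url : "#{base_url}/"
  uri = URI.join(base, path.delete_prefix("/"))

  payload = JSON.parse(client_body)
  payload["model"] = model if model && !model.empty?

  request = Net::HTTP::Post.new(uri)
  request["Authorization"] = "Bearer #{api_key}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(payload)
  [uri, request]
end

uri, upstream = build_upstream_request(
  "http://192.168.0.124:8888/v1", "1mmer", "gemma4",
  '{"model": "anything", "messages": [{"role": "user", "content": "hello"}]}'
)
puts uri # prints http://192.168.0.124:8888/v1/chat/completions
# Sending it needs a reachable upstream, e.g.:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(upstream) }
```

Re-encoding the JSON keeps the model override simple; a proxy that streams the client body through untouched would avoid the parse but could not rewrite fields.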
The saved local curl check is:
./curl_local_proxy.sh
Manual equivalent:
curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'
Verified result through the proxy: the upstream model replied "proxy ok", and the proxy returned X-RateLimit-Remaining: 0 with the local test bucket.
Run the smoke test:
ruby test_llm_proxy.rb
Token bucket settings
MAX_TOKENS=10                # max saved tokens per user
REFILL_TOKENS=2              # tokens added each refill
REFILL_INTERVAL_SECONDS=300  # 5 minutes
REQUEST_TOKEN_COST=1         # cost per accepted completion request
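These settings describe a classic token bucket. Below is a minimal sketch using the same defaults, assuming lazy refill (tokens are credited when a request arrives rather than by a background timer) and that a new client starts with a full bucket. Neither detail is confirmed by the README, and TokenBucket is a hypothetical class name.

```ruby
# Hypothetical token bucket with the README's default settings.
# Refill is lazy: whole REFILL_INTERVAL_SECONDS periods that have elapsed since
# the last refill are credited when the bucket is touched, capped at max_tokens.
class TokenBucket
  attr_reader :tokens

  def initialize(max_tokens: 10, refill_tokens: 2, refill_interval: 300,
                 now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    @max_tokens = max_tokens
    @refill_tokens = refill_tokens
    @refill_interval = refill_interval
    @tokens = max_tokens # assumption: a new client starts with a full bucket
    @last_refill = now
  end

  # Credit every whole interval that has passed since the last refill.
  def refill(now)
    intervals = ((now - @last_refill) / @refill_interval).floor
    return if intervals <= 0

    @tokens = [@tokens + intervals * @refill_tokens, @max_tokens].min
    @last_refill += intervals * @refill_interval # keep progress toward the next refill
  end

  # Spend `cost` tokens if available; returns true when the request is allowed.
  def take(cost = 1, now: Process.clock_gettime(Process::CLOCK_MONOTONIC))
    refill(now)
    return false if @tokens < cost

    @tokens -= cost
    true
  end
end

bucket = TokenBucket.new
bucket.take        # allowed: the bucket starts full
puts bucket.tokens # prints 9
```

Advancing the refill timestamp by whole intervals, rather than setting it to the current time on every call, means a client that polls every 299 seconds still gets refilled on schedule.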
Each bearer token gets its own bucket. Requests without a bearer token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.
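The bucket-selection rules can be sketched as a small function. bucket_key is a hypothetical name, and treating a request with no bearer token as unknown when PROXY_API_KEYS is set is an assumption; the README only says unknown client keys are rejected.

```ruby
# Hypothetical helper: choose which bucket a request draws from.
# - Bearer token present: one bucket per token.
# - No bearer token: fall back to the client's remote IP.
# - PROXY_API_KEYS set: unknown keys return nil so the caller can reject them.
#   Assumption: a request with no key at all also counts as unknown here.
def bucket_key(authorization, remote_ip,
               allowed_keys = ENV.fetch("PROXY_API_KEYS", "").split(",").map(&:strip).reject(&:empty?))
  token = authorization.to_s.strip[/\ABearer\s+(\S+)\z/i, 1]

  return nil if allowed_keys.any? && !allowed_keys.include?(token)

  token ? "key:#{token}" : "ip:#{remote_ip}"
end

puts bucket_key("Bearer user-a", "127.0.0.1", []) # prints key:user-a
puts bucket_key(nil, "127.0.0.1", [])             # prints ip:127.0.0.1
```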
When the bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style response whose assistant text is:
limit reached, wait 5 min
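A possible shape for that response, assuming the standard OpenAI chat.completion and legacy text_completion object layouts. The id format, finish_reason and zero usage values are guesses, not taken from llm_proxy.rb, and deriving "5 min" from REFILL_INTERVAL_SECONDS is also an assumption; limit_reached_body is a hypothetical name.

```ruby
require "json"
require "securerandom"

# Hypothetical sketch: the body returned when a client's bucket is empty.
# Chat requests get a chat.completion object; /v1/completions gets the
# legacy text_completion shape. Field values are assumptions.
def limit_reached_body(path, model, refill_interval_seconds: 300)
  text = "limit reached, wait #{(refill_interval_seconds / 60.0).ceil} min"
  base = {
    "id" => "cmpl-#{SecureRandom.hex(12)}",
    "created" => Time.now.to_i,
    "model" => model,
    "usage" => { "prompt_tokens" => 0, "completion_tokens" => 0, "total_tokens" => 0 }
  }

  if path.end_with?("/chat/completions")
    base.merge({
      "object" => "chat.completion",
      "choices" => [{
        "index" => 0,
        "message" => { "role" => "assistant", "content" => text },
        "finish_reason" => "stop"
      }]
    })
  else
    base.merge({
      "object" => "text_completion",
      "choices" => [{ "index" => 0, "text" => text, "logprobs" => nil, "finish_reason" => "stop" }]
    })
  end
end

puts JSON.pretty_generate(limit_reached_body("/v1/chat/completions", "gemma4"))
```

Returning an ordinary completion rather than an HTTP error means chat clients show the message as assistant text instead of surfacing a failure.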
Test request
curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'
Optional estimated token mode
By default, one completion request costs REQUEST_TOKEN_COST bucket tokens. To charge approximately by prompt size plus a reserve for the expected output:
TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb
This is only an approximation for the prototype.
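A sketch of what such an estimate might compute, assuming roughly four characters per token and that RESPONSE_TOKEN_RESERVE is simply added as the expected output size. The ratio and the name estimated_cost are assumptions, not taken from llm_proxy.rb. Under this scheme the bucket is counted in model tokens, so MAX_TOKENS and REFILL_TOKENS would need to be far larger than the per-request defaults for any request to fit.

```ruby
require "json"

# Rough cost estimate for TOKEN_COST_MODE=estimate. The 4-characters-per-token
# ratio is an assumed rule of thumb for English text, not the proxy's own ratio.
CHARS_PER_TOKEN = 4.0

def estimated_cost(request_body, response_reserve: Integer(ENV.fetch("RESPONSE_TOKEN_RESERVE", "256")))
  payload = JSON.parse(request_body)
  prompt_text =
    if payload["messages"]
      # Chat endpoint: count every message's content (array contents are stringified).
      payload["messages"].map { |m| m["content"].to_s }.join("\n")
    else
      payload["prompt"].to_s # legacy /v1/completions endpoint
    end

  prompt_tokens = (prompt_text.length / CHARS_PER_TOKEN).ceil
  prompt_tokens + response_reserve # the reserve stands in for the not-yet-generated output
end

body = JSON.generate({ "messages" => [{ "role" => "user", "content" => "hello world!" }] })
puts estimated_cost(body, response_reserve: 256) # prints 259
```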
Languages: Ruby 97.6%, Shell 2.4%