Run your own local LLM with rate limits via API keys


skorotkiewicz/llm-rt (GitHub): Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket



Files (8 commits): README.md, curl_local_proxy.sh, llm_proxy.rb, run_local_proxy.sh, test_llm_proxy.rb


LLM token bucket proxy

Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket.

It uses only Ruby standard libraries: no gems, no Rack, no WEBrick.
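Without Rack or WEBrick, the proxy has to read HTTP requests off the socket itself. A minimal sketch of that step, assuming HTTP/1.1 requests whose body length is given by Content-Length (chunked bodies are not handled); read_http_request is a hypothetical name, not a function from llm_proxy.rb:

```ruby
require "stringio"

# Hypothetical helper: parse one HTTP/1.1 request from an IO
# (an accepted TCPSocket in a real server, a StringIO here).
# Assumes the body length comes from Content-Length; chunked bodies are not handled.
def read_http_request(io)
  request_line = io.gets("\r\n")
  return nil if request_line.nil? # client closed the connection

  http_method, path, _version = request_line.strip.split(" ", 3)

  headers = {}
  while (line = io.gets("\r\n"))
    line = line.chomp("\r\n")
    break if line.empty? # a blank line ends the header section

    name, value = line.split(":", 2)
    headers[name.strip.downcase] = value.to_s.strip # header names are case-insensitive
  end

  length = headers.fetch("content-length", "0").to_i
  body = length.positive? ? io.read(length) : ""
  { method: http_method, path: path, headers: headers, body: body }
end

raw = "POST /v1/chat/completions HTTP/1.1\r\n" \
      "Authorization: Bearer user-a\r\n" \
      "Content-Length: 2\r\n\r\n{}"
request = read_http_request(StringIO.new(raw))
request[:headers]["authorization"] # => "Bearer user-a"
```

In a server loop the IO would be a socket accepted from TCPServer.new("0.0.0.0", 8899); the StringIO stands in so the parser can be exercised without a network.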

Run

```shell
BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb
```

The proxy listens on 0.0.0.0:8899 by default.
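The Run example passes the upstream settings as environment variables. A sketch of reading them, assuming BASE_API_URL is required while the key and model are optional; load_config, Config and the LISTEN_* constants are hypothetical names, with the listen address fixed to the documented default:

```ruby
# Hypothetical config loader using the variable names from the Run example.
# The listen address is fixed to the documented default, 0.0.0.0:8899.
LISTEN_HOST = "0.0.0.0"
LISTEN_PORT = 8899

Config = Struct.new(:base_api_url, :base_api_key, :base_model, keyword_init: true)

def load_config(env = ENV)
  Config.new(
    # Required: raises KeyError if missing. A trailing slash is dropped so
    # request paths can be appended without producing "//".
    base_api_url: env.fetch("BASE_API_URL").chomp("/"),
    base_api_key: env.fetch("BASE_API_KEY", ""),
    base_model: env.fetch("BASE_MODEL", nil)
  )
end

# Usage (reads the real environment): config = load_config
```

Passing the environment as a parameter keeps the loader testable with a plain Hash.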

For the local LLM at 192.168.0.124:8888, run the saved setup script:

```shell
./run_local_proxy.sh
```

That starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards to http://192.168.0.124:8888/v1.
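Forwarding can be done with Net::HTTP from the standard library. This sketch only builds the upstream request; it assumes the part of the client path after /v1 is appended to BASE_API_URL, that BASE_API_KEY is sent upstream as a bearer token, and that BASE_MODEL replaces the client's model field. None of this is confirmed by the README, and build_upstream_request is a hypothetical name:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch of building the upstream request.
# Assumptions (not confirmed by the README): the client path after /v1 is
# appended to BASE_API_URL, BASE_API_KEY is sent upstream as a bearer token,
# and BASE_MODEL replaces whatever model the client asked for.
def build_upstream_request(base_api_url, base_api_key, base_model, client_path, client_body)
  uri = URI(base_api_url + client_path.delete_prefix("/v1"))

  payload = JSON.parse(client_body)
  payload["model"] = base_model if base_model && !base_model.empty?

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{base_api_key}" unless base_api_key.to_s.empty?
  request.body = JSON.generate(payload)
  [uri, request]
end

# Sending it requires a reachable upstream, e.g.:
#   uri, req = build_upstream_request(url, key, model, "/v1/chat/completions", body)
#   res = Net::HTTP.start(uri.host, uri.port, read_timeout: 60) { |http| http.request(req) }
```

Separating request construction from sending keeps the rewrite logic testable without a running model server.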

The saved local curl check is:

```shell
./curl_local_proxy.sh
```

Manual equivalent:

```shell
curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'
```

Verified result through the proxy: the upstream replied "proxy ok", and the proxy returned the header X-RateLimit-Remaining: 0 with the local test bucket.
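Since no web framework is involved, headers such as X-RateLimit-Remaining have to be written onto the socket by hand. A sketch of serializing such a response, assuming the remaining count is the bucket balance after the request is charged; http_response and its reason-phrase table are hypothetical:

```ruby
require "json"

# Hypothetical helper for a framework-free server: serialize a JSON response,
# including the X-RateLimit-Remaining header, as raw HTTP/1.1 text.
# Small subset of reason phrases; unknown codes get an empty phrase.
REASON_PHRASES = { 200 => "OK", 400 => "Bad Request", 401 => "Unauthorized", 502 => "Bad Gateway" }.freeze

def http_response(status, body_hash, remaining)
  body = JSON.generate(body_hash)
  head = [
    "HTTP/1.1 #{status} #{REASON_PHRASES.fetch(status, '')}",
    "Content-Type: application/json",
    "Content-Length: #{body.bytesize}", # byte count, not character count
    "X-RateLimit-Remaining: #{remaining}",
    "Connection: close"
  ]
  head.join("\r\n") + "\r\n\r\n" + body
end

# In a server loop (hypothetical names): socket.write(http_response(200, upstream_hash, remaining))
```

Content-Length uses bytesize because model output is often non-ASCII, and a character count would then truncate the body.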

Run the smoke test:

```shell
ruby test_llm_proxy.rb
```

Token bucket settings

```shell
MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request
```
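These four settings describe a classic token bucket. A sketch of one way to implement them, assuming a new bucket starts full and refills in whole steps of REFILL_TOKENS per elapsed REFILL_INTERVAL_SECONDS (the prototype might refill differently); TokenBucket and its injectable clock are illustrative, not taken from llm_proxy.rb:

```ruby
# Illustrative token bucket with the README's default settings.
# Assumptions: a new bucket starts full, and refills happen in whole steps of
# refill_tokens per elapsed refill_interval seconds, capped at max_tokens.
class TokenBucket
  def initialize(max_tokens: 10, refill_tokens: 2, refill_interval: 300,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_tokens = max_tokens
    @refill_tokens = refill_tokens
    @refill_interval = refill_interval
    @clock = clock
    @tokens = max_tokens
    @last_refill = clock.call
    @lock = Mutex.new # requests may be served on separate threads
  end

  # Current balance, after applying any refills that are due.
  def tokens
    @lock.synchronize do
      refill
      @tokens
    end
  end

  # Charges `cost` tokens; returns true if the request is accepted.
  def take(cost = 1)
    @lock.synchronize do
      refill
      return false if @tokens < cost

      @tokens -= cost
      true
    end
  end

  private

  def refill
    steps = ((@clock.call - @last_refill) / @refill_interval).floor
    return if steps <= 0

    @tokens = [@tokens + steps * @refill_tokens, @max_tokens].min
    # Advance by whole intervals so progress toward the next refill is kept.
    @last_refill += steps * @refill_interval
  end
end
```

With the defaults, a client that spends all 10 tokens gets 2 back after 5 minutes and a full bucket after 25 minutes. The monotonic clock avoids jumps when the wall clock is adjusted.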

Each bearer token gets its own bucket. Requests without a bearer token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.
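A sketch of choosing the bucket key under these rules. The "key:" and "ip:" prefixes keep a client key from colliding with an IP address; treating a missing key as unknown when PROXY_API_KEYS is set is an assumption, as are the helper names parse_allowed_keys and bucket_key:

```ruby
# Hypothetical helpers for picking a client's bucket.
# PROXY_API_KEYS="key1,key2" becomes ["key1", "key2"]; an empty list disables the check.
def parse_allowed_keys(value)
  value.to_s.split(",").map(&:strip).reject(&:empty?)
end

# Returns the bucket key, or nil when the request should be rejected.
def bucket_key(authorization_header, remote_ip, allowed_keys)
  token = authorization_header.to_s[/\ABearer\s+(\S+)\z/i, 1]
  # Assumption: with an allowlist configured, a missing key counts as unknown.
  return nil if allowed_keys.any? && !allowed_keys.include?(token)

  # Prefixes keep a client key from ever sharing a bucket with an IP address.
  token ? "key:#{token}" : "ip:#{remote_ip}"
end
```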

When the bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style assistant response:

limit reached, wait 5 min
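A sketch of such a response, assuming the standard OpenAI chat.completion and text_completion object shapes; the id format and limit_response are hypothetical. Returning the message as an ordinary assistant reply presumably lets chat clients simply display it instead of surfacing an HTTP error:

```ruby
LIMIT_MESSAGE = "limit reached, wait 5 min"

# Hypothetical helper: an OpenAI-style response carrying the limit message.
# /v1/completions gets a text_completion; everything else a chat.completion.
def limit_response(path, model)
  base = {
    "id" => "limit-#{Time.now.to_i}",
    "created" => Time.now.to_i,
    "model" => model,
    "usage" => { "prompt_tokens" => 0, "completion_tokens" => 0, "total_tokens" => 0 }
  }

  if path == "/v1/completions"
    base.merge(
      "object" => "text_completion",
      "choices" => [{ "index" => 0, "text" => LIMIT_MESSAGE, "finish_reason" => "stop" }]
    )
  else
    base.merge(
      "object" => "chat.completion",
      "choices" => [{
        "index" => 0,
        "message" => { "role" => "assistant", "content" => LIMIT_MESSAGE },
        "finish_reason" => "stop"
      }]
    )
  end
end
```

This sketch does not handle requests with "stream": true, which clients expect to receive as server-sent events.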

Test request

```shell
curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'
```

Optional estimated token mode

By default, one completion request costs REQUEST_TOKEN_COST bucket tokens. To charge roughly by prompt size plus expected output:

```shell
TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb
```

This is only an approximation for the prototype.
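The README does not give the estimation formula. A sketch under a common rough heuristic of about four characters per token, adding the request's max_tokens when present and RESPONSE_TOKEN_RESERVE otherwise; the heuristic, the max_tokens rule and estimated_cost are assumptions, not the prototype's code:

```ruby
require "json"

CHARS_PER_TOKEN = 4 # rough heuristic for English text, not a real tokenizer

# Hypothetical estimate: ceil(prompt characters / 4) plus the expected output,
# taken from max_tokens when the request sets it, else the reserve.
def estimated_cost(body_json, response_reserve: 256)
  payload = JSON.parse(body_json)
  prompt_text =
    if payload["messages"]
      # Only string contents are counted; multi-part (array) contents are skipped.
      payload["messages"].map { |m| m["content"].is_a?(String) ? m["content"] : "" }.join("\n")
    else
      payload["prompt"].to_s # /v1/completions style
    end

  prompt_tokens = (prompt_text.length + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
  expected_output = payload["max_tokens"] || response_reserve
  prompt_tokens + expected_output
end

# Usage: estimated_cost(body, response_reserve: Integer(ENV.fetch("RESPONSE_TOKEN_RESERVE", "256")))
```

Under any formula that adds a 256-token reserve, a single request costs more than the default MAX_TOKENS=10, so this mode presumably needs much larger bucket settings.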


Languages: Ruby 97.6%, Shell 2.4%
