Run your own local LLM with rate limits via API keys

GitHub - skorotkiewicz/llm-rt: Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket

Repository files: README.md, curl_local_proxy.sh, llm_proxy.rb, run_local_proxy.sh, test_llm_proxy.rb

LLM token bucket proxy

Small Ruby prototype for an OpenAI-compatible LLM proxy with a refillable token bucket.

It uses only Ruby standard libraries: no gems, no Rack, no WEBrick.

Run

BASE_API_URL=http://192.168.0.124:8888/v1 \
BASE_API_KEY=1mmer \
BASE_MODEL=gemma4 \
ruby llm_proxy.rb

The proxy listens on 0.0.0.0:8899 by default.
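The three BASE_* variables above carry the upstream settings. As a minimal sketch of how a stdlib-only script might read them (the ProxyConfig struct, the load_config helper and the fallback values are assumptions for illustration, not taken from llm_proxy.rb):

```ruby
# Hypothetical sketch: collect the upstream settings from the environment.
# BASE_API_URL, BASE_API_KEY and BASE_MODEL come from the README; the
# ProxyConfig struct and the fallback values below are assumptions.
ProxyConfig = Struct.new(:base_api_url, :base_api_key, :base_model,
                         :listen_host, :listen_port, keyword_init: true)

def load_config(env = ENV)
  ProxyConfig.new(
    # Required; a trailing slash is dropped so paths can be appended safely.
    base_api_url: env.fetch("BASE_API_URL").chomp("/"),
    # Assumption: an empty key means the upstream needs no Authorization header.
    base_api_key: env.fetch("BASE_API_KEY", ""),
    # Assumption: nil means "keep whatever model the client asked for".
    base_model: env.fetch("BASE_MODEL", nil),
    # README default listen address.
    listen_host: "0.0.0.0",
    listen_port: 8899
  )
end
```

A server built on the socket standard library could then bind with TCPServer.new(cfg.listen_host, cfg.listen_port). Using fetch without a default for BASE_API_URL makes a misconfigured proxy fail at startup instead of on the first request.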

For your local LLM at 192.168.0.124:8888, run the saved local setup:

./run_local_proxy.sh

That starts the Ruby proxy at http://127.0.0.1:8899/v1 and forwards to http://192.168.0.124:8888/v1.
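Forwarding means rebuilding each client request for the upstream server. The sketch below shows one way to do that with Net::HTTP only. The helper name build_upstream_request is hypothetical, and three behaviours are assumptions rather than documented: the path after /v1 is appended to BASE_API_URL, the client's bearer token is swapped for BASE_API_KEY, and BASE_MODEL overrides the requested model.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical sketch: turn a client request into an upstream request.
# Assumed behaviour (not confirmed by the README): the path after /v1 is
# appended to BASE_API_URL, the client's bearer token is replaced by
# BASE_API_KEY, and BASE_MODEL overrides the client's "model" field.
def build_upstream_request(base_api_url, base_api_key, base_model, client_path, client_body)
  # "/v1/chat/completions" -> "/chat/completions"
  suffix = client_path.sub(%r{\A/v1(?=/|\z)}, "")
  uri = URI("#{base_api_url.chomp('/')}#{suffix}")

  payload = JSON.parse(client_body)
  payload["model"] = base_model if base_model && !base_model.empty?

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  # Only send upstream credentials when a key is configured.
  request["Authorization"] = "Bearer #{base_api_key}" unless base_api_key.to_s.empty?
  request.body = JSON.generate(payload)
  [uri, request]
end
```

Sending it is then Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") { |http| http.request(request) }.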

The saved local curl check is:

./curl_local_proxy.sh

Manual equivalent:

curl -sS -i -m 60 http://127.0.0.1:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemma4",
    "messages": [{"role": "user", "content": "Reply with exactly: proxy ok"}],
    "max_tokens": 16
  }'

Verified result through the proxy: the upstream replied with "proxy ok", and the proxy returned X-RateLimit-Remaining: 0 with the local test bucket.
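The same check can be scripted in Ruby. This hypothetical helper (summarize_reply is an illustrative name) takes the X-RateLimit-Remaining header value and the response body as plain strings and extracts the assistant text from the usual OpenAI choices[0].message.content path:

```ruby
require "json"

# Hypothetical helper: summarize a proxy reply for a quick smoke check.
# Returns the trimmed assistant text and the remaining-bucket count
# (nil when the header is absent or not an integer).
def summarize_reply(remaining_header, body)
  content = JSON.parse(body).dig("choices", 0, "message", "content")
  remaining = remaining_header && Integer(remaining_header, exception: false)
  { content: content.to_s.strip, remaining: remaining }
end
```

With Net::HTTP, pass res["X-RateLimit-Remaining"] and res.body from the response object.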

Run the smoke test:

ruby test_llm_proxy.rb

Token bucket settings

MAX_TOKENS=10                 # max saved tokens per user
REFILL_TOKENS=2               # tokens added each refill
REFILL_INTERVAL_SECONDS=300   # 5 minutes
REQUEST_TOKEN_COST=1          # cost per accepted completion request
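A minimal token bucket using these defaults might look like the sketch below; it is not the code in llm_proxy.rb. Assumptions: a new bucket starts full, refills arrive in whole steps of REFILL_TOKENS per elapsed REFILL_INTERVAL_SECONDS and are capped at MAX_TOKENS, and the clock is injectable so tests need not sleep.

```ruby
# Minimal token bucket sketch (assumed semantics, see lead-in).
class TokenBucket
  attr_reader :tokens

  def initialize(max_tokens: 10, refill_tokens: 2, refill_interval_seconds: 300,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_tokens = max_tokens
    @refill_tokens = refill_tokens
    @refill_interval = refill_interval_seconds
    @clock = clock
    @tokens = max_tokens          # assumption: new buckets start full
    @last_refill = @clock.call
  end

  # Spends `cost` tokens if available; returns true when the request is accepted.
  def take(cost = 1)
    refill
    return false if @tokens < cost

    @tokens -= cost
    true
  end

  # Seconds until the next refill step, e.g. for a "wait N min" notice.
  def seconds_until_refill
    refill
    @refill_interval - (@clock.call - @last_refill)
  end

  private

  # Adds REFILL_TOKENS for every whole interval elapsed, capped at MAX_TOKENS.
  def refill
    now = @clock.call
    steps = ((now - @last_refill) / @refill_interval).floor
    return if steps <= 0

    @tokens = [@tokens + steps * @refill_tokens, @max_tokens].min
    # Advance by whole intervals only, keeping partial progress toward the next step.
    @last_refill += steps * @refill_interval
  end
end
```

Charging REQUEST_TOKEN_COST per accepted request corresponds to bucket.take(1). Advancing the refill timestamp by whole intervals, rather than resetting it to the current time, means a client that polls often does not delay its own refill.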

Each bearer token gets its own bucket. Requests without a bearer token are bucketed by remote IP. Set PROXY_API_KEYS=key1,key2 if the proxy should reject unknown client keys.
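The key-selection rules above can be sketched as a small function. resolve_bucket_key is a hypothetical name. The README does not say what happens when PROXY_API_KEYS is set and a request carries no bearer token; this sketch assumes such requests are rejected.

```ruby
# Hypothetical sketch: pick the bucket key for a request.
# Returns [:ok, key] or [:reject, reason].
def resolve_bucket_key(authorization_header, remote_ip, proxy_api_keys = ENV["PROXY_API_KEYS"])
  # PROXY_API_KEYS=key1,key2 -> ["key1", "key2"]; unset or empty -> no allowlist.
  allowed = proxy_api_keys.to_s.split(",").map(&:strip).reject(&:empty?)
  token = authorization_header.to_s[/\ABearer\s+(\S+)\z/i, 1]

  if allowed.any?
    # Assumption: with an allowlist, missing or unknown keys are rejected.
    return [:reject, "unknown API key"] unless token && allowed.include?(token)
  end

  # One bucket per bearer token, falling back to the remote IP.
  token ? [:ok, "key:#{token}"] : [:ok, "ip:#{remote_ip}"]
end
```

The returned key can index a Hash of per-client buckets, so user-a and an anonymous caller at 10.0.0.5 never share a budget.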

When the bucket is empty, /v1/chat/completions and /v1/completions return a normal OpenAI-style assistant response:

limit reached, wait 5 min
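Returning the notice as an ordinary completion means chat clients display it instead of surfacing an error. A sketch of building such a body for both endpoints follows. The function name, the id format, the zeroed usage block, and serving it with HTTP 200 are assumptions, since the README does not show the exact payload.

```ruby
require "json"

# Hypothetical sketch: an OpenAI-style reply carrying the rate-limit notice.
# /v1/completions uses the legacy text_completion shape (choices[].text);
# everything else gets a chat.completion with an assistant message.
def limit_reached_response(path, model, message = "limit reached, wait 5 min")
  now = Time.now.to_i
  if path == "/v1/completions"
    object = "text_completion"
    choice = { "index" => 0, "text" => message, "finish_reason" => "stop" }
  else
    object = "chat.completion"
    choice = {
      "index" => 0,
      "message" => { "role" => "assistant", "content" => message },
      "finish_reason" => "stop"
    }
  end

  JSON.generate({
    "id" => "limit-#{now}",
    "object" => object,
    "created" => now,
    "model" => model,
    "choices" => [choice],
    "usage" => { "prompt_tokens" => 0, "completion_tokens" => 0, "total_tokens" => 0 }
  })
end
```

The "5 min" in the notice matches the default REFILL_INTERVAL_SECONDS=300; a variant could derive the wording from that setting.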

Test request

curl http://localhost:8899/v1/chat/completions \
  -H 'Authorization: Bearer user-a' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "anything",
    "messages": [{"role": "user", "content": "hello"}]
  }'

Optional estimated token mode

By default, one completion request costs REQUEST_TOKEN_COST bucket tokens. To charge roughly by prompt size plus expected output:

TOKEN_COST_MODE=estimate RESPONSE_TOKEN_RESERVE=256 ruby llm_proxy.rb

This is only an approximation for the prototype.
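As a rough sketch of what such an estimate could compute: estimate_cost is a hypothetical name, and the four-characters-per-token ratio is a common rule of thumb assumed here, not necessarily what llm_proxy.rb uses.

```ruby
# Hypothetical sketch of TOKEN_COST_MODE=estimate: approximate prompt tokens
# at ~4 characters each (assumed heuristic), plus RESPONSE_TOKEN_RESERVE
# for the expected output.
def estimate_cost(payload, response_token_reserve: 256)
  prompt_text =
    if payload["messages"]
      # Chat endpoint: concatenate message contents.
      payload["messages"].map { |m| m["content"].to_s }.join("\n")
    else
      # Legacy completions endpoint: plain prompt string.
      payload["prompt"].to_s
    end
  prompt_tokens = (prompt_text.length / 4.0).ceil
  prompt_tokens + response_token_reserve
end
```

Note the scale: a one-word prompt already costs 258 under this scheme, far above the default MAX_TOKENS=10. If the real proxy charges on a similar scale, estimate mode would need the bucket settings raised accordingly.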
