Otari: Own Your AI Stack | AI Gateway & Hosted Platform
Closed-source frontier providers offer what looks like a complete stack: tools, MCP server integrations, execution environments, web search, spend controls, etc. Choosing one for your next project feels like a no-brainer.<br>Then you decide to run an open-weights model, for cost, sovereignty, or simply because you can. Most of that stack disappears. You get a chat endpoint. The rest is yours to rebuild.<br>That's the gap Otari closes.<br>Today we're launching Otari, an open-source LLM gateway built on top of any-llm, and Otari.ai, the hosted platform built around it. Together, they let you choose any model, whether it is frontier or open-weights, hosted or self-served, without giving up the developer experience and capabilities you expect and, most importantly, without compromising your privacy.<br>What Otari Is<br>Otari brings the missing pieces to your stack: user management, provider key management, usage and budget tracking, and a set of tools to make open-source models more capable.<br>Better cost and privacy without compromising on capabilities. And you are not locked into Python: you can connect via one of our SDKs or by hitting the API directly.<br>Closing the Capability Gap<br>Frontier providers ship more than just weights. They ship code execution, web search, transcription, image generation, and batching. When you switch a workload from Claude or GPT to an open-weights model, those tools do not come with you. The model regresses to a simple chat endpoint, and your application code must grow a layer it did not need before.<br>Otari ships those capabilities as server-side, model-agnostic tools. The gateway dispatches them to any model that supports tool calls:<br>Sandboxed code execution. A Docker-isolated Python REPL, invoked server-side when a model needs to run code. Any tool-using model now has a code interpreter. You don't fine-tune for it; you don't write the sandbox; it's just there.<br>Web search. 
Current-information retrieval via SearXNG out of the box, with the option to plug in Tavily, Brave, or Exa. Your open-weights model is no longer stuck at its training cutoff.<br>Audio in, images out. OpenAI-compatible transcription and image generation endpoints, so multimodal pipelines keep working when you swap the model behind them.<br>Reranking. LLM-powered document reranking for RAG, independent of your generation model.<br>Batch processing. OpenAI-compatible asynchronous batch API for workloads where latency doesn't matter and cost does.<br>Choosing open-source models shouldn't mean losing capabilities. Otari levels the playing field. The same tools you use with closed-source providers are attached to whatever model you choose. Pair an open-weights chat model with Otari and you get a fully equipped agent runtime, not a stripped-down one.<br>And we're not stopping here. Guardrails powered by llamafile, encoderfile, and any-guardrail are next, so the safety and classification layers around your model run fast and locally, even without a GPU.<br>The Operational Layer<br>The other half of why a gateway exists is the boring, important stuff every team ends up building for itself. Otari ships it:<br>Virtual API keys: Hashed, named, optionally expiring keys bound to a user, so clients never see your upstream provider credentials.<br>User management and budgets: Per-user spending caps with configurable reset windows.<br>Usage and spend tracking: Real-time cost calculation across providers.<br>Rate limiting: Configurable RPM caps per user, with hits exported as Prometheus metrics.<br>Health and Prometheus metrics.<br>Platform mode: Delegation-based multi-tenant authorization, which is the seam Otari.ai is built on.<br>Otari.ai: The Hosted Platform<br>Otari is the engine. Otari.ai is what you get when you don't want to run it yourself. It is the managed, team-oriented surface built on top of the OSS gateway.<br>Identity and teams. 
User accounts, organizations with role-based access (owner, admin, member), and workspaces scoped to organizations, each with its own keys, members, playground, and spending dashboards.<br>Routing policies. Define how requests flow across providers and models at the workspace level. We are starting with a simple fallback system and will add more elaborate routers in the near future.<br>Secure vault. Provider credentials encrypted at rest.<br>Managed providers. Reach frontier models through Otari.ai without bringing your own API key. Billed against your wallet at transparent per-token pricing.<br>Mozilla.ai provider. A first-party managed provider that routes to open-weights models. Auto-provisioned per organization. Same gateway, same budgets, same traces. Open-weights as a first-class citizen.<br>Multi-level budgets and wallets. Spend limits per provider key, plus per-member-per-provider-key caps for fine-grained control, each with its own reset cadence.<br>Declarative configuration. Describe an entire organization: workspaces, provider keys, routing policies, budgets, member...