Otari: Own Your AI Stack

Closed-source frontier providers offer what looks like a complete stack: tools, MCP server integrations, execution environments, web search, spend controls, etc. Choosing one for your next project feels like a no-brainer.

Then you decide to run an open-weights model, for cost, sovereignty, or simply because you can. Most of that stack disappears. You get a chat endpoint. The rest is yours to rebuild. That's the gap Otari closes.

Today we're launching Otari, an open-source LLM gateway built on top of any-llm, and Otari.ai, the hosted platform built around it. Together, they let you choose any model, whether it is frontier or open-weights, hosted or self-served, without giving up the developer experience and capabilities you expect and, most importantly, without compromising your privacy.

What Otari Is

Otari brings the missing pieces to your stack: user management, provider key management, usage and budget tracking, and a set of tools to make open-source models more capable. Better cost and privacy without compromising on capabilities. And you are not locked into Python: you can connect via one of our SDKs or by hitting the API directly.

Closing the Capability Gap

Frontier providers ship more than just weights. They ship code execution, web search, transcription, image generation, and batching. When you switch a workload from Claude or GPT to an open-weights model, those tools do not come with you. The model regresses to a simple chat endpoint, and your application code must grow a layer it did not need before.

Otari ships those capabilities as server-side, model-agnostic tools. The gateway dispatches them to any model that supports tool calls:

Sandboxed code execution. A Docker-isolated Python REPL, invoked server-side when a model needs to run code. Any tool-using model now has a code interpreter. You don't fine-tune for it; you don't write the sandbox; it's just there.
Web search. Current-information retrieval via SearXNG out of the box, with the option to plug in Tavily, Brave, or Exa. Your open-weights model is no longer stuck at its training cutoff.
Audio in, images out. OpenAI-compatible transcription and image generation endpoints, so multimodal pipelines keep working when you swap the model behind them.
Reranking. LLM-powered document reranking for RAG, independent of your generation model.
Batch processing. OpenAI-compatible asynchronous batch API for workloads where latency doesn't matter and cost does.

Choosing open-source models shouldn't mean losing capabilities. Otari levels the playing field. The same tools you use with closed-source providers are attached to whatever model you choose. Pair an open-weights chat model with Otari and you get a fully equipped agent runtime, not a stripped-down one.

And we're not stopping here. Guardrails powered by llamafile, encoderfile, and any-guardrail are next, so the safety and classification layers around your model run fast and locally, even without a GPU.

The Operational Layer

The other half of why a gateway exists is the boring, important stuff every team ends up building for itself. Otari ships it:

Virtual API keys: Hashed, named, optionally expiring keys bound to a user, so clients never see your upstream provider credentials.
User management and budgets: Per-user spending caps with configurable reset windows.
Usage and spend tracking: Real-time cost calculation across providers.
Rate limiting: Configurable RPM caps per user, with hits exported as Prometheus metrics.
Health and Prometheus metrics.
Platform mode: Delegation-based multi-tenant authorization, which is the seam Otari.ai is built on.

Otari.ai: The Hosted Platform

Otari is the engine. Otari.ai is what you get when you don't want to run it yourself. It is the managed, team-oriented surface built on top of the OSS gateway.

Identity and teams. User accounts, organizations with role-based access (owner, admin, member), and workspaces scoped to organizations, each with their own keys, members, playground, and spending dashboards.
Routing policies. Define how requests flow across providers and models at the workspace level. We are starting with a simple fallback system and will add more elaborate routers in the near future.
Secure vault. Provider credentials encrypted at rest.
Managed providers. Reach frontier models through Otari.ai without bringing your own API key. Billed against your wallet at transparent per-token pricing.
Mozilla.ai provider. A first-party managed provider that routes to open-weights models. Auto-provisioned per organization. Same gateway, same budgets, same traces. Open-weights as a first-class citizen.
Multi-level budgets and wallets. Spend limits per provider key, plus per-member-per-provider-key caps for fine-grained control, each with their own reset cadence.
Declarative configuration. Describe an entire organization: workspaces, provider keys, routing policies, budgets, member...
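
To make the "simple fallback system" behind routing policies concrete, here is a minimal sketch of the general technique: try an ordered list of routes and return the first successful completion. It is not Otari's implementation or API. The names `Route`, `ProviderError`, and `complete_with_fallback`, along with the two stand-in provider functions, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call that failed."""


@dataclass(frozen=True)
class Route:
    name: str                   # illustrative label, e.g. a provider/model pair
    call: Callable[[str], str]  # takes a prompt, returns a completion


def complete_with_fallback(routes: Sequence[Route], prompt: str) -> tuple[str, str]:
    """Try each route in order; return (route name, completion) from the first success."""
    failures: list[str] = []
    for route in routes:
        try:
            return route.name, route.call(prompt)
        except ProviderError as exc:
            # Remember why this route failed, then move on to the next one.
            failures.append(f"{route.name}: {exc}")
    raise ProviderError("all routes failed: " + "; ".join(failures))


# Stand-in providers so the sketch runs without any network access.
def flaky_primary(prompt: str) -> str:
    raise ProviderError("upstream timeout")


def local_backup(prompt: str) -> str:
    return f"echo: {prompt}"


routes = [Route("primary", flaky_primary), Route("backup", local_backup)]
used, answer = complete_with_fallback(routes, "hello")
print(used, answer)  # prints "backup echo: hello"
```

Keeping the policy as an ordered list means the fallback order lives in configuration rather than in application code. A more elaborate router could reorder or filter the same list before the loop runs.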
