Phlox: A full-featured AI platform you own

mcdermott1 pts0 comments

Phlox — A full-featured, self-hostable AI platform

GitHub

Self-hostable · Open source · Apache 2.0

A full-featured AI platform<br>you actually own.

Phlox is a self-hostable AI platform — an agentic tool-using harness, document RAG,<br>code execution, an OpenAI-compatible gateway, and per-user cost accounting — running<br>over any model provider: AWS Bedrock or any OpenAI-compatible<br>endpoint, including fully local models.

View on GitHub

Quick start →

Runs over

AWS Bedrock

OpenAI

Ollama

OpenRouter

vLLM

LiteLLM

LM Studio

localhost:5173

~40built-in agent tools

100%self-hostable & offline-capable

8built-in themes

6+model providers, one config

Bring your own model. Or run it all locally.

Named provider profiles cover AWS Bedrock and any<br>OpenAI-compatible endpoint — OpenAI, LiteLLM, or a local runtime. Point Phlox at<br>Ollama, LM Studio, or vLLM and the whole stack — chat and RAG embeddings —<br>runs offline with no cloud API key. Switch profiles live, with a built-in connection<br>tester.

Define as many provider profiles as you like, switch between them instantly

Local embeddings (e.g. nomic-embed-text) keep RAG fully offline

Edit profiles, pricing, and limits live — no server restart required

config.yml<br>Copy

default_profile: local-ollama<br>profiles:<br>local-ollama:<br>type: openai<br>label: "Ollama (local)"<br>endpoint: http://localhost:11434/v1<br>api_key: ollama # ignored by Ollama<br>model: qwen3.6:35b<br>supports_tools: true

Everything in one app<br>A complete assistant, not just a chat box

Phlox bundles the pieces you'd otherwise stitch together yourself — each one self-hosted and under your control.

💬

Streaming chat

Conversation history with rename, delete, search & export. Edit/regenerate messages, markdown with highlighted, copyable code, plus Mermaid diagrams and LaTeX math.

🤖

Agentic harness

The model uses tools in a loop — filesystem, shell, Python/Node execution, document search, plus planning, sub-agents, memory, and checkpoints — all in a sandboxed workspace.

🤝

Human-in-the-loop

Pause on sensitive tools, approve or deny, then resume. The run state is persisted, so approvals survive disconnects.

🧰

Code execution & artifacts

Run code with captured output and inline artifacts. A Workspace Files panel lets you browse and download everything the agent created.

📚

Documents & RAG

Upload PDF, DOCX, TXT, MD, or code. Hybrid dense + sparse search over Qdrant with reranking and citations, scoped globally or per conversation. Works offline.

🌐

Opt-in web search

A per-prompt composer toggle exposes web_search (zero-config ddgs or SearXNG) so the agent can discover current sources before fetching pages.

🧠

Cross-conversation memory

Durable facts are saved and semantically recalled across chats, so the assistant remembers you from one conversation to the next.

🖼️

Multimodal

Attach images to messages for vision models, persisted and replayed into the provider as image content parts.

🔌

MCP integration

Connect Model Context Protocol servers from the UI; their tools join the model's toolset automatically, no code required.

🚪

OpenAI-compatible gateway

Mint per-user API keys and call Phlox from any OpenAI SDK via /v1/chat/completions — with the same per-user cost accounting.

💵

Usage & cost accounting

Per-message token and cost in the UI, plus an admin chargeback view by month × user × department × model, with CSV export for finance.

🎨

Theming

Phlox Dark by default, with Light, Fred Hutch, Hutch Night, Sandstone and more — instant switching via a CSS-variable token system.

The agentic core<br>A real agent, not "chat that calls tools"

Each turn, the model works in a loop — calling tools, planning, and recovering — inside a per-conversation sandboxed workspace you can inspect, snapshot, and roll back.

01 Tool loop

Filesystem (read_file, write_file, edit_file, glob, grep), run_shell, execute_python / execute_node, and search_documents — one unified tool surface the model drives until the task is done.

02 Planning & sub-agents

update_todos keeps a visible plan; spawn_subagent runs a nested, ephemeral agent with a scoped toolset in the same workspace and returns a report.

03 Memory & checkpoints

save_memory persists durable facts across chats. Every workspace is a git repo that auto-snapshots after mutating tools, with one-click restore.

04 Approvals & permissions

Every tool has an auto / ask / deny policy. The loop pauses on ask, persists its state, and resumes statelessly after you decide.

User message

Agent loopprovider.stream → tool_calls

read / write / edit

run_shell

execute_python

search_documents

web_search

spawn_subagent

Permission gate auto · ask · deny

Final answer + artifacts

Knowledge & memory<br>Your documents, searched the right way

Upload PDFs, Office docs, markdown, or source code. Phlox parses, chunks, and<br>embeds them into Qdrant , then retrieves with true hybrid search —<br>a dense semantic vector and a sparse lexical vector per chunk, fused with RRF...

model phlox openai code ollama tools

Related Articles