shivvr — semantic embedding & cognitive agent service
shivvr 🔪 v0.3.0
Ephemeral semantic embedding & cognitive agent service.
Chunk text. Embed with GTR-T5-base. Search via hybrid FST-BM25 vector rank fusion. Stream turn-based cognitive agent reasoning via native Model Context Protocol (MCP).
Live status: uptime 2298s · compute CUDA · inversion enabled · plus live readouts for sessions, chunks, and encryption
Capabilities
Ingest
Sentence-boundary chunking + GTR-T5-base embeddings (768d). Chunks are held in memory behind an RwLock and never written to disk: pure ephemeral compute.
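To make "sentence-boundary chunking" concrete, here is a minimal Python sketch. It is not shivvr's chunker: the function name `chunk_by_sentence`, the `max_chars` budget, and the naive punctuation-based splitting rule are all assumptions chosen for illustration.

```python
import re


def chunk_by_sentence(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars is kept whole
    rather than being split mid-sentence.
    """
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = ("Supreme Raven is protected by Known Opossum. "
        "The vault is sealed! Who holds the key?")
chunks = chunk_by_sentence(text, max_chars=50)
```

With a 50-character budget the first sentence fills a chunk on its own and the two short sentences share the second, so no chunk ever ends mid-sentence.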
FST-BM25 Hybrid Search
Reciprocal Rank Fusion (RRF) blends dense vector rankings with sparse BM25 lexical rankings. FST dictionary scanning adds microsecond-scale query guardrails and intent-entity score boosting.
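A minimal Python sketch of Reciprocal Rank Fusion, the blending step named above. The constant `k = 60` is the value commonly used in the RRF literature; whether shivvr uses the same constant, and the chunk IDs shown here, are assumptions.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c2", "c1", "c3"]    # hypothetical vector-search order
lexical = ["c1", "c4", "c2"]  # hypothetical BM25 order
fused = rrf_fuse([dense, lexical])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because RRF works on ranks rather than raw scores, cosine similarities and BM25 scores never need to be normalized onto a common scale. Here `c1` wins because it ranks near the top of both lists.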
Cognitive GhostAgent
Turn-based agent loop with integrated memory search, document ingestion, session indexing, and sandboxed command execution. Powered by OpenAI/Anthropic.
Native MCP Server
Complete Model Context Protocol HTTP/SSE server endpoint (/mcp/sse) allowing immediate, zero-config integration with Claude Code, Antigravity, and Codex.
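MCP messages are JSON-RPC 2.0 envelopes. The Python sketch below builds two requests a client might send to the message endpoint after the SSE handshake. `initialize` and `tools/list` are standard MCP methods; the protocol version string, the client name, and the helper `jsonrpc_request` are assumptions, and the tools shivvr actually exposes are not listed here.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request IDs


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Assumed protocol version and client identity, for illustration only.
init = jsonrpc_request("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.0.1"},
})
list_tools = jsonrpc_request("tools/list")
```

Each string would be POSTed as the request body; responses arrive asynchronously on the SSE stream, matched to requests by `id`.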
Crypto
Per-agent orthogonal matrix rotation on embeddings. Cosine similarity preserved under encryption. Keys are in-memory only.
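Why cosine similarity survives this kind of encryption: an orthogonal matrix preserves dot products and norms, and its transpose is its inverse. The pure-Python sketch below demonstrates that on 8-dimensional toy vectors. The Gram-Schmidt key generation, the seed, and all function names are assumptions; shivvr's actual key derivation is not described here.

```python
import math
import random


def random_orthogonal(dim: int, seed: int) -> list[list[float]]:
    """Build an orthogonal matrix via Gram-Schmidt on random Gaussian rows."""
    rng = random.Random(seed)
    rows: list[list[float]] = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in rows:  # remove components along rows already accepted
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip a degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def transpose(matrix: list[list[float]]) -> list[list[float]]:
    return [list(col) for col in zip(*matrix)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


key = random_orthogonal(8, seed=42)           # stand-in for a per-agent key
a = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -2.0]
b = [0.5, 1.0, 1.0, 0.0, -1.0, 2.0, 0.0, 1.0]
enc_a, enc_b = rotate(key, a), rotate(key, b)  # "encrypt"
dec_a = rotate(transpose(key), enc_a)          # "decrypt" with the transpose
```

`cosine(enc_a, enc_b)` matches `cosine(a, b)` up to floating-point error, which is why search can run directly on encrypted vectors as long as both sides were rotated with the same key.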
Dual embedding
The organize role uses GTR-T5-base (768d). The retrieve role uses OpenAI text-embedding-ada-002 (1536d); pass your own key per request or set one server-side.
API
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Status, model info, live counts |
| GET | /mcp/sse | MCP Server SSE handshake |
| POST | /mcp/message | MCP Server JSON-RPC message router |
| POST | /sessions/:id/agent/chat | Stream non-blocking GhostAgent cognitive turns (SSE) |
| POST | /sessions/:id/ingest | Chunk + embed text into session |
| GET | /sessions/:id/search?q=... | Semantic search (supports RRF hybrid & lexical_only) |
| GET | /sessions/:id | Session metadata |
| DELETE | /sessions/:id | Delete session |
| GET | /temp | List temp stores with TTL |
| POST | /temp/:name/ingest | Ingest into temp store (2 hr TTL) |
| GET | /temp/:name/search?q=... | Search temp store |
| DELETE | /temp/:name | Delete temp store |
| POST | /agent/:id/register | Register per-agent orthogonal key |
| POST | /agent/:id/encrypt | Encrypt embeddings |
| POST | /agent/:id/decrypt | Decrypt embeddings |
| POST | /invert | Reconstruct text from embedding vector |
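The agent chat endpoint streams Server-Sent Events. A minimal Python sketch of parsing such a stream follows; it handles only the `event:` and `data:` fields plus comment lines. The event names `thought` and `answer` and the payload shapes are hypothetical, since the exact wire format is not documented here.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data arrived.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents, for illustration only.
sample = [
    "event: thought\n",
    'data: {"text": "searching memory"}\n',
    "\n",
    ": keep-alive\n",
    "event: answer\n",
    "data: Known Opossum\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the `lines` argument would be the decoded lines of the HTTP response body, read incrementally so each turn can be shown as it arrives.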
Quick start
    # Ingest into session
    curl -X POST https://shivvr.nuts.services/sessions/my-session/ingest \
      -H "Content-Type: application/json" \
      -d '{"text": "Supreme Raven is protected by Known Opossum.", "source": "vault_specs"}'

    # Autonomous agent chat (streams thoughts, tool calls, and the answer via SSE)
    curl -i -X POST http://localhost:8085/sessions/my-session/agent/chat \
      -H "Content-Type: application/json" \
      -d '{"message": "Who protects the Supreme Raven?"}'

    # Vector-lexical hybrid RRF search
    curl "http://localhost:8085/sessions/my-session/search?q=Known+Opossum&hybrid=true"

    # High-speed lexical-only BM25 search (bypasses the ONNX embedder)
    curl "http://localhost:8085/sessions/my-session/search?q=Opossum&lexical_only=true"

    # Connect Claude Code or Antigravity to shivvr's native MCP server
    nemesis8 mcp add http://localhost:8085/mcp/sse
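The same ingest and search calls can be prepared from Python with only the standard library. This is a sketch: `BASE_URL` mirrors the localhost curl examples, and the helper names `ingest_request` and `search_url` are hypothetical. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8085"  # assumed: same host and port as the curl examples


def ingest_request(session_id: str, text: str, source: str) -> Request:
    """Build (but do not send) the POST for /sessions/:id/ingest."""
    body = json.dumps({"text": text, "source": source}).encode("utf-8")
    return Request(
        f"{BASE_URL}/sessions/{quote(session_id, safe='')}/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_url(session_id: str, query: str, **params: str) -> str:
    """Build the GET URL for /sessions/:id/search with extra query parameters."""
    qs = urlencode({"q": query, **params})
    return f"{BASE_URL}/sessions/{quote(session_id, safe='')}/search?{qs}"


req = ingest_request(
    "my-session", "Supreme Raven is protected by Known Opossum.", "vault_specs"
)
url = search_url("my-session", "Known Opossum", hybrid="true")
```

To execute against a running instance, call `urllib.request.urlopen(req)` for the ingest and `urllib.request.urlopen(url)` for the search, then decode the response body as JSON.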
Search parameters
| Param | Default | Description |
|---|---|---|
| q | required | Query text |
| n | 5 | Number of results |
| hybrid | false | Blend semantic vectors + BM25 scores (Reciprocal Rank Fusion) |
| lexical_only | false | Bypass vector embedder, execute pure BM25 search |
| guardrail | true | Enable FST toxic term scanning and automatic query blocking |
| role | organize | organize (768d local) or retrieve (1536d OpenAI) |
| time_weight | 0.0 | Blend semantic + recency score (0–1) |
| decay_halflife_hours | 168 | Recency decay half-life in hours |
| include_nearby | false | Return temporally adjacent chunks |
| agent_id | (none) | Agent ID for encrypted search |
| openai_api_key | (none) | Per-request OpenAI key for retrieve role (overrides server key) |
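One plausible reading of `time_weight` and `decay_halflife_hours` is a linear blend between semantic similarity and an exponentially decaying recency term. The Python sketch below implements that reading. The exact formula shivvr uses is not documented here, so treat both the blend and the function name `blended_score` as assumptions.

```python
def blended_score(
    similarity: float,
    age_hours: float,
    time_weight: float = 0.0,
    decay_halflife_hours: float = 168.0,
) -> float:
    """Blend semantic similarity with recency (assumed formula).

    Recency halves every decay_halflife_hours: 1.0 for a brand-new chunk,
    0.5 after one half-life, 0.25 after two, and so on.
    """
    recency = 0.5 ** (age_hours / decay_halflife_hours)
    return (1.0 - time_weight) * similarity + time_weight * recency


week_old = blended_score(0.8, age_hours=168.0, time_weight=0.5)
ignore_time = blended_score(0.8, age_hours=1000.0)  # default weight 0.0
```

Under this reading the default `time_weight` of 0.0 leaves ranking purely semantic, and the default half-life of 168 hours means a week-old chunk keeps half of its recency score.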
Environment
| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Listen port |
| MODEL_PATH | models/gtr-t5-base.onnx | GTR-T5-base ONNX embedder |
| TOKENIZER_PATH | models/tokenizer.json | Tokenizer |
| OPENAI_API_KEY | (none) | Enables OpenAI completions and retrieve embeddings |
| ANTHROPIC_API_KEY | (none) | Enables Anthropic completions and GhostAgent loops |
| NUTS_AUTH_JWKS_URL | (none) | Enable auth (open dev mode if unset) |
| NUTS_AUTH_VALIDATE_URL | https://auth.nuts.services/api/validate | API token validation endpoint |
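How these defaults resolve can be sketched in Python. This mirrors the table only; it is not the service's own (Rust) configuration code, and the `Settings` class and `load_settings` helper are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    model_path: str
    tokenizer_path: str
    openai_api_key: Optional[str]
    anthropic_api_key: Optional[str]
    jwks_url: Optional[str]
    validate_url: str

    @property
    def auth_enabled(self) -> bool:
        # Per the table: auth is on only when NUTS_AUTH_JWKS_URL is set.
        return self.jwks_url is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the table's defaults."""
    return Settings(
        port=int(env.get("PORT", "8080")),
        model_path=env.get("MODEL_PATH", "models/gtr-t5-base.onnx"),
        tokenizer_path=env.get("TOKENIZER_PATH", "models/tokenizer.json"),
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        anthropic_api_key=env.get("ANTHROPIC_API_KEY") or None,
        jwks_url=env.get("NUTS_AUTH_JWKS_URL") or None,
        validate_url=env.get(
            "NUTS_AUTH_VALIDATE_URL", "https://auth.nuts.services/api/validate"
        ),
    )


defaults = load_settings({})  # empty environment: every default applies
```

With an empty environment, `defaults.port` is 8080 and `defaults.auth_enabled` is false, which corresponds to the open dev mode noted in the table.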
Stack
| Layer | Choice |
|---|---|
| Runtime | Rust + Tokio + axum |
| Cognition | GhostAgent cognitive RAG turn loop (OpenAI / Anthropic compat) |
| MCP Server | HTTP/SSE JSON-RPC 2.0 Model Context Protocol transport layer |
| Hybrid Index | Tantivy FST deterministic phrase engine + BM25F field indexer |
| Embedding | GTR-T5-base (768d) via ONNX Runtime 2.0; local, required |
| Storage | Ephemeral RwLock; no disk, no volume mounts |
| GPU | CUDA 12.6 via ort EP on Cloud Run L4; CPU fallback automatic |
| Inversion | vec2text gtr-base (projection + T5 enc/dec); optional |
shivvr · Rust + ONNX Runtime
DeepBlueDynamics · nuts.services · hyperia · @deepbluedynamic