Show HN: Shivvr – Ephemeral semantic embedding and cognitive agent service

kordlessagain · 1 pts · 0 comments


shivvr 🔪 v0.3.0

Ephemeral semantic embedding & cognitive agent service.

Chunk text. Embed with GTR-T5-base. Search via hybrid FST-BM25 vector rank fusion. Stream turn-based cognitive agent reasoning via native Model Context Protocol (MCP).


Uptime: 2298s · Compute: CUDA · Inversion: enabled

Capabilities

Ingest

Sentence-boundary chunking plus GTR-T5-base embeddings (768d). Chunks are held in memory behind an RwLock; nothing is written to disk, so all state is ephemeral.
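As a rough illustration of sentence-boundary chunking, the sketch below splits text at sentence-ending punctuation and packs whole sentences into chunks up to a size limit. It is written in Python for brevity and is not shivvr's Rust implementation; the function name `chunk_sentences`, the regex splitter, and the `max_chars` limit are all assumptions made for the example.

```python
import re

def chunk_sentences(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars (hypothetical limit)."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = ("Supreme Raven is protected by Known Opossum. "
        "The vault opens at dawn. Nobody has the key.")
chunks = chunk_sentences(text, max_chars=50)
for chunk in chunks:
    print(chunk)
```

A sentence is never split in the middle, so a single sentence longer than `max_chars` becomes its own oversized chunk in this sketch.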

FST-BM25 Hybrid Search

Reciprocal Rank Fusion (RRF) blends dense vector rankings with sparse BM25 lexical rankings. FST dictionary scanning provides microsecond-scale query guardrails and intent-entity score boosting.
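For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal Python sketch of the general technique: each ranked list contributes 1 / (k + rank) to a document's fused score. The constant `k = 60` is the value commonly used in the RRF literature, not necessarily the one shivvr uses, and the chunk ids and rankings are made up for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense = ["c2", "c1", "c3"]    # hypothetical vector-similarity ranking
lexical = ["c1", "c4", "c2"]  # hypothetical BM25 ranking
fused = rrf_fuse([dense, lexical])
print(fused)
```

RRF only looks at ranks, so the raw cosine and BM25 scores never need to be normalized onto a common scale.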

Cognitive GhostAgent

Turn-based agent loop with integrated memory search, document ingestion, session indexing, and sandboxed command execution. Powered by OpenAI/Anthropic.

Native MCP Server

Complete Model Context Protocol HTTP/SSE server endpoint (/mcp/sse) allowing immediate, zero-config integration with Claude Code, Antigravity, and Codex.
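MCP messages are JSON-RPC 2.0 envelopes. The Python sketch below only builds request bodies and sends nothing; `tools/list` and `tools/call` are standard MCP method names, while the tool name `memory_search` and its arguments are hypothetical placeholders, since the tools shivvr exposes are not listed here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per connection

def jsonrpc_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

list_body = jsonrpc_request("tools/list")
call_body = jsonrpc_request(
    "tools/call",
    # Hypothetical tool name and arguments, for illustration only.
    {"name": "memory_search", "arguments": {"query": "Known Opossum"}},
)
print(list_body)
print(call_body)
```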

Crypto

Per-agent orthogonal matrix rotation on embeddings. Cosine similarity preserved under encryption. Keys are in-memory only.
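The reason cosine similarity survives this kind of encryption is that an orthogonal matrix preserves dot products and vector lengths. The Python sketch below demonstrates the property on small vectors; the seeded Gram-Schmidt key construction is an assumption made for illustration and says nothing about how shivvr actually derives its per-agent keys.

```python
import math
import random

def orthogonal_matrix(dim, seed):
    """Build a dim x dim orthogonal matrix by Gram-Schmidt on seeded Gaussian rows."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:  # remove components along rows already chosen
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def rotate(matrix, vec):
    """Encrypt: multiply the vector by the orthogonal key."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def unrotate(matrix, vec):
    """Decrypt: the inverse of an orthogonal matrix is its transpose."""
    dim = len(vec)
    return [sum(matrix[i][j] * vec[i] for i in range(dim)) for j in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

key = orthogonal_matrix(8, seed=42)  # toy 8-d key; real embeddings are 768d
a = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, 1.5, -0.5]
b = [0.5, 1.0, 1.0, 0.0, 2.0, -1.0, 0.5, 1.0]
plain = cosine(a, b)
encrypted = cosine(rotate(key, a), rotate(key, b))
recovered = unrotate(key, rotate(key, a))
print(abs(plain - encrypted) < 1e-9)
```

Search can therefore run directly on rotated vectors, provided the query is rotated with the same key.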

Dual embedding

The organize role uses GTR-T5-base (768d). The retrieve role uses OpenAI text-embedding-ada-002 (1536d); pass your own key per request or set one server-side.

API

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Status, model info, live counts |
| GET | /mcp/sse | MCP Server SSE handshake |
| POST | /mcp/message | MCP Server JSON-RPC message router |
| POST | /sessions/:id/agent/chat | Stream non-blocking GhostAgent cognitive turns (SSE) |
| POST | /sessions/:id/ingest | Chunk + embed text into session |
| GET | /sessions/:id/search?q=... | Semantic search (supports RRF hybrid & lexical_only) |
| GET | /sessions/:id | Session metadata |
| DELETE | /sessions/:id | Delete session |
| GET | /temp | List temp stores with TTL |
| POST | /temp/:name/ingest | Ingest into temp store (2 hr TTL) |
| GET | /temp/:name/search?q=... | Search temp store |
| DELETE | /temp/:name | Delete temp store |
| POST | /agent/:id/register | Register per-agent orthogonal key |
| POST | /agent/:id/encrypt | Encrypt embeddings |
| POST | /agent/:id/decrypt | Decrypt embeddings |
| POST | /invert | Reconstruct text from embedding vector |
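For callers who prefer a script to curl, the Python sketch below builds an ingest request and a search URL for the endpoints listed above, using only the standard library. Nothing is sent over the network. The helper names are hypothetical, the `text` and `source` body fields are taken from the quick-start example, and the base URL is an assumption to adjust for your deployment.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8085"  # assumed local instance

def ingest_request(session_id, text, source=None):
    """Build (but do not send) a POST /sessions/:id/ingest request."""
    payload = {"text": text}
    if source is not None:
        payload["source"] = source
    return urllib.request.Request(
        f"{BASE_URL}/sessions/{session_id}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def search_url(session_id, query, **params):
    """Build a GET /sessions/:id/search URL with URL-encoded parameters."""
    qs = urllib.parse.urlencode({"q": query, **params})
    return f"{BASE_URL}/sessions/{session_id}/search?{qs}"

req = ingest_request("my-session",
                     "Supreme Raven is protected by Known Opossum.",
                     source="vault_specs")
url = search_url("my-session", "Known Opossum", hybrid="true")
print(req.get_method(), req.full_url)
print(url)
# To actually send the request: urllib.request.urlopen(req)
```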

Quick start

# Ingest into session
curl -X POST https://shivvr.nuts.services/sessions/my-session/ingest \
  -H "Content-Type: application/json" \
  -d '{"text": "Supreme Raven is protected by Known Opossum.", "source": "vault_specs"}'

# Autonomous Agent Conversational Chat (Streams Thoughts, ToolCalls, & Answer via SSE)
curl -i -X POST http://localhost:8085/sessions/my-session/agent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Who protects the Supreme Raven?"}'

# Vector-Lexical Hybrid RRF Search
curl "http://localhost:8085/sessions/my-session/search?q=Known+Opossum&hybrid=true"

# High-speed Lexical-Only BM25 Search (Bypasses ONNX embedder)
curl "http://localhost:8085/sessions/my-session/search?q=Opossum&lexical_only=true"

# Synchronize Claude Code or Antigravity with shivvr's Native MCP Server
nemesis8 mcp add http://localhost:8085/mcp/sse
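The agent chat call in the quick start streams its output as Server-Sent Events. The Python sketch below parses raw SSE text into (event, data) pairs following the standard SSE wire format. The event names `thought` and `answer` in the sample are assumptions; the names shivvr actually emits are not documented here.

```python
def parse_sse(stream_text):
    """Parse Server-Sent Events text into a list of (event, data) tuples."""
    events = []
    # Events are separated by a blank line.
    for block in stream_text.replace("\r\n", "\n").split("\n\n"):
        event, data_lines = "message", []  # "message" is the SSE default type
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            events.append((event, "\n".join(data_lines)))
    return events

# Hypothetical stream, for illustration only.
sample = (
    "event: thought\ndata: searching memory\n\n"
    ": keep-alive\n\n"
    "event: answer\ndata: Known Opossum\n\n"
)
events = parse_sse(sample)
for name, data in events:
    print(name, data)
```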

Search parameters

| Param | Default | Description |
|---|---|---|
| q | required | Query text |
| n | 5 | Number of results |
| hybrid | false | Blend semantic vectors + BM25 scores (Reciprocal Rank Fusion) |
| lexical_only | false | Bypass vector embedder, execute pure BM25 search |
| guardrail | true | Enable FST toxic term scanning and automatic query blocking |
| role | organize | organize (768d local) or retrieve (1536d OpenAI) |
| time_weight | 0.0 | Blend semantic + recency score (0–1) |
| decay_halflife_hours | 168 | Recency decay half-life in hours |
| include_nearby | false | Return temporally adjacent chunks |
| agent_id | (none) | Agent ID for encrypted search |
| openai_api_key | (none) | Per-request OpenAI key for retrieve role (overrides server key) |
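To give a feel for how time_weight and decay_halflife_hours might interact, here is one plausible blending formula in Python: an exponential half-life decay for recency, mixed linearly with the semantic score. The exact formula shivvr uses is not stated, so treat both the linear blend and the function names as assumptions.

```python
def recency_score(age_hours, halflife_hours=168.0):
    """Exponential decay: the score halves every halflife_hours."""
    return 0.5 ** (age_hours / halflife_hours)

def blended_score(semantic, age_hours, time_weight=0.0, halflife_hours=168.0):
    """Assumed linear blend of semantic similarity and recency (time_weight in 0..1)."""
    recency = recency_score(age_hours, halflife_hours)
    return (1.0 - time_weight) * semantic + time_weight * recency

week_old = recency_score(168.0)                           # one half-life old
default = blended_score(0.8, 168.0)                       # time_weight 0.0: pure semantic
balanced = blended_score(0.8, 168.0, time_weight=0.5)     # equal mix
print(week_old, default, balanced)
```

With the default time_weight of 0.0 the recency term drops out entirely, which matches the table's description of it as an opt-in blend.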

Environment

| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Listen port |
| MODEL_PATH | models/gtr-t5-base.onnx | GTR-T5-base ONNX embedder |
| TOKENIZER_PATH | models/tokenizer.json | Tokenizer |
| OPENAI_API_KEY | (none) | Enables OpenAI completions and retrieve embeddings |
| ANTHROPIC_API_KEY | (none) | Enables Anthropic completions and GhostAgent loops |
| NUTS_AUTH_JWKS_URL | (none) | Enable auth (open dev mode if unset) |
| NUTS_AUTH_VALIDATE_URL | https://auth.nuts.services/api/validate | API token validation endpoint |
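The sketch below mirrors the table in Python to show how the defaults resolve and how an unset NUTS_AUTH_JWKS_URL leaves the service in open dev mode. It is an illustration of the documented behaviour, not shivvr's Rust configuration code; `load_config`, the `auth_enabled` key, and the example JWKS URL are hypothetical.

```python
import os

DEFAULTS = {
    "PORT": "8080",
    "MODEL_PATH": "models/gtr-t5-base.onnx",
    "TOKENIZER_PATH": "models/tokenizer.json",
    "NUTS_AUTH_VALIDATE_URL": "https://auth.nuts.services/api/validate",
}

def load_config(env=None):
    """Resolve settings from an environment mapping, applying documented defaults."""
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    cfg["PORT"] = int(cfg["PORT"])
    # Optional keys have no default; None means the feature is disabled.
    for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "NUTS_AUTH_JWKS_URL"):
        cfg[name] = env.get(name)
    # Auth is only enforced when a JWKS URL is configured.
    cfg["auth_enabled"] = bool(cfg["NUTS_AUTH_JWKS_URL"])
    return cfg

dev = load_config({})
prod = load_config({
    "PORT": "8085",
    "NUTS_AUTH_JWKS_URL": "https://example.invalid/jwks.json",  # placeholder URL
})
print(dev["PORT"], dev["auth_enabled"])
print(prod["PORT"], prod["auth_enabled"])
```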

Stack

| Layer | Choice |
|---|---|
| Runtime | Rust + Tokio + axum |
| Cognition | GhostAgent cognitive RAG turn loop (OpenAI / Anthropic compat) |
| MCP Server | HTTP/SSE JSON-RPC 2.0 Model Context Protocol transport layer |
| Hybrid Index | Tantivy FST deterministic phrase engine + BM25F field indexer |
| Embedding | GTR-T5-base (768d) via ONNX Runtime 2.0 (local, required) |
| Storage | Ephemeral RwLock (no disk, no volume mounts) |
| GPU | CUDA 12.6 via ort EP on Cloud Run L4 (CPU fallback automatic) |
| Inversion | vec2text gtr-base (projection + T5 enc/dec), optional |


shivvr · Rust + ONNX Runtime

DeepBlueDynamics · nuts.services · hyperia · @deepbluedynamic
