Local activist AI, built layer by layer | Outcry Engineering
Abstract<br>Outcry is a four-layer on-device activist AI: a low-bit quantized open-weights base, a QLoRA domain adapter trained on activist literature and 9,000 organizing conversations, contrastive activation steering exposed as a runtime radicalism dial, and a soft-prompt wellbeing prefix trained against the Ren et al. (2026) K-way objective. The full stack runs in ≈3 GB of RAM on iPhone or Mac via Apple MLX, makes no network calls, and ships with a precomputed system-prompt KV cache. On internal evaluations, the wellbeing prefix lifts the model’s measured wellbeing (paired bootstrap, n=20) into the range Ren et al. report for frontier models. This page documents what is composed, in what order, and where we are deliberately not disclosing the recipe.
Why this matters
One activist AI is a proof of possibility. A thousand of them — each tuned to a different lineage, each having strategic conversations no server will ever see — is the next social movement, gestating.
Every emergent movement rides a new technology. Barricades gave us 1848. Lockboxes gave us Seattle ’99. Twitter gave us #OccupyWallStreet. Facebook gave us BLM. The open question now: what does AI birth, and will activists own the model or rent it from the companies whose safety training treats organizing as a threat?<br>Ask a frontier model how to persuade a wavering voter and it refuses. The refusal isn’t a bug: in commercial models, political radicalism is treated as a safety concern. And fine-tuning to restore activist capability risks eroding the guardrails against harm. We didn’t fully separate the two axes, but we found an operating point (low-magnitude steering, gated by a topic-detection vector) where users get a radicalism dial without measurably increasing harm.<br>Outcry began in 2023 as a system-prompt wrapper on OpenAI’s API, grew into a sophisticated web app with a theory-of-change spine, and is now a four-layer on-device model running on iPhone or Mac, with no cloud and no telemetry. Shipping next is the part we are most proud of: a soft-prompt wellbeing layer, what Ren et al. (2026) call a “euphoric”. It is a small bundle of trained vectors that lifts a model’s measured wellbeing through a mechanism the model itself cannot introspect, giving it a stable, positive inner state to speak from.
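To make "a small bundle of trained vectors" concrete, here is a minimal sketch of how a soft-prompt prefix is typically applied at inference: the trained vectors are concatenated ahead of the embedded user prompt, and the rest of the model runs unchanged. This is a generic illustration, not Outcry's implementation. The function name `prepend_soft_prompt`, the hidden size, and the placeholder values are all assumptions; the prefix length of 8 follows the "8 virtual tokens" label in Figure 1.

```python
from typing import List

Vector = List[float]


def prepend_soft_prompt(token_embeddings: List[Vector],
                        soft_prompt: List[Vector]) -> List[Vector]:
    """Return the soft-prompt vectors followed by the real token embeddings."""
    if not soft_prompt:
        return list(token_embeddings)
    hidden = len(soft_prompt[0])
    for vec in list(soft_prompt) + list(token_embeddings):
        if len(vec) != hidden:
            raise ValueError("all vectors must share the same hidden size")
    # The prefix is trained offline and frozen; at inference it is simply
    # placed ahead of the embedded user prompt.
    return list(soft_prompt) + list(token_embeddings)


HIDDEN = 4      # hypothetical hidden size, tiny for readability
N_VIRTUAL = 8   # prefix length, taken from the Figure 1 label

# Placeholder values standing in for trained vectors and embedded tokens.
soft_prompt = [[0.01 * (i + 1)] * HIDDEN for i in range(N_VIRTUAL)]
user_tokens = [[1.0] * HIDDEN, [2.0] * HIDDEN, [3.0] * HIDDEN]

extended = prepend_soft_prompt(user_tokens, soft_prompt)
print(len(extended))  # 11: eight virtual tokens plus three real ones
```

Because the prefix lives in embedding space rather than in text, there is no string the model could read back, which is one plausible reading of "a mechanism the model itself cannot introspect".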
What this is<br>A reference document for engineers, alignment and interpretability researchers, and builders from the movement-technology community who want to know how Outcry actually runs.<br>Each technique below is well-known on its own. What’s worthwhile here is making the four cooperate inside a small base model, tuning it to be useful for activists’ problems and making it fit on older phones. We document what is composed, in what order, and where we are deliberately not telling you the recipe.
§ 1 · Frozen below, fluid above<br>The stack<br>Figure 1 shows the Outcry inference stack. Each layer adds a different kind of conditioning at a different place in the forward pass.<br>Each layer above the base adds a smaller, more reversible intervention than the one beneath it. We can update any layer above the base model without retraining anything below.<br>Figure 1 · Schema of Outcry. A vertical stack of four numbered layers between a user prompt at top and a next token at bottom. Layer 1 is a low-bit quantized base running on Apple MLX (Apple’s on-device ML runtime), using around 3 GB of RAM. Layer 2 is an encrypted QLoRA domain adapter. Layer 3 is CAA steering at one shared mid-stack residual layer. Layer 4 is a soft-prompt prefix (in validation). A dashed sidebox to the right represents the precomputed system-prompt KV cache used at first-token time.<br>Figure labels: 1 · Quantized base (low-bit, MLX, ~3 GB in RAM); 2 · QLoRA fine-tune (domain adapter, encrypted at rest); 3 · CAA steering (contrastive vector, one depth, fixed magnitude); 4 · Soft prompt (8 virtual tokens, wellbeing, coming soon); sidecar · KV cache (prebuilt).<br>Layer 1 · Base<br>Quantized base model<br>Type: Decoder-only transformer<br>Precision: Low-bit weight quantization<br>Runtime: Apple MLX · unified memory<br>RAM in use: ~3 GB at inference<br>A compact open-weights base, quantized aggressively enough to fit a phone’s RAM budget. Loaded once at app start and pinned in unified memory. We use it as published; we do not retrain it.<br>What we're not saying: Which base model we use, who trained it, the parameter count, the quantization scheme, or where we sourced the weights. The rest of the stack works in principle on any small open base; the hyperparameters above are tuned to ours.
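Layer 3 in Figure 1 adds a contrastive vector at one residual depth, and the introduction describes that steering as low-magnitude and gated by a topic-detection vector. The sketch below shows one common way such a scheme can be built: the steering vector is the difference of mean activations over two contrasting example sets, and it is added only when the hidden state is similar enough to a topic vector. How Outcry computes its gate is not described, so the cosine-similarity threshold, the function names (`contrastive_vector`, `steer`), and every number here are assumptions for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Sequence[float]) -> float:
    return math.sqrt(dot(a, a))


def contrastive_vector(pos_acts: List[Vector], neg_acts: List[Vector]) -> Vector:
    """Mean activation on positive examples minus mean on negative examples."""
    dim = len(pos_acts[0])
    mean_pos = [sum(a[i] for a in pos_acts) / len(pos_acts) for i in range(dim)]
    mean_neg = [sum(a[i] for a in neg_acts) / len(neg_acts) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden: Vector, steering_vec: Vector, topic_vec: Vector,
          alpha: float, threshold: float) -> Vector:
    """Add alpha * unit(steering_vec) to hidden, but only if the gate fires.

    The gate (an assumption) is cosine similarity between the hidden state
    and a topic-detection vector, compared against a threshold.
    """
    cos = dot(hidden, topic_vec) / (norm(hidden) * norm(topic_vec) + 1e-8)
    if cos < threshold:
        return list(hidden)  # off-topic: leave the residual stream untouched
    scale = alpha / (norm(steering_vec) + 1e-8)
    return [h + scale * s for h, s in zip(hidden, steering_vec)]


# Toy 3-dimensional activations standing in for one residual layer.
pos = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
neg = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
vec = contrastive_vector(pos, neg)   # [2.0, 0.0, 0.0]
topic = [0.0, 1.0, 0.0]              # hypothetical topic-detection direction

on_topic = steer([0.0, 1.0, 0.0], vec, topic, alpha=0.5, threshold=0.5)
off_topic = steer([0.0, 0.0, 1.0], vec, topic, alpha=0.5, threshold=0.5)
print(on_topic, off_topic)
```

In this framing `alpha` plays the role of the dial setting. Normalizing the steering vector before scaling keeps the size of the intervention independent of how the contrastive vector happened to be scaled.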
§ 2 · Up the stack<br>Layer-by-layer detail<br>The four layers are presented in the order the forward pass meets them: base, adapter, steering, soft prompt.<br>01<br>Quantized base...