Zero Weights Graph Language Engine (MSE-GLM)

MSE Graph Language Model: Deterministic, Explainable, Zero-Weight Inference

What if a language model had no weights at all — and could still tell you, step by step, exactly why it chose every word it generated?

Clifford Chivhanga · June 22, 2026 · 56 / 56 tests passing · SDD v2.1

Contents

1. Introduction

2. Design Philosophy

3. Architecture

4. Tokenizer

5. The Three Matrices

6. Distributional Clustering

7. Inference Pipeline

8. Explainability

9. vs. Transformers

10. Use Cases

11. Track Record

12. Download & Code

Zero learned weights · Fully deterministic · CPU-only · 56/56 tests · v2.1 implemented

1. Introduction

Most language models are built around the same idea: train a neural network on enormous amounts of text, let it adjust billions of floating-point weights until it learns to predict the next word reasonably well, and then sample from a probability distribution at inference time. The model is powerful, but it is also a black box — you cannot point to the weight that caused a particular word to be chosen, and two runs with the same input can produce different output.

The MSE Graph Language Model (MSE-GLM) takes a different approach entirely. Language is represented as a directed graph: tokens are nodes, observed transitions are edges, and inference is graph traversal under a small set of explicit, inspectable rules. There are no learned weights, no gradients, no probability sampling — and because of that, every generation decision can be traced back to the exact rule and candidate set that produced it.

What this is not: MSE-GLM is not a transformer competitor for open-domain generation or reasoning. It is an architecture for settings where guarantees matter more than fluency: grammar-constrained generation, embedded AI, audit-trail-required tooling, and any pipeline where reproducibility is non-negotiable.

2. Design Philosophy

The core bet is that language, in many constrained domains, does not need to be modeled probabilistically. If the valid output space is a finite set of token transitions (all valid SQL clauses, all valid JSON keys for a schema, all valid assembly mnemonics), then a graph that memorizes exactly those transitions can generate correctly constrained output with zero chance of emitting a transition it never observed, and with no need for a GPU.
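As a minimal sketch of this idea (not the MSE-GLM code itself), the snippet below memorizes the adjacent token pairs in a tiny, made-up SQL-like corpus and checks candidate sequences against them. The function names `observe_transitions` and `is_reachable`, the corpus, and the whitespace tokenization are all assumptions made for illustration.

```python
def observe_transitions(sentences):
    """Collect every adjacent token pair seen in the corpus (whitespace tokens)."""
    transitions = set()
    for sentence in sentences:
        tokens = sentence.split()
        transitions.update(zip(tokens, tokens[1:]))
    return transitions


def is_reachable(tokens, transitions):
    """True only if every adjacent pair in `tokens` was observed in the corpus."""
    return all(pair in transitions for pair in zip(tokens, tokens[1:]))


# Hypothetical two-sentence corpus.
corpus = [
    "SELECT name FROM users",
    "SELECT id FROM orders",
]
transitions = observe_transitions(corpus)

# Every pair here was observed, even though the full sentence was not.
ok = is_reachable("SELECT id FROM users".split(), transitions)

# ("SELECT", "FROM") was never observed, so this sequence is rejected.
bad = is_reachable("SELECT FROM users".split(), transitions)
```

Note that a pair-level graph accepts new recombinations of observed transitions; the guarantee is about individual transitions, not about whole sentences.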

Where the graph is genuinely ambiguous — two equally plausible next tokens given the same context — the architecture resolves that ambiguity using principled, inspectable rules rather than a probability sample. That is the core engineering problem this system solves, and the three-matrix design described below is how it does it.

3. Architecture Overview

Corpus ─▶ Tokenizer (BPE) ─▶ Sentence Splitting ─▶ Sequence Encoding
    │
    ├─▶ Edge Matrix (E): deduplicated bigrams; CSR-indexed by source
    ├─▶ Bridge Matrix (B): deduplicated trigrams; + dual-axis cluster_id
    └─▶ Relationship Matrix (R): (triple_id, relationship_id); no triple content duplicated
    + T_index
    │
    ▼
Inference Engine: 4-stage pipeline + lineage tie-break + infer_shared_role()
    │
    ▼
MSEGraphLanguageModel: generate · explain_step · infer_shared_role · save/load

Training is a single O(N) pass over the corpus — no backpropagation, no epochs, no GPU. The trained model persists to a self-contained folder of JSON files (vocabulary, edges, bridges, relationships, metadata) that can be loaded and queried on any machine with Python.
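A rough sketch of what a single training pass and JSON persistence might look like is shown below. It is not the project's implementation: the function names, the file names `edges.json` and `bridges.json`, and the toy token IDs are hypothetical, and only the bigram and trigram structures are covered.

```python
import json
import tempfile
from pathlib import Path


def train_single_pass(sequences):
    """One left-to-right pass: record each bigram and trigram exactly once."""
    edges, bridges = set(), set()
    for seq in sequences:
        for i in range(len(seq) - 1):
            edges.add((seq[i], seq[i + 1]))
            if i + 2 < len(seq):
                bridges.add((seq[i], seq[i + 1], seq[i + 2]))
    return sorted(edges), sorted(bridges)


def save_model(folder, edges, bridges):
    """Write each structure to its own JSON file (file names are hypothetical)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "edges.json").write_text(json.dumps(edges))
    (folder / "bridges.json").write_text(json.dumps(bridges))


def load_model(folder):
    """Read the JSON files back and restore tuples (JSON only has lists)."""
    folder = Path(folder)
    edges = [tuple(e) for e in json.loads((folder / "edges.json").read_text())]
    bridges = [tuple(b) for b in json.loads((folder / "bridges.json").read_text())]
    return edges, bridges


# Two toy encoded sentences; 2 and 3 stand for the BOS and EOS IDs.
sequences = [[2, 10, 11, 12, 3], [2, 10, 11, 13, 3]]
edges, bridges = train_single_pass(sequences)

with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, edges, bridges)
    reloaded = load_model(tmp)
```

Because the saved files contain only sorted integer tuples, loading them on another machine yields the same structures, which is what makes the result reproducible.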

4. Tokenizer

The tokenizer is a from-scratch Byte Pair Encoding (BPE) implementation — the same approach used by GPT-2, but written from the ground up with no external dependencies. It converts raw text into integer token IDs through iterative character-pair merging.
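The merge loop at the heart of BPE can be sketched in a few lines. This is a generic, simplified illustration rather than the MSE-GLM tokenizer: the function names are hypothetical, words are split on whitespace, and the lexicographic tie-break is an assumption chosen so that the result is deterministic.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words and return the top one."""
    counts = Counter()
    for symbols in words:
        counts.update(zip(symbols, symbols[1:]))
    if not counts:
        return None
    # Highest count wins; ties fall back to lexicographic order (an assumption).
    return min(counts, key=lambda pair: (-counts[pair], pair))


def merge_pair(symbols, pair):
    """Replace each occurrence of `pair` in `symbols` with the merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_merges(text, num_merges):
    """Start from single characters and apply up to `num_merges` merges."""
    words = [list(word) for word in text.split()]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = [merge_pair(symbols, pair) for symbols in words]
    return merges, words


merges, words = learn_merges("low lower lowest", num_merges=2)
# merges == [("l", "o"), ("lo", "w")]
# words  == [["low"], ["low", "e", "r"], ["low", "e", "s", "t"]]
```

Each learned merge becomes a new vocabulary entry, so frequent character sequences end up as single tokens with their own integer IDs.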

Four reserved special tokens anchor the system:

Token   ID   Role
        0    Padding placeholder (reserved)
        1    Unknown character fallback
        2    Beginning of sequence, prepended to every prompt
        3    End of sequence, appended during training only

Sentence boundaries (on . ! ? \n) are preserved during training so the graph learns where sequences legally end. Streaming training from a file is supported, so corpus size is not bounded by available RAM.
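The following sketch shows one way sentence splitting and the reserved IDs listed above could fit together. The constant names, the regular expression, and the character-level vocabulary (a stand-in for BPE) are assumptions; whether the beginning-of-sequence token is also prepended during training is not stated in the text and is assumed here.

```python
import re

# Reserved IDs as listed above; the constant names are hypothetical.
PAD_ID, UNK_ID, BOS_ID, EOS_ID = 0, 1, 2, 3


def split_sentences(text):
    """Split after '.', '!' or '?' and at newlines, keeping the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p and p.strip()]


def encode_for_training(sentence, vocab):
    """Character-level stand-in for BPE: BOS + token IDs + EOS."""
    return [BOS_ID] + [vocab.get(ch, UNK_ID) for ch in sentence] + [EOS_ID]


def encode_prompt(prompt, vocab):
    """Prompts get BOS but no EOS, so generation can continue from them."""
    return [BOS_ID] + [vocab.get(ch, UNK_ID) for ch in prompt]


text = "Hi there. Go now!\nDone"
sentences = split_sentences(text)

# Ordinary tokens start at ID 4, after the four reserved IDs.
vocab = {ch: idx for idx, ch in enumerate(sorted(set("".join(sentences))), start=4)}

training = [encode_for_training(s, vocab) for s in sentences]
prompt_ids = encode_prompt("Go", vocab)
unknown_ids = encode_prompt("Z", vocab)  # "Z" is not in the vocabulary
```

Ending every training sequence with the end-of-sequence ID gives the graph an explicit edge from each sentence-final token to a stop state, which is how it can learn where sequences legally end.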

5. The Three Matrices

The trained model consists of three compact, array-backed, CSR-indexed structures. All storage uses Python's array.array('i'), which takes 4 bytes per integer on typical platforms and is roughly 7× smaller than the equivalent Python lists.
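The size difference can be checked with a short measurement like the one below. It is only an illustrative sketch: the exact ratio depends on the interpreter build, the platform's C int size, and how many values fall inside CPython's small-integer cache, so it will not necessarily match the figure quoted above.

```python
import sys
from array import array

# 100,000 integers chosen to lie outside CPython's small-integer cache.
values = list(range(1000, 101000))
packed = array("i", values)

# array stores raw C ints contiguously; itemsize is 4 on most platforms.
packed_bytes = packed.itemsize * len(packed)

# A list stores one pointer per element plus a separate int object per value.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

ratio = list_bytes / packed_bytes
print(f"array: {packed_bytes} bytes, list: {list_bytes} bytes, ratio: {ratio:.1f}x")
```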

Edge Matrix (E)

A deduplicated list of every adjacent token pair observed in the corpus, sorted by source token and indexed for O(1) successor lookup. This is the bigram graph — it answers "what tokens have I seen follow token X?"
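A compressed sparse row (CSR) layout for such an edge list can be sketched as follows. The function names `build_edge_csr` and `successors` and the toy token IDs are hypothetical; the actual on-disk layout of E may differ.

```python
from array import array


def build_edge_csr(pairs, vocab_size):
    """Deduplicate (source, target) pairs and index them CSR-style by source."""
    unique = sorted(set(pairs))
    targets = array("i", (t for _, t in unique))
    # After the prefix sum, offsets[s]..offsets[s + 1] delimits the successors of s.
    offsets = array("i", [0] * (vocab_size + 1))
    for s, _ in unique:
        offsets[s + 1] += 1
    for s in range(vocab_size):
        offsets[s + 1] += offsets[s]
    return offsets, targets


def successors(offsets, targets, source):
    """Two offset reads locate the slice; other sources are never scanned."""
    return list(targets[offsets[source]:offsets[source + 1]])


# Toy bigrams with one duplicate; 2 and 3 stand for the BOS and EOS IDs.
pairs = [(2, 5), (5, 6), (5, 7), (2, 5), (6, 3), (7, 3)]
offsets, targets = build_edge_csr(pairs, vocab_size=8)

after_5 = successors(offsets, targets, 5)  # [6, 7]
after_4 = successors(offsets, targets, 4)  # [] because token 4 was never a source
```

Finding the start of a token's successor range takes constant time; reading the successors themselves is proportional to how many there are.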

Bridge Matrix (B)

Extends E to three-token context: every observed (source, bridge, target) triple is stored once, giving the model trigram-equivalent context with no attention mechanism. The key structural innovation is a fourth column,...
