MSE Graph Language Model — A Deterministic, Explainable Architecture | Clifford Chivhanga
Research — Language Modeling — Graph Architecture
MSE Graph Language Model:<br>Deterministic, Explainable,<br>Zero-Weight Inference
What if a language model had no weights at all — and could still tell you, step by step,<br>exactly why it chose every word it generated?
✎ Clifford Chivhanga<br> June 22, 2026<br>✅ 56 / 56 tests passing<br> SDD v2.1
Contents
1. Introduction
2. Design Philosophy
3. Architecture
4. Tokenizer
5. The Three Matrices
6. Distributional Clustering
7. Inference Pipeline
8. Explainability
9. vs. Transformers
10. Use Cases
11. Track Record
12. Download & Code
Zero learned weights<br>Fully deterministic<br>CPU-only<br>56/56 tests<br>v2.1 — implemented
1. Introduction
Most language models are built around the same idea: train a neural network on enormous amounts<br>of text, let it adjust billions of floating-point weights until it learns to predict the next<br>word reasonably well, and then sample from a probability distribution at inference time.<br>The model is powerful, but it is also a black box — you cannot point to the weight that caused<br>a particular word to be chosen, and two runs with the same input can produce different output.
The MSE Graph Language Model (MSE-GLM) takes a different approach entirely.<br>Language is represented as a directed graph: tokens are nodes, observed transitions are edges,<br>and inference is graph traversal under a small set of explicit, inspectable rules.<br>There are no learned weights, no gradients, no probability sampling — and because of that,<br>every generation decision can be traced back to the exact rule and candidate set that produced it.
What this is not<br>MSE-GLM is not a transformer competitor for open-domain generation or reasoning.<br>It is an architecture for settings where guarantees matter more than fluency —<br>grammar-constrained generation, embedded AI, audit-trail-required tooling, and any pipeline<br>where reproducibility is non-negotiable.
2. Design Philosophy
The core bet is that language, in many constrained domains, does not need to be modeled<br>probabilistically. If the valid output space is a finite set of token transitions<br>— all valid SQL clauses, all valid JSON keys for a schema, all valid assembly mnemonics —<br>then a graph that memorizes exactly those transitions can generate correctly constrained output<br>with zero chance of emitting something it never observed, and zero need for a GPU.
Where the graph is genuinely ambiguous — two equally plausible next tokens given the same<br>context — the architecture resolves that ambiguity using principled, inspectable rules rather<br>than a probability sample. That is the core engineering problem this system solves, and the<br>three-matrix design described below is how it does it.
3. Architecture Overview
Corpus<br>─▶ Tokenizer (BPE)<br>─▶ Sentence Splitting<br>─▶ Sequence Encoding<br>├──────────────────────────────────┬──────────────────────────────────┐<br>▼ ▼ ▼<br>Edge Matrix (E) Bridge Matrix (B) Relationship Matrix (R)<br>deduplicated bigrams deduplicated trigrams (triple_id, relationship_id)<br>CSR-indexed by source + dual-axis cluster_id no triple content duplicated<br>+ T_index<br>Inference Engine<br>4-stage pipeline + lineage tie-break + infer_shared_role()<br>MSEGraphLanguageModel<br>generate · explain_step · infer_shared_role · save/load
Training is a single O(N) pass over the corpus — no backpropagation, no epochs, no GPU.<br>The trained model persists to a self-contained folder of JSON files (vocabulary, edges,<br>bridges, relationships, metadata) that can be loaded and queried on any machine with Python.
4. Tokenizer
The tokenizer is a from-scratch Byte Pair Encoding (BPE) implementation — the same approach<br>used by GPT-2, but written from the ground up with no external dependencies. It converts raw<br>text into integer token IDs through iterative character-pair merging.
Four reserved special tokens anchor the system:
TokenIDRole<br>0Padding placeholder (reserved)<br>1Unknown character fallback<br>2Beginning of sequence — prepended to every prompt<br>3End of sequence — appended during training only
Sentence boundaries (on . ! ? \n) are preserved during training so the graph<br>learns where sequences legally end. Streaming training from a file is supported, so corpus<br>size is not bounded by available RAM.
5. The Three Matrices
The trained model is three compact, array-backed, CSR-indexed structures.<br>All storage uses Python's array.array('i') — 4 bytes per integer,<br>roughly 7× smaller than equivalent Python lists.
Edge Matrix (E)
A deduplicated list of every adjacent token pair observed in the corpus, sorted by source<br>token and indexed for O(1) successor lookup. This is the bigram graph — it answers<br>"what tokens have I seen follow token X?"
Bridge Matrix (B)
Extends E to three-token context: every observed (source, bridge, target)<br>triple is stored once, giving the model trigram-equivalent context with no attention<br>mechanism. The key structural innovation is a fourth column,...