Learning Regular Languages with the TTT Algorithm
TLDR; This tutorial is a complete implementation of the TTT algorithm<br>for active automata learning in Python. TTT combines the discrimination<br>tree of Kearns and Vazirani with binary search counterexample analysis<br>from Rivest and Schapire, and adds prefix transformation and discriminator<br>finalization to eliminate all redundant membership queries. The Python<br>interpreter is embedded so that you can work through the implementation<br>steps.
In my previous post,<br>I implemented Angluin’s L* algorithm for learning regular languages from<br>a blackbox oracle. L* uses a flat observation table to track state<br>distinctions, which leads to redundant membership queries: when a<br>counterexample arrives, all its suffixes are added as columns even though<br>most distinguish no new states.
The key insight that the discrimination tree is a better data structure<br>for this job was due to Kearns and Vazirani 1, who replaced<br>L*’s observation table with a binary tree of discriminators. Rivest and<br>Schapire 2 independently contributed binary search<br>counterexample analysis, which finds the single relevant suffix in a<br>counterexample in \(O(\log k)\) queries rather than adding all \(k\)<br>suffixes.
TTT by Isberner, Howar and Steffen 3<br>adds two further refinements: prefix transformation,<br>which keeps access sequences minimal, and discriminator finalization,<br>which keeps the discrimination tree shallow. Together these make TTT<br>provably redundancy-free. That is, it never makes a membership query whose<br>answer could have been derived from earlier queries.
TTT is the algorithm of choice in practical automata learning tools such<br>as LearnLib 4. ADT 5 extends TTT with adaptive<br>distinguishing sequences, which can reduce resets in hardware settings,<br>though performance differences in software engineering settings are modest.
Definitions
Alphabet \(A\): the set of input symbols the DFA reads.
Membership query: a string passed to the blackbox oracle. The oracle<br>answers yes (accepted) or no (rejected).
Equivalence query: a hypothesis grammar passed to the teacher. The<br>teacher answers yes, or returns a counterexample string where the<br>hypothesis and the target disagree.
PAC oracle: a probabilistic approximation to the equivalence oracle.<br>After \(N\) random tests without finding a counterexample, we declare<br>the hypothesis probably approximately correct.
Discrimination tree (DT): a binary tree whose inner nodes are<br>discriminator suffixes and whose leaves are states. Sifting a string<br>\(w\) through the tree classifies it to a state using one membership<br>query per level.
Access sequence \(acc(q)\): the shortest known string that reaches<br>state \(q\) in the target.
Spanning tree: a mapping from each known state to its access sequence.<br>A dict from state to the shortest string known to reach it.
Open transition: a transition from state \(q\) on symbol \(a\) whose<br>target state has no access sequence yet, meaning TTT has not yet<br>determined which state it leads to.
Counterexample decomposition: the process of finding the split point<br>in a counterexample, extracting a new discriminator, and splitting a<br>leaf in the DT.
Contents
Definitions
Prerequisites
The Road from L* to TTT
The DFA Representation
The Oracle
The Discrimination Tree<br>Sifting
The Spanning Tree
Hypothesis Construction<br>Incremental Hypothesis Update
Counterexample Decomposition<br>The Split Point
Prefix Transformation
Splitting a Leaf
Discriminator Finalization
Putting Decomposition Together
Non-Redundancy
A Note on the Equivalence Oracle
DT Coherence After Split
The Main Loop
Examples
Evaluating Model Accuracy
Comparison with L*
References
Artifacts
Important: Pyodide takes time to initialize.<br>Initialization completion is indicated by a red border around Run all button.
Run all
Prerequisites
We use the Teacher and Oracle from the L* post unchanged.<br>The PAC equivalence oracle in Teacher is a direct drop-in for TTT.<br>The rest of the algorithm is completely independent of how equivalence<br>queries are answered.
Available Packages
These are packages that refer either to my previous posts or to pure python<br>packages that I have compiled, and is available in the below locations. As<br>before, install them if you need to run the program directly on the machine.<br>To install, simply download the wheel file (`pkg.whl`) and install using<br>`pip install pkg.whl`.
simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".
rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".
earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".
cfgrandomsample-0.0.1-py2.py3-none-any.whl from "Uniform Random Sampling of Strings from Context-Free Grammar".
cfgremoveepsilon-0.0.1-py2.py3-none-any.whl from "Remove Empty (Epsilon) Rules From a Context-Free Grammar.".
gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".
hdd-0.0.1-py2.py3-none-any.whl...