Learning Regular Languages with the TTT Algorithm

ibobev1 pts0 comments

Learning Regular Languages with the TTT Algorithm

TLDR; This tutorial is a complete implementation of the TTT algorithm<br>for active automata learning in Python. TTT combines the discrimination<br>tree of Kearns and Vazirani with binary search counterexample analysis<br>from Rivest and Schapire, and adds prefix transformation and discriminator<br>finalization to eliminate all redundant membership queries. The Python<br>interpreter is embedded so that you can work through the implementation<br>steps.

In my previous post,<br>I implemented Angluin’s L* algorithm for learning regular languages from<br>a blackbox oracle. L* uses a flat observation table to track state<br>distinctions, which leads to redundant membership queries: when a<br>counterexample arrives, all its suffixes are added as columns even though<br>most distinguish no new states.

The key insight that the discrimination tree is a better data structure<br>for this job was due to Kearns and Vazirani 1, who replaced<br>L*’s observation table with a binary tree of discriminators. Rivest and<br>Schapire 2 independently contributed binary search<br>counterexample analysis, which finds the single relevant suffix in a<br>counterexample in \(O(\log k)\) queries rather than adding all \(k\)<br>suffixes.

TTT by Isberner, Howar and Steffen 3<br>adds two further refinements: prefix transformation,<br>which keeps access sequences minimal, and discriminator finalization,<br>which keeps the discrimination tree shallow. Together these make TTT<br>provably redundancy-free. That is, it never makes a membership query whose<br>answer could have been derived from earlier queries.

TTT is the algorithm of choice in practical automata learning tools such<br>as LearnLib 4. ADT 5 extends TTT with adaptive<br>distinguishing sequences, which can reduce resets in hardware settings,<br>though performance differences in software engineering settings are modest.

Definitions

Alphabet \(A\): the set of input symbols the DFA reads.

Membership query: a string passed to the blackbox oracle. The oracle<br>answers yes (accepted) or no (rejected).

Equivalence query: a hypothesis grammar passed to the teacher. The<br>teacher answers yes, or returns a counterexample string where the<br>hypothesis and the target disagree.

PAC oracle: a probabilistic approximation to the equivalence oracle.<br>After \(N\) random tests without finding a counterexample, we declare<br>the hypothesis probably approximately correct.

Discrimination tree (DT): a binary tree whose inner nodes are<br>discriminator suffixes and whose leaves are states. Sifting a string<br>\(w\) through the tree classifies it to a state using one membership<br>query per level.

Access sequence \(acc(q)\): the shortest known string that reaches<br>state \(q\) in the target.

Spanning tree: a mapping from each known state to its access sequence.<br>A dict from state to the shortest string known to reach it.

Open transition: a transition from state \(q\) on symbol \(a\) whose<br>target state has no access sequence yet, meaning TTT has not yet<br>determined which state it leads to.

Counterexample decomposition: the process of finding the split point<br>in a counterexample, extracting a new discriminator, and splitting a<br>leaf in the DT.

Contents

Definitions

Prerequisites

The Road from L* to TTT

The DFA Representation

The Oracle

The Discrimination Tree<br>Sifting

The Spanning Tree

Hypothesis Construction<br>Incremental Hypothesis Update

Counterexample Decomposition<br>The Split Point

Prefix Transformation

Splitting a Leaf

Discriminator Finalization

Putting Decomposition Together

Non-Redundancy

A Note on the Equivalence Oracle

DT Coherence After Split

The Main Loop

Examples

Evaluating Model Accuracy

Comparison with L*

References

Artifacts

Important: Pyodide takes time to initialize.<br>Initialization completion is indicated by a red border around Run all button.

Run all

Prerequisites

We use the Teacher and Oracle from the L* post unchanged.<br>The PAC equivalence oracle in Teacher is a direct drop-in for TTT.<br>The rest of the algorithm is completely independent of how equivalence<br>queries are answered.

Available Packages

These are packages that refer either to my previous posts or to pure python<br>packages that I have compiled, and is available in the below locations. As<br>before, install them if you need to run the program directly on the machine.<br>To install, simply download the wheel file (`pkg.whl`) and install using<br>`pip install pkg.whl`.

simplefuzzer-0.0.1-py2.py3-none-any.whl from "The simplest grammar fuzzer in the world".

rxfuzzer-0.0.1-py2.py3-none-any.whl from "Fuzzing With Regular Expressions".

earleyparser-0.0.1-py2.py3-none-any.whl from "Earley Parser".

cfgrandomsample-0.0.1-py2.py3-none-any.whl from "Uniform Random Sampling of Strings from Context-Free Grammar".

cfgremoveepsilon-0.0.1-py2.py3-none-any.whl from "Remove Empty (Epsilon) Rules From a Context-Free Grammar.".

gatleastsinglefault-0.0.1-py2.py3-none-any.whl from "Specializing Context-Free Grammars for Inducing Faults".

hdd-0.0.1-py2.py3-none-any.whl...

from tree counterexample oracle state none

Related Articles