GitHub - infobyte/isa_recovery: ISA Recovery · GitHub
ISA Recovery System
A reverse-engineering pipeline that turns a firmware binary and its (possibly-wrong) disassembly into a working Ghidra processor specification. When you hit a proprietary processor with no documentation and no Ghidra support, this tool recovers the real encoding of each instruction — which bits are the opcode, which are registers, which are immediates — and writes out a SLEIGH spec you can load directly into Ghidra to decompile the firmware.
Under the hood it is an agentic workflow: a fixed pipeline where each step is a large language model prompted for a narrow job. The workflow is orchestrated by deterministic code, not by the LLMs themselves, and every SLEIGH constructor generated at the end is verified by compiling it with Ghidra's sleigh binary before being accepted. Failed compilations are fed back to the model for up to three repair attempts.
How It Works
Objdump
Bootstrap ─── deterministic clustering (no LLM)
┌─ Processing Loop ──────────────────────────┐
│ Text Interpreter → Bit Interpreter ──┐     │
│   → Knowledge Manager                │     │
│   → Supervisor                       │     │
│       │ split ───────────────────────┘     │
│       └── next cluster ────────────────────┤
└────────────────────────────────────────────┘
Knowledge Base
SLEIGH Generator ─── compile-verify-retry loop
Ghidra .slaspec
Instructions are grouped into clusters by structure (byte size, token pattern, fixed-bit mask). Each cluster is then analyzed by a chain of specialized LLM steps:
Text Interpreter extracts the text pattern (add {REG1}, {REG2}, {REG3}).
Bit Interpreter maps each placeholder to a bit range using field-correlation tools; it can request a split if a cluster mixes encodings.
Knowledge Manager integrates per-cluster evidence into a typed knowledge base of registers, instructions, addressing modes, and architecture traits.
Supervisor is primarily a deterministic gatekeeper (structural checks on match rates, unmapped placeholders, opcode overlap). It invokes an LLM only when a check fails, and it can then accept the result, re-run a specific agent with feedback, or escalate to the human via the TUI.
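The clustering and field-correlation ideas above can be illustrated with a small sketch. This is not the project's code: the 16-bit toy encoding, the sample instructions, and the helper names (`token_pattern`, `fixed_bit_mask`, `locate_field`) are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy ISA (an assumption, not a real target): 16-bit words laid
# out as opcode[15:12] | rd[11:8] | rs[7:4] | rt[3:0].
SAMPLES = [
    (0x1312, "add r3, r1, r2"),
    (0x1547, "add r5, r4, r7"),
    (0x1022, "add r0, r2, r2"),
]
WIDTH = 16

def token_pattern(text):
    """Replace each register operand with a numbered placeholder."""
    counter = 0
    def repl(_match):
        nonlocal counter
        counter += 1
        return "{REG%d}" % counter
    return re.sub(r"\br\d+\b", repl, text)

def cluster(samples):
    """Group samples that share the same text pattern."""
    groups = defaultdict(list)
    for word, text in samples:
        groups[token_pattern(text)].append((word, text))
    return dict(groups)

def fixed_bit_mask(words, width):
    """Return a mask of the bits that never change across the cluster."""
    mask = (1 << width) - 1
    first = words[0]
    for word in words[1:]:
        mask &= ~(word ^ first)
    return mask & ((1 << width) - 1)

def locate_field(words, values, width, field_width):
    """Return every bit offset where the field equals the operand value
    in all samples (a brute-force stand-in for field correlation)."""
    field_mask = (1 << field_width) - 1
    return [
        lo
        for lo in range(width - field_width + 1)
        if all((w >> lo) & field_mask == v for w, v in zip(words, values))
    ]

clusters = cluster(SAMPLES)
members = clusters["add {REG1}, {REG2}, {REG3}"]
words = [w for w, _ in members]
# Operand values per sample, e.g. [3, 1, 2] for "add r3, r1, r2".
operands = [[int(n) for n in re.findall(r"\br(\d+)\b", t)] for _, t in members]

mask = fixed_bit_mask(words, WIDTH)
reg_offsets = [
    locate_field(words, [ops[i] for ops in operands], WIDTH, 4)
    for i in range(3)
]
print(hex(mask), reg_offsets)
```

With only three samples the fixed-bit mask also picks up operand bits that happen not to vary, which is one reason a real pipeline would want many samples per cluster and a supervisor check on match rates.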
When the knowledge base is complete, a separate SLEIGH generator builds the Ghidra spec in two phases: a deterministic skeleton of all constructors marked unimpl, then an LLM fills in the p-code semantics one instruction at a time, compiling each against Ghidra's sleigh binary and retrying on failure.
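The compile-verify-retry loop can be sketched as below. This is a minimal sketch, not the project's implementation: `compile_fn` and `repair_fn` are hypothetical callables standing in for "run Ghidra's sleigh binary" and "ask the LLM for a fix", and returning `None` so the caller can keep the `unimpl` skeleton is an assumption. The cap of three repairs follows the figure stated earlier in this README.

```python
from typing import Callable, Optional, Tuple

MAX_REPAIRS = 3  # "up to three repair attempts"

# Hypothetical signatures (assumptions for this sketch):
#   compile_fn(source) -> (ok, error_text)
#   repair_fn(source, error_text) -> new_source
CompileFn = Callable[[str], Tuple[bool, str]]
RepairFn = Callable[[str, str], str]

def generate_verified(
    draft: str,
    compile_fn: CompileFn,
    repair_fn: RepairFn,
    max_repairs: int = MAX_REPAIRS,
) -> Optional[str]:
    """Return the first candidate that compiles, or None if every
    attempt (the draft plus max_repairs repairs) fails."""
    candidate = draft
    for attempt in range(max_repairs + 1):
        ok, errors = compile_fn(candidate)
        if ok:
            return candidate
        if attempt < max_repairs:
            # Feed the compiler output back so the next try can fix it.
            candidate = repair_fn(candidate, errors)
    return None
```

Passing the compiler and the repair step in as callables keeps the loop itself deterministic and easy to test with fakes; a caller receiving `None` could leave that constructor marked `unimpl`.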
Designed as a co-pilot for the analyst, not a replacement: the TUI exposes every decision, the supervisor escalates ambiguous clusters to a human, and the full LLM conversation, tool-call, and token-usage history is written to disk.
Tested on LEGv8, MIPS, pi32v2, and x86.
Quick Start
# Docker (recommended)
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
./docker/run.sh integration_tests/mips

# Local
pip install -e ".[all]"
python -m main --config config.yaml
What You Need (and What You Get)
Input: a firmware binary and an objdump disassembly, even one produced against the wrong architecture. The tool does not solve the disassembly problem itself; output quality scales with input disassembly quality.
Output: a Ghidra .slaspec file plus a JSON knowledge base of registers, instruction encodings, addressing modes, and architecture traits.
Documentation
Full documentation — architecture, agent internals, worked examples, configuration reference — lives in the wiki:
pip install -e ".[docs]"
cd wiki && mkdocs serve
Then open...