Static Devirtualization of Themida
IDontCode<br>naci
Windows
May 9, 2026<br>Introduction<br>Before reading this article, I highly recommend studying the following community research on binary deobfuscation.<br>https://arxiv.org/pdf/1909.01752<br>https://github.com/Colton1skees/Dna/pull/8<br>https://github.com/JonathanSalwan/VMProtect-devirtualization<br>https://github.com/NaC-L/Mergen<br>https://www.youtube.com/watch?v=3LtwqJM3Qjg<br>https://github.com/backengineering/vmp2<br>https://back.engineering/blog/17/05/2021/<br>https://www.youtube.com/watch?v=vYAJCfafYTY<br>https://www.youtube.com/watch?v=KYQOtGiH9pQ<br>https://github.com/r3bb1t/bin_lift<br>https://nac-l.github.io/2025/01/25/lifting_0.html<br>https://blog.thalium.re/posts/llvm-powered-devirtualization/<br>https://github.com/avast/retdec<br>https://github.com/ergrelet/themida-unmutate<br>https://github.com/lifting-bits/remill<br>This article demonstrates devirtualization of CodeVirtualizer/Themida protected code; however, the techniques described here apply to nearly every virtual-machine-based obfuscator, requiring only minor modifications to support each one. The following is a non-exhaustive list of obfuscators that can be reduced using the technique described in this article.<br>https://vmpsoft.com/<br>https://www.oreans.com/themida.php<br>https://github.com/vxlang/vxlang-page<br>https://github.com/snowsnowsnows/EagleVM<br>https://github.com/dmaivel/covirt<br>https://github.com/noahware/binprotect<br>Themida Architecture Analysis<br>Themida’s virtual machine architecture differs from VMProtect’s primarily in its support for nested virtualization. This is possible because the VM context and virtual stack live inside the binary itself rather than on the native stack as they do in VMProtect. This article will not go deep on the architecture since it is largely not relevant to the devirtualization approach.
The only VM-specific components that matter here are virtual branching and VMEXIT behavior, both of which are covered in their own sections. For a thorough breakdown of the Themida architecture, see this research.<br>Warning To The Wise<br>Pattern matching VM handlers back to x86 instructions is not an approach I recommend. I have tried it, and it does not scale. Any small change the protector vendor makes to handler layout, opcode tables, or dispatch logic can silently break your tooling across an entire version range. The more your implementation depends on VM-specific behavior, the more fragile it becomes.<br>The approach presented in this article deliberately minimizes VM-specific knowledge. That is what makes it work across a wide range of Themida versions. That said, studying the VM architecture is still worthwhile, not to pattern match against it, but to orient yourself within it and make informed decisions about how to guide the symbolic evaluation engine.<br>The vast majority of devirtualization work is done by a handful of general optimizations. VM-specific knowledge only becomes necessary when dealing with control flow, specifically virtual branching and virtualized calls.<br>Guided Symbolic Evaluation<br>The core idea is to lift native instructions into a malleable intermediate representation and drive the lifting process forward by concretizing control flow as optimizations resolve unknown branch destinations. Back Engineering Labs maintains its own binary lifting and recompilation engine for this purpose called BLARE2. It sports a custom SSA IR with support for AMD64 and ARM64, along with a full pass system, optimizer, instruction selector, register allocator, and linker. That last part is what separates it from most lifting frameworks: BLARE2 can lower optimized IR back to native code and reinsert it into the binary, producing output that is near 1:1 with the original. 
Anyone looking to follow the techniques in this article can get most of the way there with Triton or an LLVM-based lifter like Remill. Both are capable of producing clean optimized IR. The gap is on the backend: getting LLVM to emit tight, well-behaved native code that reinserts cleanly.<br>Lifting starts with all registers and flags symbolic. From there, instructions are disassembled and lifted until the next instruction pointer cannot be determined. What happens next depends on the control flow instruction. A lifted ret, for example, takes its target from the stack, so the value most recently stored to the memory RSP points at becomes the next IP. When an address genuinely cannot be concretized, it means one of two things: either the optimizations have not run far enough, or the branch has multiple real destinations, as is the case with a virtualized JCC.
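The lift-until-unknown loop described above can be sketched in a few lines of Python. This is a toy model under loud assumptions, not BLARE2, Triton, or Remill: the opcodes (`mov_imm`, `push_imm`, `push_reg`, `ret`), the one-unit instruction length, the `Sym` class, and the `explore` function are all hypothetical names invented for illustration. The stack pointer is given a concrete starting value purely so that stack slots can serve as dictionary keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A symbolic (unknown) value, such as a register's initial contents."""
    name: str


Value = Union[int, Sym]


def explore(program: Dict[int, Tuple[str, object]], entry: int,
            rsp: int = 0x1000, max_steps: int = 1000
            ) -> Tuple[List[int], Optional[Value]]:
    """Follow control flow until the next IP cannot be concretized.

    Returns the visited addresses and the unresolved target expression,
    or None if execution ran off the end of the toy program.
    """
    regs: Dict[str, Value] = {"rax": Sym("rax0"), "rbx": Sym("rbx0")}
    stack: Dict[int, Value] = {}  # stack address -> last value stored there
    trace: List[int] = []
    ip = entry
    for _ in range(max_steps):
        if ip not in program:
            return trace, None
        op, arg = program[ip]
        trace.append(ip)
        if op == "mov_imm":  # arg is (register name, constant)
            reg, imm = arg
            regs[reg] = imm
            ip += 1
        elif op in ("push_imm", "push_reg"):
            rsp -= 8
            stack[rsp] = arg if op == "push_imm" else regs[arg]
            ip += 1
        elif op == "ret":
            # The next IP is whatever was last stored at [RSP].
            target = stack.get(rsp, Sym(f"mem_{rsp:#x}"))
            rsp += 8
            if not isinstance(target, int):
                return trace, target  # cannot concretize: stop lifting here
            ip = target
        else:
            raise ValueError(f"unknown toy opcode {op!r}")
    raise RuntimeError("step limit reached")


# Hypothetical toy program: a push/ret pair acting as a jump, followed by
# a ret whose target is still symbolic.
program = {
    0x10: ("push_imm", 0x20),
    0x11: ("ret", None),
    0x20: ("push_reg", "rax"),
    0x21: ("ret", None),
}
trace, unresolved = explore(program, entry=0x10)
print([hex(a) for a in trace], unresolved)
```

In this run the first `ret` resolves because a constant was stored to the slot it reads, so the walk continues at `0x20`. The second `ret` reads a slot holding the symbolic initial `rax`, so the loop stops and reports that expression. A real engine would at that point either run more optimizations or treat the branch as having multiple destinations.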
Concretizing Stack Pointer<br>At the start of symbolic evaluation, all registers and flags are symbolic except for the stack pointer, which is given a concrete initial value. This is a deliberate design choice rather than a strict requirement. Keeping RSP concrete means the existing load/store propagation machinery handles stack accesses automatically, and any...