Analyzing Bytes: Pre-Disassembly Static Binary Analysis

Huan Nguyen, Soumyakant Priyadarshan, ChenCheng Jiang, R. Sekar

Proceedings of the ACM on Programming Languages, Association for Computing Machinery (2026), pp. 1127-1151

Abstract

Binary code analysis plays a central role in numerous applications in software security, performance optimization, reverse engineering, and so on. Existing techniques need to first disassemble binaries into functions in assembly code before an analysis can be performed. However, disassembly and function identification have proven to be major challenges for complex variable-length instruction sets such as the x86. A recent trend has been to use static analysis to improve the accuracy of these tasks. This raises a chicken-and-egg problem: a disassembly is needed for static analysis, but a static analysis is needed for accurate disassembly! We overcome this problem by developing a novel static analysis approach that can operate before committing to a disassembly. Our analysis operates on the output of exhaustive disassembly that considers each possible offset in a binary as an instruction, and constructs what is known as a super-set control-flow graph (CFG). The central technical challenge in analyzing this CFG is that it mixes legitimate instructions with unintended ones, causing analysis results from invalid code paths to pollute legitimate ones. To overcome this challenge, we begin with a key new insight that if we focus on backward analyses, we can ensure accuracy of analysis results at intended instructions even though we have no idea where these intended instructions are! Moreover, our analysis operates in time that is linear in the size of the binary. Specifically, in O(n) total time, it yields analysis results for every one of the n offsets in an n-byte binary. For this task, it is orders of magnitude faster than previous techniques, as the previous techniques typically need to repeat the analysis many times.
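To make the backward-analysis insight concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes a hypothetical four-opcode toy instruction set (not x86), and every name in it (`decode`, `superset_cfg`, `liveness`, `reachable`, `CODE`) is invented for illustration. It decodes an instruction at every byte offset, builds the super-set CFG, and runs register liveness (a standard backward analysis) with an ordinary worklist fixpoint, which does not by itself give the linear-time bound the abstract claims. The point it illustrates is narrower: a backward result at an offset depends only on what is reachable forward from that offset, so unintended instructions cannot change the results at intended ones.

```python
from collections import deque

# Hypothetical toy ISA (an assumption for illustration, not x86):
#   0x01 r -> def r (writes register r), 2 bytes
#   0x02 r -> use r (reads register r),  2 bytes
#   0x03 t -> jmp t (absolute offset t), 2 bytes
#   0x04   -> ret,                       1 byte
DEF, USE, JMP, RET = 0x01, 0x02, 0x03, 0x04
KINDS = {DEF: "def", USE: "use", JMP: "jmp"}

def decode(code, off):
    """Decode one toy instruction at `off`; return (kind, operand, length) or None."""
    op = code[off]
    if op == RET:
        return ("ret", None, 1)
    if op in KINDS and off + 1 < len(code):
        return (KINDS[op], code[off + 1], 2)
    return None  # byte does not start a valid instruction

def superset_cfg(code):
    """Exhaustive disassembly: treat every offset as a possible instruction start."""
    insns, succs = {}, {}
    for off in range(len(code)):
        ins = decode(code, off)
        if ins is None:
            continue
        kind, operand, length = ins
        if kind == "ret":
            targets = []
        elif kind == "jmp":
            targets = [operand]
        else:
            targets = [off + length]
        insns[off] = ins
        succs[off] = [t for t in targets if t < len(code)]
    return insns, succs

def liveness(insns, succs, nodes=None):
    """Backward liveness over `nodes` (default: every decodable offset)."""
    nodes = set(insns) if nodes is None else set(nodes)
    preds = {o: [] for o in nodes}
    for o in nodes:
        for s in succs[o]:
            if s in nodes:
                preds[s].append(o)
    live_in = {o: frozenset() for o in nodes}
    work = deque(sorted(nodes, reverse=True))
    while work:
        o = work.popleft()
        kind, operand, _ = insns[o]
        # live-out is the union of live-in over successors (backward flow)
        out = frozenset().union(*(live_in.get(s, frozenset()) for s in succs[o]))
        if kind == "use":
            new = out | {operand}
        elif kind == "def":
            new = out - {operand}
        else:
            new = out
        if new != live_in[o]:
            live_in[o] = new
            work.extend(preds[o])  # revisit predecessors whose input changed
    return live_in

def reachable(succs, entry):
    """Offsets reachable from `entry`; stands in for the (normally unknown) intended code."""
    seen, stack = set(), [entry]
    while stack:
        o = stack.pop()
        if o in seen or o not in succs:
            continue
        seen.add(o)
        stack.extend(succs[o])
    return seen

# Intended stream: def 1; use 2; jmp 7; <one junk byte>; use 3; ret
CODE = bytes([0x01, 0x01, 0x02, 0x02, 0x03, 0x07, 0x02, 0x02, 0x03, 0x04])

insns, succs = superset_cfg(CODE)
full = liveness(insns, succs)               # analysis over the whole super-set CFG
intended = reachable(succs, 0)              # ground truth, used only for comparison
ref = liveness(insns, succs, intended)      # analysis over intended code only
agree = all(full[o] == ref[o] for o in intended)
```

In this example `full` also holds results for unintended offsets such as 1, 3, 6, and 8, yet `agree` is true: the super-set analysis matches the intended-only analysis at every intended offset without ever being told which offsets those are. A forward analysis would not have this property, because facts computed along unintended paths would flow into intended instructions.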
