Large Language Models in Software Security Analysis

Large Language Models in Software Security Analysis – Communications of the ACM


Software systems control many key parts of society, including government agencies, medical services, utilities, and national defense infrastructure. Protecting these systems, however, is challenging due to their diversity and scale. Modern software is composed of components written by different people, organizations, and, increasingly, generative AI tools. The resulting systems combine many programming styles, languages, deployment environments, and dependencies. This complexity not only leads to additional bugs and vulnerabilities, but also makes these systems harder to analyze and reason about. Our software is often so large that it cannot be manually audited effectively, and simultaneously so diverse and convoluted that building automated analysis tools presents serious practical challenges. As a result, even the tools developed to mitigate security issues in our software infrastructure are not exempt from the perils of growing complexity. This was made abundantly clear by the recent CrowdStrike incident,[a] where the security system itself had a bug that caused worldwide outages across countless industrial sectors.

Key Insights

Cyber reasoning systems can address the growing challenge of securing our collective software infrastructure by autonomously detecting and remediating vulnerabilities. LLMs are a critical ingredient in building these systems.

LLMs’ ability to infer developer intent, generalize to different software domains or environments, and iteratively reason in agentic loops creates promising opportunities for leveraging LLMs to augment or replace existing program-analysis techniques.

Many promising approaches have been proposed to improve aspects of software security, but none has proven sufficient, in isolation, to address the vulnerability of our collective software infrastructure. For example, fuzz testing (fuzzing) techniques [19], that is, a biased random search over the domain of program inputs to detect vulnerabilities, have proven to be effective in real-world applications and at scale [23]. Yet the impact of discovering new vulnerabilities through fuzzing is greatly limited by the effectiveness of the engineers who must manually triage and fix them. Indeed, there is a dire shortage of cybersecurity professionals, and a detected vulnerability may only be fixed in downstream, dependent software 90–150 days after it is reported [9]. Thus, to ensure our software is truly secure, a holistic approach is needed that can automatically progress through all stages of the cybersecurity pipeline. How can we discover and remediate vulnerabilities in complex software before they are encountered in the field? What opportunities do recent advances in large language models (LLMs) present for achieving this goal?

In this article, we articulate in broad strokes how LLMs can contribute to building so-called cyber reasoning systems (CRSs), which attempt to address these issues autonomously. We first outline the broad structure of a consolidated CRS, along with examples of the role that LLMs can play in various components of this system. We then summarize the high-level technical challenges and opportunities in this emerging research area. The article is partly inspired by the recently completed DARPA AI Cyber Challenge,[b] from which we take the term cyber reasoning system, and partly by the recent public interest in autonomous software engineering, where programs are fixed and improved automatically [10] using LLM agents such as AutoCodeRover [22,28]. It is also informed by our research over the past decade in automated program repair and vulnerability discovery using a variety of analysis techniques, including search [4,8], symbolic reasoning [20], and LLMs [18,28].
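
To make the earlier description of fuzzing as "a biased random search over the domain of program inputs" more concrete, the following is a minimal sketch of one common form of that bias: keeping inputs that reach previously unseen code. Everything here is an illustrative assumption rather than a tool discussed in this article: `parse_record` is a hypothetical system under test with a planted bug, it reports its own coverage as a set of branch labels, and `mutate` and `fuzz` are toy helpers.

```python
import random


def parse_record(data: bytes) -> set:
    """Hypothetical system under test. Returns the branch labels it reached
    and raises ValueError when the planted bug is triggered."""
    reached = {"entry"}
    if len(data) >= 1 and data[0] == ord("F"):
        reached.add("magic1")
        if len(data) >= 2 and data[1] == ord("U"):
            reached.add("magic2")
            if len(data) >= 3 and data[2] == ord("Z"):
                # Stands in for a real crash such as a memory-safety violation.
                raise ValueError("planted bug reached")
    return reached


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: insert, overwrite, or delete."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, seeds, max_iters=200_000, seed=0):
    """Mutate corpus entries at random, keeping any input that reaches a new
    branch. Returns (crashing_input, iterations), or (None, max_iters)."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen = set()
    for i in range(max_iters):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            reached = target(candidate)
        except Exception:
            return candidate, i + 1
        if not reached <= seen:
            # The bias: inputs with new coverage become future mutation seeds.
            seen |= reached
            corpus.append(candidate)
    return None, max_iters


if __name__ == "__main__":
    crash, iterations = fuzz(parse_record, [b""])
    print(f"crashing input {crash!r} found after {iterations} iterations")
```

Without the coverage feedback, the search would have to guess all three magic bytes at once; with it, each byte is found in turn because partial progress is retained in the corpus. Production fuzzers typically obtain the coverage signal from instrumentation of the target rather than from a return value, and, as noted above, the crashes they report still need to be triaged and fixed.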

A Cyber Reasoning System

A cyber reasoning system (CRS) is a software system that can both detect and repair software vulnerabilities autonomously in a given system under test (SUT). Ideally, a CRS supports a wide range of real-world software systems, including those written in different programming languages and/or containing millions of lines of code. We note that substantial prior research has been conducted on the various subgoals of a CRS [8,17,25]. Thus, the core challenge of building a CRS lies in overall flexibility, scale, and the careful combination of disparate techniques into a coherent, effective system. Despite decades of research and industry interest, building such a CRS using only traditional program-analysis tools has proven to be impractical to the point of impossibility. The recent emergence of large language models has made the realization of a CRS possible, via a broad new design space...
