Artificial Intelligence for Software Engineering: From Probable to Provable

Communications of the ACM



Here we go again: No programmers will be needed anymore! AI will generate the code! If you have been around for a while, you may feel a sense of déjà vu. That same line advertised COBOL in the 1960s, 4GLs in the 1970s, CASE tools in the 1980s, component-based development in the 1990s, model-driven architecture in the 2000s, and low-code/no-code in the 2010s. Some of these approaches did improve programming, but they did not replace programming, let alone programmers. They simply introduced higher levels of abstraction or new tools, sometimes taking advantage of a restricted application domain. Is it the same this time, or do artificial intelligence (AI) and vibe coding upend the game? More generally, can AI and software engineering enter into a successful marriage?

Warning and spoiler alert: Even though the following discussion starts out by examining limitations of AI for software construction, do not just expect a critique. Its aim is positive, in support of AI-supported software engineering. Its core thesis (here I am really spilling the beans) is that a successful solution requires combining AI with formal verification. (End of spoiler.)

‘Prompt Engineering’ Is Requirements Engineering

Are we about to witness the end of programming? The typical grand pronouncement is something like: “You will just state what you need: AI will generate the code for you.” The adverb, “just,” is epic. Stating what we need (“just” what we need) is requirements engineering (RE), among the most difficult parts of software engineering (I devoted a recent book to it [7]). Anyone who has practiced RE knows that it does not differ that much from programming. It is not exactly the same thing, since it ignores algorithms and the implementation of data structures, but it shares many challenges and techniques with programming, particularly concerns of abstraction, structuring, componentization, and refinement. Tellingly, many specification languages resemble programming languages, sans the implementation part.

The difficulty of requirements comes in part from the need to specify precise system behavior. Usually, requirements do not specify all the details: we are happy to give enough information to enable developers to get started. We also accept slightly incorrect requirements: we are happy to assume that programmers will apply common sense. Even if requirements are initially correct, they often become incorrect after a while because the code diverges from them. Seamless development, discussed in the book cited above [7], strives at all times to maintain consistency between all artifacts of software development (requirements, code, designs, tests), but implies a special software process.

If we do use requirements to generate the code through AI, good-enough requirements may no longer be good enough. After all, the code, if correctly generated, will do what we tell the AI tools it should do, rightly or wrongly.

Vibe coding advocates will dismiss these concerns as naysaying. AI code generation is indeed a reality. Major tech CEOs are on record with statements on how much of their companies’ code already comes to life that way, and how much more will in the future. I, too, have been swept away by the impressive results that one gets, initially. Suppose, for example, that you want to use some existing API: instead of learning it through often haphazard documentation, you can feed your desired scenario to an AI tool and let it figure out how to use the API to realize it.
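
To make the point about "good-enough" requirements concrete, here is a minimal sketch, assuming a sorting task chosen purely for illustration. All function names are hypothetical and do not come from any tool or library discussed here. The vague requirement ("the output is in ascending order") is satisfied by code that throws the data away. Only the precise requirement (ordered and a permutation of the input) rules that code out.

```python
from collections import Counter


def is_ordered(xs):
    """Vague requirement: the output is in ascending order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def meets_precise_spec(inp, out):
    """Precise requirement: ordered AND a permutation of the input."""
    return is_ordered(out) and Counter(inp) == Counter(out)


def lazy_sort(xs):
    """Hypothetical generated code that satisfies only the vague requirement."""
    return []  # an empty list is trivially in ascending order


def real_sort(xs):
    """A correct implementation, for comparison."""
    return sorted(xs)


data = [3, 1, 2, 1]
print(is_ordered(lazy_sort(data)))                # True: vague check passes
print(meets_precise_spec(data, lazy_sort(data)))  # False: precise spec rejects it
print(meets_precise_spec(data, real_sort(data)))  # True
```

A human programmer would never read "sort the list" as permission to return nothing. A generator that does exactly what it is told has no such common sense to fall back on, so the specification has to carry the full meaning.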

Accepting a Certain Percentage of Errors

Problems arise when you move on from experiments, however breathtaking, to real problems. Often, the result somehow looks right, but is not. The problem is that software differs in essential ways from many showcase areas of AI application.

In medical analysis, a tool can be OK if it produces the right results most of the time: the alternative (the best human experts) also produces an occasional false negative or positive. If the tool is wrong less often, it wins! That is why modern AI has already produced a revolution in image analysis.

Another revolutionized domain is human-language translation. Until two decades ago, if you came across a document in a language you could not even decipher (say Korean for me), you were stuck. Today you get a translation, often very good, in seconds. It might still contain a few mistakes, but for a non-speaker it handily beats the alternative (understanding nothing). Even a professional translator can benefit, by running the tool to get a rough version then using his...
