Tell your LLM how to check its work for empowered AI coding

Tell your LLM to verify its changes.

Khalah Jones Golden


You wouldn't submit your own work without checking it, so why do anything different with AI? Some techniques and thought patterns to help you work better with LLMs.

Jun 17, 2026

If you’re a software engineer, or adjacent, and are struggling to use AI productively, this is for you. By the end, you will have a new technique for partnering with AI to write great code: giving the AI a way to test and verify whether its changes worked properly, completely, and end to end.

AI is hugely productive. Truly, it is an amazing tool, but as with most new tools, learning how to use it effectively is the crux of the matter. Just as you can hold a hammer wrong, using AI incorrectly can lead to less than desirable results. But AI was trained with a lot of context, and with someone or something verifying its work, and we as engineers also have a whole infrastructure for checking whether our own work is functioning properly. From down checkers at the high level to integration tests at the other end, our infrastructure has already been built to account for common problems between the keyboard and chair. We can use these same techniques and tools to make sure AI produces code that works.

The single technique that has given me the greatest gains is giving the AI a way to objectively check whether its changes worked. This can take many forms. For web apps, for instance, have the AI spin up an actual browser and check that the changes did what they were supposed to. If you do not have a way for the AI to test its changes, you are almost guaranteeing they won’t be implemented correctly. The gains from simply giving AI a way to test its changes cannot be overstated.

For example, the other day I was trying to get AirPlay to work on an app I was building. AirPlay requires HTTPS, and though I had HTTPS on the live server, my local clone didn’t. Initially I was just prompting and planning, then pushing the result up and checking whether it worked on the server. The AirPlay icon was there, but clicking it and choosing a device didn’t start the actual playback. So I slowed down and instead instructed the AI to set up HTTPS locally and use Safari with devices on the local network to make sure it worked. Only after giving Claude this testing environment was I able to get AirPlay working, having never done AirPlay myself.

As another example, I had a design mockup that was just an image, and I needed to implement it with AI. I figured I could use ImageMagick to compare a screenshot of the implementation against the design and check how similar they were, telling the AI to keep going until the result was above 95% similar with only 5% blur applied to the images. With this I was able to one-shot each page of the design, going page by page. This implementation would have taken me 3 days by hand; with AI I got it done in 2 hours.
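Here is a minimal sketch of what that screenshot check could look like as a script the AI can run after each change. It assumes ImageMagick's `compare` tool is on the PATH and that both images have the same dimensions. The RMSE metric, the function names, and the file names in the usage note are assumptions for illustration, not the exact setup from the project, and the blur step is left out because how it was applied isn't spelled out above.

```python
import re
import subprocess


def parse_rmse_similarity(compare_output: str) -> float:
    """Convert `compare -metric RMSE` output into a 0..1 similarity score.

    The tool reports something like '2230.68 (0.0340379)'; the value in
    parentheses is the normalized error, so similarity is 1 minus that.
    """
    match = re.search(r"\(([-+0-9.eE]+)\)", compare_output)
    if match is None:
        raise ValueError(f"unexpected compare output: {compare_output!r}")
    return 1.0 - float(match.group(1))


def screenshot_similarity(design_png: str, screenshot_png: str) -> float:
    """Run ImageMagick's compare on two images (hypothetical helper).

    Assumes the ImageMagick 6 style `compare` binary; on ImageMagick 7 the
    equivalent is usually `magick compare`.
    """
    result = subprocess.run(
        ["compare", "-metric", "RMSE", design_png, screenshot_png, "null:"],
        capture_output=True,
        text=True,
    )
    # compare exits 0 for similar images, 1 for dissimilar ones, and 2 on
    # errors, so only exit code 2 is treated as a failure here.
    if result.returncode == 2:
        raise RuntimeError(result.stderr.strip())
    # The metric is written to stderr, not stdout.
    return parse_rmse_similarity(result.stderr)


def meets_target(similarity: float, threshold: float = 0.95) -> bool:
    """True when the implementation is at least `threshold` similar."""
    return similarity >= threshold
```

You could then tell the AI to run something like `meets_target(screenshot_similarity("design.png", "screenshot.png"))` after every change and keep iterating until it returns `True`.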

There is a lot of anecdotal evidence that this works well for other people too. There are numerous examples of people using AI to rewrite a codebase, and one way I have seen it done is to copy all of the tests from the original repo and then make sure the new code passes all of them. This lets the AI confirm its own work and gives you an objective measure of how well the code actually works. I have even seen AI researchers use a lot of the same techniques (end of the 2nd paragraph under “Depthfirst’s Security Agent”). Over and over again, I have seen that giving the AI a proper way to test its changes is the only way to get it to write code properly.

And when you really break down how our job used to go, I would never present any work without thoroughly checking it end to end, because as familiar as I am with the code, all it takes is 1 character off in the right place for it not to work. Why would AI, which is essentially just copying us, not need at least this, and more? If you don’t give it a way to confirm its changes, the AI is essentially going in blind and assuming its changes work, most of the time confidently too.

Luckily for us, it’s actually pretty easy to figure out how to give the AI a testing scenario: simply ask yourself what you would do to check whether those changes worked, and tell the AI to do that. It is also imperative that you check things the way the end user will interact with them. If you’re building an API, tell the AI to separately use curl/wget to make requests; if you’re building a web app, use Playwright; if you’re building a game for mobile be sure to have...
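For the API case, the same idea can be captured in a small script that makes a real request from the outside, the way curl would, and reports pass or fail. This is only a sketch under assumptions: the function names, the JSON key check, and the URL in the usage note are hypothetical, and it uses Python's standard `urllib` rather than curl itself so that it stays self-contained.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple


def evaluate_response(
    status: int,
    body: str,
    expected_status: int = 200,
    expected_key: Optional[str] = None,
) -> Tuple[bool, str]:
    """Decide whether a response looks right; returns (passed, reason)."""
    if status != expected_status:
        return False, f"expected status {expected_status}, got {status}"
    if expected_key is not None:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return False, "body is not valid JSON"
        if not isinstance(payload, dict) or expected_key not in payload:
            return False, f"missing key {expected_key!r}"
    return True, "ok"


def check_endpoint(
    url: str,
    expected_status: int = 200,
    expected_key: Optional[str] = None,
) -> Tuple[bool, str]:
    """Make a real GET request, as an end user would, and evaluate it."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            status = response.status
            body = response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; keep the status code so
        # it can still be compared against the expected one.
        status, body = err.code, ""
    except (urllib.error.URLError, OSError) as err:
        return False, f"request failed: {err}"
    return evaluate_response(status, body, expected_status, expected_key)
```

You might tell the AI to call `check_endpoint("http://localhost:8000/health", expected_key="status")` (a made-up URL and key) after every change and to treat anything other than `(True, "ok")` as unfinished work.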
