Autoresearch, Claude and Constrained Optimization

Introduction

You don't need to look far to find claims that folks have been using AI to do the work of dozens of people. I tend to be skeptical of any claim about improvements that comes without evidence, so I decided to put that skepticism to work. This overlapped slightly with the whole 'loops' discussion on X, but that's coincidental.

Over the last few weeks I have put together a project in the spirit of Karpathy's 'Autoresearch'. I wanted a problem that was not a traditional machine learning or numerical optimization problem but that still had some objective measure of success. I chose this kind of problem because many of the projects and products I have worked on are structured that way. You have some metric that you want to move (up or down) and ideally some way to measure it. You likely also have some constraints, e.g. we can't let the page load time exceed 500 ms for this feature. I have yet to work on a problem like this where the path from unknown to success is a clear gradient-based optimization akin to machine learning. More often you complete some work, test it in the 'real world', look at how it performed and then decide on next steps. Not all changes result in a positive outcome, and it's easy to go deep down a path that ends in a local optimum.

I wanted an experiment that would give me some intuition about how to task AI agents with bigger pieces of work in a mostly unsupervised way. There are already other mechanisms for this, such as Ralph Loops and the /goal command that's now in Claude Code. The difference in this setup is that I would pick a quantifiable number as the primary measure of success and bound the problem with some pass-fail constraints. Not wanting to overcomplicate things, I chose the problem of file compression. I picked it because the objective and the constraints were simple: a compression algorithm is better if the final file size is smaller.
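The framing above, one number to improve plus pass-fail constraints that act as hard gates, can be sketched as a simple accept-or-reject rule. This is a minimal illustration assuming a hypothetical `Evaluation` record and hypothetical `should_keep` and `greedy_search` helpers; it is not code from the project. A candidate only counts if it clears every constraint, and only then does the metric (lower is better here) have to beat the current best.

```rust
/// Hypothetical result of evaluating one candidate change.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Evaluation {
    /// Primary metric; lower is better (e.g. total compressed bytes).
    pub metric: u64,
    /// True only if every pass-fail constraint held.
    pub passes_constraints: bool,
}

/// Constraints are hard gates: a failing candidate is rejected no matter
/// how good its metric looks. Otherwise it must strictly beat the best.
pub fn should_keep(best: &Evaluation, candidate: &Evaluation) -> bool {
    candidate.passes_constraints && candidate.metric < best.metric
}

/// Greedy hill-climbing over a sequence of evaluated candidates: keep each
/// change that passes the gates and improves the metric, discard the rest.
pub fn greedy_search(initial: Evaluation, candidates: &[Evaluation]) -> Evaluation {
    let mut best = initial;
    for candidate in candidates {
        if should_keep(&best, candidate) {
            best = *candidate;
        }
    }
    best
}
```

Because `greedy_search` only ever accepts strict improvements, it shares the weakness described above: it can settle on a local optimum and never try a change that looks worse now but would pay off later.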
I added two constraints to the problem: the decompressed output had to match the original file exactly, and neither compression nor decompression could exceed 300 seconds. I was deliberately not optimizing for speed, but I wanted to cap the time so the process could run mostly unsupervised, knowing that a timeout would catch any infinite loops. The other nice thing about file compression is that there are many existing tools I could use for a final benchmark. Given this was a small proof of concept, I wasn't expecting to create a new top-of-the-line algorithm. Even so, knowing how well this home-cooked version performed against existing tools provides a data point on how far we might move away from libraries and off-the-shelf solutions. If an agent can quickly and reliably solve a problem previously solved by an external dependency, there must be some point at which an in-house solution is worth more than a dependency that carries risks like supply chain attacks. One experiment can't answer that, but it would help determine whether the question is worth looking at more.

Methodology

Problem Setup

First, a reminder that the goal here was to see whether this approach was viable rather than to benchmark any particular model. Second, before we get into it, all the code for this project is available here: https://github.com/smitec/agent-compression

For this work I used Claude Code with default settings on Sonnet 4.6. I am certain different models would have done things differently; that's an exercise for another day. Before any agent involvement I set up a basic scaffold for the project. I picked Rust because some of the implicit constraints, like "don't modify the function signature", were easy to enforce via the type system. I wrote stubs of the compress and decompress functions, which both just copied the bytes across. This 'worked' but provided zero compression on any of the data.
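A scaffold along these lines might look like the sketch below. The repository linked above is the source of truth; the names and signatures here (`compress`, `decompress`, `run_with_timeout`, `evaluate`) are assumptions for illustration. The stubs copy bytes unchanged, and the hypothetical `evaluate` checks both pass-fail constraints for one input, returning the compressed size only when the round trip is bit-perfect and each direction finishes within the time limit.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Stub compressor (assumed signature): copies the bytes across unchanged,
/// so the output is exactly as large as the input.
pub fn compress(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}

/// Stub decompressor (assumed signature): the inverse of the stub above.
pub fn decompress(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}

/// Hypothetical helper: runs `work` on a worker thread and returns `None`
/// if it has not finished within `limit`. A timed-out worker is not killed.
pub fn run_with_timeout<F>(work: F, limit: Duration) -> Option<Vec<u8>>
where
    F: FnOnce() -> Vec<u8> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone, so a failed send is expected.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}

/// Hypothetical check of both pass-fail constraints for one input. Returns
/// the compressed size if the round trip is bit-perfect and each direction
/// finishes within `limit`, otherwise `None`.
pub fn evaluate(input: &[u8], limit: Duration) -> Option<usize> {
    let owned = input.to_vec();
    let compressed = run_with_timeout(move || compress(&owned), limit)?;
    let compressed_len = compressed.len();
    let restored = run_with_timeout(move || decompress(&compressed), limit)?;
    if restored.as_slice() == input {
        Some(compressed_len)
    } else {
        None
    }
}
```

Running each direction on a worker thread lets `recv_timeout` report a timeout, but Rust cannot kill a stuck thread, so the worker keeps running in the background. For a genuinely unsupervised loop, enforcing the 300-second limit at the process level (running the benchmark as a child process and killing it on timeout) is likely the more robust choice.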
I then put in place a couple of basic unit tests for the compress-decompress round trip on both a string and a simple file. These tests weren't exhaustive but did validate that the compress and decompress functions met their goal of a bit-perfect round trip. From there I put together a benchmarking script. This script fetched some public domain file samples across video, audio and text, and created some files filled with random data of various sizes. Many of these files were in formats that were already somewhat compressed, so I added a step to convert them to less compressed formats. This gave a per-file benchmark alongside the overall compression benchmark. Having this sample set meant there was a mix of high- and low-entropy file formats. A good compression algorithm will shrink low-entropy formats and leave high-entropy formats mostly unchanged. You can expect some minor change in file size due to format-specific bytes but overall you don't...
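One rough way to tell high-entropy samples from low-entropy ones is the order-0 Shannon entropy of a file's byte distribution, measured in bits per byte. The helpers below (`shannon_entropy`, and a small xorshift-based `pseudo_random_bytes` standing in for the random test files) are hypothetical and not part of the project's benchmarking script. Random bytes score close to the maximum of 8 bits per byte, while repetitive data scores far lower.

```rust
/// Order-0 Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
/// Only byte frequencies are considered; ordering and repetition are ignored.
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each of the 256 possible byte values occurs.
    let mut counts = [0u64; 256];
    for &byte in data {
        counts[byte as usize] += 1;
    }
    let total = data.len() as f64;
    // H = -sum(p * log2(p)) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / total;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical stand-in for the random-data benchmark files: deterministic
/// pseudo-random bytes from a xorshift64 generator. Not cryptographically secure.
pub fn pseudo_random_bytes(len: usize, seed: u64) -> Vec<u8> {
    // xorshift64 gets stuck at zero, so force a non-zero starting state.
    let mut state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
    (0..len)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Use the high byte of the 64-bit state.
            (state >> 56) as u8
        })
        .collect()
}
```

This is only a first-order estimate. It ignores repetition and context, so a real compressor can do much better than it suggests on structured data (a file that cycles through all 256 byte values scores 8 bits per byte yet compresses extremely well). A score near 8 suggests, but does not prove, that a file will barely shrink.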
