Metaprogramming for madmen | The ryg blog
Skip to content
Follow:<br>RSS<br>Twitter
The ryg blog
When I grow up I'll be an inventor.
Home
About
Coding
Compression
Computer Architecture
Demoscene
Graphics Pipeline
Maths
Multimedia
Networking
Papers
Stories
Thoughts
Uncategorized
Metaprogramming for madmen
April 8, 2012
This post is part of the series "Debris: Opening the box".
Okay, previous posts from this series aimed to actually convey useful technical information: assuming you actually want to write say a mesh generator, there’s definitely useful bits there.
This is not one of these posts. In fact, this is about what is pretty much the complete opposite: an insane ploy that, against all odds, actually kinda worked, then spectacularly backfired. We knew ahead of time this was gonna happen, but we were desperate. I don’t think there’s anything of any use we actually learned from the whole thing – but it makes for a cool story (in a very nerdy way), so why not? Consider it a (seasonally appropriate!) easter egg.
The setting
This is recounting a true story that took place in late March and early April 2004 – a bit less than 2 weeks before Breakpoint 2004, where we planned to release our 96k first-person shooter ".kkrieger". While being a nice technical challenge, it was a complete failure as far as the game side was concerned (mostly because no-one involved really deeply cared about that aspect). That’s not what this post is about though – the point is that we very nearly missed the 96k limit too.
The way any self-respecting 4k intro, 64k intro or other size-limited program starts out is, well, too large. I haven’t done enough 4ks to give you an order of magnitude estimate, but most of our serious 64ks were somewhere around 70-80k a few weeks before they were supposed to be released. The last 2 weeks usually ended up turning into a frenetic rush to simultaneously get the thing actually finished (put all the content in etc.) and make it fit within the size limit. Doing either of those is a real challenge by itself. Doing both at once is both extremely stressful and mentally exhausting, and by the end of it, you’re basically in need of a vacation.
Anyway, there’s (broadly speaking) 3 different steps to making sure you get small code:
Architecture. Basically, design the code in a way that enables it to be small. Know exactly what goes into it, keep it modular, and make sure the algorithms are appropriate. Store data in the right way, and work with your back-end packer, not against it. All of this is pretty much over 2 weeks before release though – if you didn’t get that part right early enough, you’re not gonna be able to fix it in time.
Evicting ballast. There’s gonna be whole paths of code that are just not getting used. If your data doesn’t have a single torus in it, there’s no point including the code that generates it – and if there’s just one torus, you can maybe replace it with something else in the art. You get the idea. This step is fairly easy provided you have at least rudimentary stats on your content, and it yields big wins in short time (in a 64k context, usually several kilobytes for less than an hour of work).
Detail work. This is where it gets messy. It basically boils down to dumping stats about all of the functions in the code, figuring out which are larger than they should be and what you can do to make them smaller. Lots of staring at disassembly listings trying to find out where the compiler is generating big code and why. Also lots of time staring at the content and figuring out if there’s some shortcuts you can take to make this particular intro smaller. This is where you start switching to a different branch (or at the very least wrap all of your changes in #ifdef) because you’re going to introduce bugs and by the end of it, parts of the code are gonna be ruined. It’s also slow work – by the point the initial easy targets are down, and assuming there’s nothing else to do, I can get 32-bit x86 code (in a 64k intro context) smaller at a rate of about 300-400 bytes per day, tops.
Basically, by the end of step 2, you have a pretty good idea of how much more work is left, even though you have very little clue of what that work is :). Note this is very different from things like speed optimization. For speed optimization, you’re usually willing to make significant changes to data flow to do less work, at least where it matters, and you concentrate on hot spots. For size optimization, hot spots are mostly irrelevant; sure there may be a place where a big array ends up being initialized by code rather than being stored as data, and fixing that kind of thing is usually easy and fun, but mostly the hot spots in a "size profile" are the places where the actual work gets done. Unless you actually do work that is completely unnecessary, optimizing for speed will not make your code smaller in an absolute sense, it will just move the complexity (and thus size) somewhere else...