The Resistor Network: Thinking Different, Thinking Slowly: LLMs on a PowerPC Mac
Monday, March 24, 2025
There is something incredibly satisfying about breathing new life into old hardware. Vintage computing is one of my favorite hobbies. The challenge of coaxing modern software onto systems designed decades ago is a puzzle I cannot resist. I have been diving into the world of large language models (LLMs), and a question began to gnaw at me: could I bring the cutting edge of AI to the nostalgic glow of my trusty 2005 PowerBook G4? Armed with a 1.5GHz processor, a full gigabyte of RAM, and a limiting 32-bit address space, I embarked on an experiment that actually yielded results. I managed to run LLM inference on this classic piece of Apple history, proving that even yesteryear's hardware can have a taste of tomorrow's AI.
[Image: PowerBook G4 running TinyStories 110M Llama2 LLM inference]
I started by reviewing the llama2.c project from Andrej Karpathy. This brilliant project implements Llama2 LLM inference in a single file of vanilla C. No accelerators here. Performance is traded for simplicity, which makes it easy to understand how inference is carried out.
I forked the core implementation to a project that I have titled ullm. The core algorithm remains the same, but I spent time improving a few aspects of the code so that it would stand up to abuse a little better.
Code Improvements<br>I started with a few basic improvements. I also added wrappers for system functions like file I/O and memory allocations. This makes it easier for me to instrument the program.<br>Introduce a status return, remove all calls to exit<br>Abstract file access to simplify status handling<br>Abstract malloc/free for some simple debug/analysis<br>Replace the 512 byte static LUT for single character strings<br>Fix a few warnings when compiling with -Wall
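The status-return and allocation-wrapper ideas in the list above can be illustrated with a minimal sketch. Every name here (`SketchStatus`, `SketchAlloc`, `SketchFree`, `SketchReadFile`) is hypothetical and invented for illustration; this is not ullm's actual code, only one way such wrappers might look.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical status codes; only the idea of a status return is from the post.
typedef enum {
  SKETCH_STATUS_OK = 0,
  SKETCH_STATUS_OOM,
  SKETCH_STATUS_IO_ERROR,
} SketchStatus;

// Number of blocks currently allocated, for simple leak checks.
static size_t sketch_live_allocs = 0;

// Thin wrapper over malloc that counts outstanding blocks.
void* SketchAlloc(size_t size) {
  void* ptr = malloc(size);
  if (ptr != NULL) {
    sketch_live_allocs++;
  }
  return ptr;
}

// Thin wrapper over free that keeps the counter in sync.
void SketchFree(void* ptr) {
  if (ptr != NULL) {
    assert(sketch_live_allocs > 0);
    sketch_live_allocs--;
    free(ptr);
  }
}

size_t SketchLiveAllocs(void) { return sketch_live_allocs; }

// Reads a whole file into a newly allocated, NUL-terminated buffer.
// Every failure is reported to the caller instead of calling exit().
SketchStatus SketchReadFile(const char* path, char** out_data,
                            size_t* out_size) {
  FILE* file = fopen(path, "rb");
  if (file == NULL) {
    return SKETCH_STATUS_IO_ERROR;
  }

  SketchStatus status = SKETCH_STATUS_OK;
  char* data = NULL;
  long size = -1;
  if (fseek(file, 0, SEEK_END) != 0 || (size = ftell(file)) < 0 ||
      fseek(file, 0, SEEK_SET) != 0) {
    status = SKETCH_STATUS_IO_ERROR;
  } else if ((data = SketchAlloc((size_t)size + 1)) == NULL) {
    status = SKETCH_STATUS_OOM;
  } else if (fread(data, 1, (size_t)size, file) != (size_t)size) {
    status = SKETCH_STATUS_IO_ERROR;
  }

  fclose(file);
  if (status != SKETCH_STATUS_OK) {
    SketchFree(data);  // Safe when data is NULL.
    return status;
  }

  data[size] = '\0';
  *out_data = data;
  *out_size = (size_t)size;
  return SKETCH_STATUS_OK;
}
```

A caller checks the returned status and decides how to recover, and a test can compare `SketchLiveAllocs()` against zero after teardown to catch leaks.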
Start at the Library<br>I made larger-scale changes as I organized the code into a library with a public API that is exposed by a header. This enables unit-testing to ensure that further refactoring does not break inference functionality.<br>// The runtime config for the inference operation.<br>typedef struct {<br>// The prompt to generate a response to.<br>const char* prompt;
// The path to the checkpoint file.<br>const char* checkpoint_path;
// The path to the tokenizer file.<br>const char* tokenizer_path;
// Model configuration.<br>float temperature;<br>float topp;<br>unsigned int steps;
// The source of entropy.<br>uint64_t rng_seed;
// The callback and context for generated output.<br>void (*output_callback)(const char* token, void* cookie);<br>void* cookie;<br>} UllmLlama2RunConfig;
// The runtime state for the inference engine.<br>typedef struct {<br>UllmFileHandle checkpoint_file;<br>UllmLlama2Transformer transformer;<br>UllmLlama2Tokenizer tokenizer;<br>UllmLlama2Sampler sampler;<br>} UllmLlama2State;<br>The library expects two inputs: a const config which supplies details such as model paths, rng seed and token output callbacks, as well as state which keeps track of loaded weights, temporary buffers, tokenizer state and more.
The resulting API is exceedingly simple to test and easy to build command-line interface tools around.
Callback-Based Output & Testing<br>The migration to a public API lends itself well to replacing printf-based output with callbacks as tokens are produced by the inference engine. This was ultimately the final change necessary to enable integration testing of the end-to-end inference pipeline.
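To make the callback idea concrete in plain C, here is a small sketch. The callback signature matches the `output_callback` field shown earlier, but `SketchGenerate`, `SketchCapture`, and `SketchCaptureToken` are hypothetical stand-ins: the generator just emits a few fixed tokens in place of real inference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Same callback shape as output_callback in UllmLlama2RunConfig.
typedef void (*SketchOutputCallback)(const char* token, void* cookie);

// Hypothetical stand-in for the inference loop: emits fixed tokens
// through the callback instead of calling printf.
void SketchGenerate(SketchOutputCallback callback, void* cookie) {
  static const char* const kTokens[] = {"The", " birds", " chirp", "."};
  for (size_t i = 0; i < sizeof(kTokens) / sizeof(kTokens[0]); i++) {
    callback(kTokens[i], cookie);
  }
}

// Fixed-capacity buffer that a test can inspect after generation.
// Must be zero-initialized before use.
typedef struct {
  char text[64];
  size_t length;
} SketchCapture;

// Appends each token to the capture buffer; tokens that do not fit
// (including the terminating NUL) are dropped.
void SketchCaptureToken(const char* token, void* cookie) {
  SketchCapture* capture = (SketchCapture*)cookie;
  size_t token_length = strlen(token);
  assert(capture->length < sizeof(capture->text));
  if (token_length >= sizeof(capture->text) - capture->length) {
    return;
  }
  memcpy(capture->text + capture->length, token, token_length + 1);
  capture->length += token_length;
}
```

Because the generator only knows about the callback and an opaque cookie, a command-line tool can pass a handler that writes to stdout while a test passes one that accumulates text for comparison.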
void OutputHandler(const char* token, void* cookie) {<br>std::string* test_output = static_cast<std::string*>(cookie);<br>test_output->append(token);<br>}
TEST(UllmLlama2, Stories15M) {<br>const std::string expected_test_output = R"(The birds chirp. Where do they go?<br>The birds flew around the sky, looking for something to do.<br>The birds saw a big tree and flew over to it.<br>The birds saw a big, red apple on the ground. It looked delicious.<br>The birds flew down and picked up the apple.<br>The birds flew back up to the tree and started to eat the apple.<br>The apples were so delicious!<br>The birds ate until they were full.<br>The birds flew away, happy and full.<br>)";
std::string test_output;<br>UllmLlama2RunConfig run_config;<br>UllmLlama2RunConfigInit(&run_config);<br>run_config.checkpoint_path = "ullm/tinystories15M.bin";<br>run_config.tokenizer_path = "ullm/tokenizer.bin";<br>run_config.prompt = "The birds chirp. Where do they go?";<br>run_config.output_callback = OutputHandler;<br>run_config.cookie = &test_output;
UllmLlama2State state;<br>UllmStatus status = UllmLlama2Init(&run_config, &state);<br>ASSERT_EQ(status, ULLM_STATUS_OK);<br>status = UllmLlama2Generate(&run_config, &state);<br>EXPECT_EQ(status, ULLM_STATUS_OK);<br>UllmLlama2Deinit(&state);
EXPECT_EQ(test_output, expected_test_output);<br>}
Internals<br>Beyond the public API, I also reorganized the internals. After making improvements, the code becomes rather elegant and removes all calls to exit in favor of status propagation when any aspect of initialization or...