Heterogeneous Pythonic language in your pocket

by Amr Hesham | Jun, 2026 | Medium



8 min read


Hello everyone 👋, I am Amr Hesham, and I am very passionate about compilers, languages, and building cool projects. Four years ago, when I was working as a full-time junior Android developer, I used to spend my time after work reading compiler books 🐉 and contributing to open source projects. While playing with the Python Turtle graphics library, I found it very interesting, and I started to think: why not mix Android and compilers 🤔? What if I built a small language, re-implemented the turtle library in it, and deployed it in an Android app? The result was Turtle 🐢. The project was very interesting to implement, and four years later, I was surprised to see that the app had 35K downloads with good ratings and reviews on Google Play.

Turtle 2022 version

By the end of 2025 and the start of 2026, I started reading Programming Massively Parallel Processors, CPython Internals, and other books to learn more about GPU architecture and programming, compilers, and runtime systems. Then I started to think 🤔, Mmmmm, a modern smartphone already has a GPU. Can't we create a language that lets users run computations on that GPU 🤯? Can't we create a system that, at runtime, compiles a function node in this language into GPU code, launches it, and then reads the result back into the interpreter or VM 🤔? Think of it as a JIT for the GPU. At that time (around March 2026), I created an experimental toy project that compiles a simple function into a WebGPU shader and launches it on an Android device, and guess what, it worked 🥳! Then I decided: okay, let's first create a language that gives current users the same features as the previous version, and then we can add GPU support. But what should the syntax of this language be 🤔? Should I extend the language I built in 2022, or should I build a new one (which is something I love to do so much 😉)?

The Lilo Programming Language

I started to think: okay, first, who are the target users? They want to practice, play with language features, and draw turtle graphics on a mobile phone, and they want a language that they already know or that is easy to use. Mmmmm, which language is easy to use and loved by most people 🤔? Yes, Python 🐍. I was also already reading the CPython Internals book and the Mojo 🔥 stdlib source code, which is so interesting to read. I spent some time learning more about Python and CPython internals from the maintainers' talks and from Chris Lattner's talks, to understand how CPython represents every value with a single C type, PyObject.
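To make the "single data type" idea concrete, here is a minimal Python sketch of how a toy interpreter might model it, loosely inspired by CPython: every runtime value is an instance of one `Obj` class that points to its type object, and each type object holds a method table plus an optional base type. The names `Obj`, `TypeObj`, `lookup`, and `call_method` are hypothetical and are not taken from CPython's or Lilo's source; real runtimes also need reference counting or a garbage collector, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class TypeObj:
    """Hypothetical type object: a name, a method table, and an optional base type."""
    name: str
    methods: dict[str, Callable[..., Any]] = field(default_factory=dict)
    base: Optional[TypeObj] = None


@dataclass
class Obj:
    """Hypothetical uniform value: every runtime value has exactly this shape."""
    type: TypeObj
    payload: Any = None


def lookup(tp: TypeObj, name: str) -> Optional[Callable[..., Any]]:
    """Find a method by walking the single-inheritance chain of base types."""
    current: Optional[TypeObj] = tp
    while current is not None:
        if name in current.methods:
            return current.methods[name]
        current = current.base
    return None


def call_method(obj: Obj, name: str, *args: Obj) -> Any:
    """Dispatch a method call on any value purely through its type object."""
    method = lookup(obj.type, name)
    if method is None:
        raise AttributeError(f"'{obj.type.name}' object has no attribute '{name}'")
    return method(obj, *args)


# Two toy types: 'object' as the root, and 'int' inheriting from it.
object_type = TypeObj("object", {"__repr__": lambda self: f"<{self.type.name} object>"})
int_type = TypeObj(
    "int",
    {"__add__": lambda self, other: Obj(int_type, self.payload + other.payload)},
    base=object_type,
)

three = call_method(Obj(int_type, 1), "__add__", Obj(int_type, 2))
```

Because every value shares one shape, the evaluation loop can pass `Obj` references around without knowing their concrete types; behavior comes entirely from the type's method table. For example, `call_method(three, "__repr__")` finds `__repr__` on the base `object` type and returns `"<int object>"`.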
I started an empty project to implement a subset of Python based on the official language reference, with the same dynamic features and APIs: for example, magic methods, standard library features such as lists, dicts, map, and reflection, other common libraries, and of course Turtle 🐢. Here is the result.
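Supporting magic methods largely comes down to how operators are dispatched. The sketch below shows, in plain Python, a simplified version of the rule Python uses for `a + b`: look up `__add__` on the left operand's type, and if it is missing or returns `NotImplemented`, try `__radd__` on the right operand's type (only when the two types differ). It deliberately ignores one detail of the real rule, where a right operand whose type is a subclass of the left operand's type gets the first try. `binary_add` and `Meters` are hypothetical names for illustration, not Lilo's implementation.

```python
from typing import Any


def binary_add(a: Any, b: Any) -> Any:
    """Simplified model of how an interpreter can evaluate `a + b` via magic methods."""
    left_type, right_type = type(a), type(b)

    # 1. Ask the left operand's type for __add__ (looked up on the type, not the instance).
    add = getattr(left_type, "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result

    # 2. Fall back to the right operand's reflected __radd__,
    #    which Python only tries when the operand types differ.
    if right_type is not left_type:
        radd = getattr(right_type, "__radd__", None)
        if radd is not None:
            result = radd(b, a)
            if result is not NotImplemented:
                return result

    # 3. Neither side knows how to add these types.
    raise TypeError(
        f"unsupported operand type(s) for +: '{left_type.__name__}' and '{right_type.__name__}'"
    )


class Meters:
    """Hypothetical user type that only supports being added to a number from the right."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __radd__(self, other: Any) -> Any:
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented
```

For `binary_add(1, Meters(2))`, `int.__add__` returns `NotImplemented`, so dispatch falls through to `Meters.__radd__`; `binary_add(Meters(1), Meters(2))` raises `TypeError` because `Meters` defines no `__add__`.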

The journey was very interesting for me. I implemented most of Python's features, learned a lot about the language and about CPython's VM implementation, and then decided it was time to stop adding Python features and start working on the GPU programming part 🫣.

Add support for GPU programming

At this point, we have a nice Pythonic language with a Turtle module that can run Python samples and draw the same shapes 🥳. Now it's time to support compiling kernels and launching them on the GPU, copying the results back to the host, and then continuing interpreter execution. This allows, for example, performing vector or matrix multiplications on the GPU and using the results with the Turtle API to draw nice graphics.

Note 🪧: Lilo implements the syntax and semantics of the Python grammar reference, with one extra keyword, out. It makes it easy to know which arguments should be copied from the device back to the host after the kernel finishes executing. We could remove it and sync all arguments back, or perform a simple analysis that marks the output parameters in the AST; both are possible, but I added the keyword for now in the beta version, and it may be removed soon.

Before showing the syntax in Lilo and how I implemented it, let me show you a quick example in CUDA-C (if you know Mojo or Triton, Lilo is also inspired by them :D) to make it easy to map the concepts from CUDA-C to what I implemented.

```cuda
// Each thread computes one element of C.
__global__ void vector_add(float *A, float *B, float *C, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < N) {  // the last block may have threads past the end of the arrays
        C[i] = A[i] + B[i];
    }
}

// Host code: configure the launch, then start the kernel.
dim3 threadsPerBlock(...);
dim3 blocksPerGrid(...);
vector_add<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, N);
```

For this article, I will not explain everything from scratch. I can’t recommend more to read...
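To map the CUDA-C vector_add example onto something that runs anywhere, here is a CPU-only Python sketch of the same launch. It emulates a one-dimensional grid by looping over block and thread indices, computes the global index with the same formula (`blockIdx.x * blockDim.x + threadIdx.x`), and rounds the block count up so every element is covered, which is why a bounds check such as `i < n` is needed. It also mimics the idea behind the `out` keyword: every argument is copied into a pretend device buffer before the launch, and only the arguments listed in `out` are copied back. `launch`, `ThreadCtx`, `blocks_for`, and the `out` parameter are hypothetical illustrations, not Lilo's runtime or any GPU API.

```python
import copy
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class ThreadCtx:
    """Hypothetical stand-in for CUDA's blockIdx.x, blockDim.x and threadIdx.x."""
    block_idx: int
    block_dim: int
    thread_idx: int


def launch(
    kernel: Callable[..., None],
    blocks_per_grid: int,
    threads_per_block: int,
    args: Sequence[Any],
    out: Sequence[int],
) -> None:
    """Run `kernel` once per emulated thread, then copy back only the `out` arguments."""
    # "Host to device": give the kernel private copies of every argument.
    device_args = [copy.deepcopy(arg) for arg in args]

    # Emulate the grid sequentially; a GPU would run these threads in parallel.
    for block in range(blocks_per_grid):
        for thread in range(threads_per_block):
            kernel(ThreadCtx(block, threads_per_block, thread), *device_args)

    # "Device to host": sync back (in place) only the arguments marked as outputs.
    for index in out:
        args[index][:] = device_args[index]


def vector_add(ctx: ThreadCtx, a: list, b: list, c: list, n: int) -> None:
    """Kernel body: one emulated thread computes one element of c."""
    i = ctx.block_idx * ctx.block_dim + ctx.thread_idx  # global thread index
    if i < n:  # threads in the last block may fall past the end of the arrays
        c[i] = a[i] + b[i]


def blocks_for(n: int, threads_per_block: int) -> int:
    """Ceiling division, so that blocks * threads_per_block >= n."""
    return (n + threads_per_block - 1) // threads_per_block


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
c = [0.0] * len(a)
launch(vector_add, blocks_for(len(a), 2), 2, [a, b, c, len(a)], out=[2])
```

With five elements and two threads per block, `blocks_for` returns 3, so the grid has six threads and the guard skips the sixth. After the launch, `c` holds the element-wise sums, while `a` and `b` are untouched because they were not listed in `out`.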

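The alternative mentioned in the note, inferring outputs instead of requiring `out`, can be a small AST pass. The Python sketch below uses the standard `ast` module on a plain Python function: any parameter that appears as the target of a subscript assignment (such as `C[i] = ...`) is treated as an output, and that decides which buffers a generated WebGPU (WGSL) compute shader declares as `read` versus `read_write`, loosely echoing the WebGPU prototype mentioned earlier. It assumes a deliberately tiny kernel subset: one-dimensional `f32` arrays, an implicit index `i` bound to the global invocation id, and element-wise assignments that `ast.unparse` (Python 3.9+) prints in a form WGSL also accepts. `infer_outputs` and `to_wgsl` are hypothetical helpers, not Lilo's compiler.

```python
import ast

# Hypothetical kernel source; `i` is an implicit thread index, not a parameter.
KERNEL_SOURCE = """
def vector_add(A, B, C):
    C[i] = A[i] + B[i]
"""


def infer_outputs(fn: ast.FunctionDef) -> set[str]:
    """Return parameter names that are written through a subscript, e.g. `C[i] = ...`."""
    params = {arg.arg for arg in fn.args.args}
    outputs: set[str] = set()
    for node in ast.walk(fn):
        if isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Store):
            if isinstance(node.value, ast.Name) and node.value.id in params:
                outputs.add(node.value.id)
    return outputs


def to_wgsl(source: str, workgroup_size: int = 64) -> str:
    """Translate a tiny element-wise kernel into a WGSL compute shader string."""
    fn = ast.parse(source).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    outputs = infer_outputs(fn)
    if not outputs:
        raise ValueError("kernel writes no outputs")

    lines = []
    # One storage buffer per parameter; only inferred outputs are writable.
    for binding, arg in enumerate(fn.args.args):
        access = "read_write" if arg.arg in outputs else "read"
        lines.append(
            f"@group(0) @binding({binding}) var<storage, {access}> {arg.arg}: array<f32>;"
        )

    guard = sorted(outputs)[0]  # bound the thread index by one output buffer's length
    lines.append(f"@compute @workgroup_size({workgroup_size})")
    lines.append("fn main(@builtin(global_invocation_id) gid: vec3<u32>) {")
    lines.append("    let i = gid.x;")
    lines.append(f"    if (i >= arrayLength(&{guard})) {{ return; }}")
    for stmt in fn.body:
        if not isinstance(stmt, ast.Assign):
            raise NotImplementedError("only simple assignments are supported in this sketch")
        lines.append(f"    {ast.unparse(stmt)};")
    lines.append("}")
    return "\n".join(lines)


shader = to_wgsl(KERNEL_SOURCE)
```

For the vector_add source above, the analysis marks only `C` as an output, so `A` and `B` become read-only storage buffers and `C` becomes `read_write`. A real compiler would also need type inference, control flow, and a host-side launcher; an explicit `out` keyword simply replaces the `infer_outputs` step.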