Parallelizing Arbitrary Python Code by Running 1M Python Interpreters on a GPU

Repository: jndean/gpusnek on GitHub



Read the "whitepaper": see gpusnek_whitepaper.pdf in this repository.

gpusnek answers the question "What would it look like to be able to inline arbitrary Python code into your high-performance CUDA kernels, with no consideration for why that is a bad idea?".

This repository implements a full Python interpreter that can run on one GPU thread (or in parallel on many). It even includes the Python lexer, parser and bytecode compiler.

We take the source code from MicroPython, ram it through nvcc (NVIDIA's CUDA compiler), and fix most of the things which break.
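To make the "lexer, parser and bytecode compiler" stages concrete, here is a minimal sketch of the same pipeline using the standard library of desktop CPython. It is not gpusnek or MicroPython code: MicroPython has its own lexer, parser and bytecode format, and the `run_stages` helper is a hypothetical name used only for this illustration.

```python
import ast
import io
import tokenize


def run_stages(source):
    """Illustrative only: walk one snippet through lex -> parse -> compile -> run.

    Uses CPython's stdlib, NOT MicroPython's internals or any gpusnek API.
    """
    # Stage 1: lexing. Turn the source text into tokens, dropping
    # whitespace-only tokens such as NEWLINE and ENDMARKER.
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.string.strip()
    ]

    # Stage 2: parsing. Build an abstract syntax tree from the source.
    tree = ast.parse(source)

    # Stage 3: bytecode compilation. Lower the tree to a code object.
    code = compile(tree, "<example>", "exec")

    # Stage 4: interpretation. Execute the bytecode in a fresh namespace.
    namespace = {}
    exec(code, namespace)
    return tokens, tree, namespace


tokens, tree, namespace = run_stages("x = 1 + 2\n")
print(tokens)
print(namespace["x"])
```

In gpusnek, according to the description above, all of these stages run on the GPU thread itself rather than being precompiled on the host.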

Examples include:

Running 1 Million Python interpreters on a consumer GPU and using them in an interactive REPL.

Communicating between CUDA threads by using Python to read from and write to a shared virtual filesystem living in VRAM.

Other such nonsense.
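The shared-filesystem idea can be sketched on the host with ordinary CPython threads. This is a simulation under loud assumptions, not the code in example_allreduce.cu: `InMemoryFS`, the `/partial/<rank>` paths and `allreduce_sum` are all hypothetical names, a dict stands in for the filesystem in VRAM, and a `threading.Barrier` stands in for whatever synchronization the real example uses.

```python
import threading


class InMemoryFS:
    """Hypothetical stand-in for a shared virtual filesystem: a dict behind a lock."""

    def __init__(self):
        self._files = {}
        self._lock = threading.Lock()

    def write(self, path, data):
        with self._lock:
            self._files[path] = data

    def read(self, path):
        with self._lock:
            return self._files[path]


def allreduce_sum(num_workers):
    """Each worker publishes a value as a 'file', then sums every worker's file."""
    fs = InMemoryFS()
    barrier = threading.Barrier(num_workers)
    results = [None] * num_workers

    def worker(rank):
        # Publish this worker's contribution (rank + 1) as text.
        fs.write(f"/partial/{rank}", str(rank + 1))
        # Wait until every worker has written before anyone reads.
        barrier.wait()
        # Read all contributions back and reduce them locally.
        results[rank] = sum(
            int(fs.read(f"/partial/{r}")) for r in range(num_workers)
        )

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(allreduce_sum(4))
```

Every worker ends up with the same total, which is the defining property of an allreduce. The barrier matters: without it, a fast worker could try to read a file that a slower worker has not yet written.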

    # Assuming you have CUDA development tools set up
    make TARGET=cuda -j
    ./example_allreduce

You can also build with TARGET=host, which is useful for checking you haven't broken anything :)

