Magenta RealTime 2: Open and Local Live Music Models


Jun 4, 2026

We’re excited to share Magenta RealTime 2 (MRT2), a state-of-the-art open model and efficient real-time inference engine that enables you to build and play AI musical instruments on your laptop!

To get started, download the apps on your MacBook (requires Apple Silicon).

Plugin Bundle (macOS)

View on GitHub

Models

Unlike other large generative music models that work offline to turn a prompt into a track, MRT2 is a live, interactive model that you can control with MIDI and audio, in addition to text. It performs low-latency on-device inference to respond to your inputs instantly. You can run it as a standalone app, drop it into your DAW, or integrate it into other music software.

In addition to the open-weights model, we are releasing a collection of playable instruments and experiences built with MRT2. Experiment with cloning sounds, blending styles, and creating live accompaniment with this low-latency music model.

To explore the potential of live music models as instruments, today we are releasing:

Magenta RealTime 2, an open-weights model (2.4B parameters) capable of high-quality real-time music synthesis with low-latency controls via MIDI, text, and audio.

An open-source Python library (pip install magenta-rt), released alongside the model, offering inference via JAX/MLX using SequenceLayers.

An inference engine written in C++, enabling efficient streaming audio generation on a MacBook GPU via MLX.

A suite of example applications built on the inference engine. These offer a glimpse into the creative potential of Magenta RealTime 2, and serve as references to help you get started building new instruments and software integrations.

For a decade, the Magenta team has championed a vision of AI as a tool for musicians, never a replacement. In 2017 we released our first neural synthesizer, NSynth, which put machine learning into playable hardware. We continued creating AI instruments with projects such as DDSP, Piano Genie, and the first version of Magenta RealTime, our debut live music model capable of generating and blending a wide range of musical styles. MRT2 achieves ~15x lower latency than version one, works on standard hardware, and integrates directly into DAWs, making this live model a true musical instrument.

A live music model with lower latency and expanded control

| | Magenta RealTime | Magenta RealTime 2 |
|---|---|---|
| Live music generation | | |
| Hardware required | TPU/GPU | MacBook |
| Frame size | | 40 ms |
| Control latency | ~3 s | ~200 ms |
| Control modalities | Text, Audio | Text, Audio, MIDI |
| Model sizes | 760M / 220M | 2.4B / 230M |

Both MRT and MRT2 are codec language models operating on sequences of audio tokens from the SpectroStream codec, but MRT2 achieves lower latency by performing frame-level autoregression with frame-aligned conditioning. To enable expressive musical control, MRT2 is designed to model audio that continuously follows MIDI inputs, alongside style prompts that can be either audio or text; prompts are embedded via MusicCoCa. For minimal interaction lag, both signals are injected as frame-aligned conditioning at every generation step, allowing the model to react to changes in the signal within a single frame (40 ms, plus additional sources of empirical latency; see below).
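To make the frame-aligned conditioning concrete, here is a minimal Python sketch of a frame-level autoregressive loop. It is an illustration under stated assumptions, not MRT2's implementation: `Conditioning`, `toy_model_step`, and `generate` are hypothetical names, the "model" is a stand-in that only records what it was given, and the style prompt is a plain string where the real system would use a MusicCoCa embedding. Only the 40 ms frame duration comes from the post.

```python
from dataclasses import dataclass

FRAME_MS = 40  # frame duration stated in the post


@dataclass(frozen=True)
class Conditioning:
    """Per-frame control signals (hypothetical container)."""
    midi_notes: frozenset  # MIDI pitches held during this frame
    style: str             # stand-in for a MusicCoCa style embedding


def toy_model_step(history, cond):
    """Stand-in for one decoding step; NOT the real MRT2 network.

    Returns a fake "frame" that records what the step was conditioned on.
    """
    return {
        "index": len(history),
        "notes": sorted(cond.midi_notes),
        "style": cond.style,
    }


def generate(controls):
    """Frame-level autoregression: one model step per conditioning frame."""
    history = []
    for cond in controls:  # conditioning is aligned one-to-one with frames
        frame = toy_model_step(history, cond)
        history.append(frame)  # the new frame becomes context for later steps
    return history


# The player presses middle C (MIDI note 60) starting at frame 3.
silent = Conditioning(frozenset(), "ambient pads")
held = Conditioning(frozenset({60}), "ambient pads")
frames = generate([silent] * 3 + [held] * 2)

first_reacting = next(f["index"] for f in frames if f["notes"])
frames_per_second = 1000 // FRAME_MS

print(first_reacting, first_reacting * FRAME_MS)  # 3 120
print(frames_per_second)                          # 25
```

Because the conditioning is read at every step, a key press that arrives during frame 3 already shapes frame 3. At 40 ms per frame, the loop takes 25 steps per second of audio.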

Key to this approach is a causal sliding-window attention mechanism, which enables continuous streaming generation while bounding memory requirements. Alongside this, learnable attention embeddings are incorporated to improve generalization to arbitrary durations and to reduce context-eviction artifacts (e.g., ringing and feedback) during long-context generation.
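The sketch below shows, in plain Python, what a causal sliding-window constraint looks like and why it bounds memory. It is an assumption-laden illustration rather than MRT2's code: `sliding_window_mask` and `WindowedCache` are hypothetical names, the window length of 3 frames is arbitrary, and the learnable attention embeddings mentioned above are not modeled.

```python
from collections import deque


def sliding_window_mask(num_frames, window):
    """mask[q][k] is True when query frame q may attend to key frame k.

    Causal: k <= q. Sliding window: only the most recent `window` frames,
    counting the query frame itself.
    """
    return [
        [q - window < k <= q for k in range(num_frames)]
        for q in range(num_frames)
    ]


class WindowedCache:
    """Keeps at most `window` past frames, evicting the oldest first."""

    def __init__(self, window):
        self.frames = deque(maxlen=window)

    def append(self, frame):
        self.frames.append(frame)  # deque drops the oldest entry when full


mask = sliding_window_mask(num_frames=5, window=3)
print(mask[4])  # [False, False, True, True, True]

cache = WindowedCache(window=3)
for t in range(1000):  # stream many frames
    cache.append(t)
print(list(cache.frames))  # [997, 998, 999]
```

However long the stream runs, the cache never holds more than `window` entries, which is the memory bound the paragraph above describes. The frames that fall out of the window are the "evicted" context.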

Fast C++ inference engine via MLX

While the original Magenta RealTime required a high-power GPU or TPU, Magenta RealTime 2 brings live generation to the hardware musicians actually use. To achieve this, we built a C++ inference engine powered by MLX that allows MRT2 to run natively on Apple Silicon. Apple’s MLX framework provides the link between Python and C++. More specifically, we use MLX to compile the MRT2 model, implemented using the SequenceLayers library, into an .mlxfn file, a model container that bundles the weights and computational graph. Our C++ inference engine loads that file and uses the MLX runtime to execute it efficiently on Apple Silicon GPUs. The inference engine also handles the other necessary infrastructure (model state, audio buffering and resampling, MIDI input) and can be embedded into many music application frameworks that support C++.
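As one illustration of the audio buffering mentioned above, the following Python sketch decouples 40 ms model frames from the smaller, fixed-size blocks that an audio host typically requests. It is not the engine's code (which is C++): `FrameBuffer` and its methods are hypothetical, the 48 kHz sample rate and 512-sample block size are assumptions not stated in the post, and the audio is mono for brevity.

```python
from collections import deque

SAMPLE_RATE = 48_000  # assumed; the post does not state a sample rate
FRAME_MS = 40         # frame duration stated in the post
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # samples in one model frame


class FrameBuffer:
    """FIFO that decouples model frames from host callback block sizes."""

    def __init__(self):
        self._samples = deque()

    def __len__(self):
        return len(self._samples)

    def push_frame(self, frame):
        """Called by the generation side with one frame of samples."""
        self._samples.extend(frame)

    def pull_block(self, block_size):
        """Called by the audio callback; pads with silence on underrun."""
        available = min(block_size, len(self._samples))
        block = [self._samples.popleft() for _ in range(available)]
        block.extend([0.0] * (block_size - available))
        return block


buf = FrameBuffer()
buf.push_frame([0.5] * FRAME_SAMPLES)
blocks = [buf.pull_block(512) for _ in range(4)]

print(FRAME_SAMPLES)         # 1920
print(blocks[3].count(0.0))  # 128
```

One frame covers 3.75 blocks at these sizes, so the fourth block runs out of audio and is padded with 128 samples of silence. A production engine would more likely use a preallocated, lock-free ring buffer so the audio thread never allocates or blocks.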

MLX allows MRT2 to run on Apple Silicon (M-series): both model sizes can run offline (non-real-time) inference on any Apple Silicon Mac, while real-time streaming (generating audio faster than playback) is supported on the following devices:

| Model | Platform |
|---|---|
| Base (2.4B) | MacBook M3 Pro (or... |
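The real-time criterion above (generating audio faster than playback) can be written as a simple ratio. This is a minimal sketch: `real_time_factor` and `can_stream` are hypothetical helpers, and the 25 ms and 50 ms compute times are made-up inputs for illustration, not measurements of MRT2 on any device.

```python
FRAME_MS = 40.0  # audio duration of one frame, as stated in the post


def real_time_factor(compute_ms_per_frame, frame_ms=FRAME_MS):
    """Milliseconds of audio produced per millisecond of compute."""
    return frame_ms / compute_ms_per_frame


def can_stream(compute_ms_per_frame, frame_ms=FRAME_MS):
    """True when each frame is generated faster than it plays back."""
    return real_time_factor(compute_ms_per_frame, frame_ms) > 1.0


# Hypothetical compute times, chosen only to show both outcomes.
print(real_time_factor(25.0))              # 1.6
print(can_stream(25.0), can_stream(50.0))  # True False
```

A factor only slightly above 1 leaves little headroom for scheduling jitter, so in practice a streaming setup would want some margin beyond the bare threshold.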
