From Julia to Rust: a differentiable tensor stack for scientific computing in the agentic AI era

tenferro-rs is a Rust-native dense tensor stack: linear algebra, PyTorch-style eager autodiff, JAX-style traced transforms, NumPy-style einsum, FFT, extensible operation crates, and explicit CPU/CUDA backends. The first crates are on crates.io as of June 23, 2026 (JST).

by Hiroshi Shinaoka (Saitama University), for the tensor4all team

Most tensor-network code has been written in Julia, and ours was no exception. ITensors and the surrounding ecosystem are good for prototyping: the code stays close to the math, and it is easy to iterate. Our own work on the IR basis, sparse modeling, and the tensor4all tensor-cross-interpolation and quantics stack started there.

Once the codebase gets large, though, Julia development starts to slow down: type instability that only shows up at run time, compile and precompile times that stretch the edit/test loop, and the sense that correctness gets harder to check as the code grows. When we started fitting the tensor-network stack into a larger system, that became hard to ignore. We began moving the compute engine to Rust.

That immediately exposed a second problem. The tensor library we wanted to build on was not there yet. Rust has libraries for individual jobs: ndarray for arrays, Burn for deep learning, faer for linear algebra. What was missing was a tensor layer that could cover autodiff through einsum and still feel usable for scientific computing. The goal was not to replace those libraries. It was to connect the pieces that already exist.

The Rust ecosystem has changed a lot in the last few years: crates.io went from 602 crates in 2015 to roughly 210,000 in 2026 (data). For dense linear algebra there is faer; for GPU kernels, CubeCL; for generic numerics, num-traits and num-complex. Neighboring layers are covered too: ndarray for arrays, nalgebra for linear algebra, Burn and candle for deep learning, and numr for a NumPy-style array API. What we needed was the layer between them: a scientific-computing tensor stack with column-major storage, dynamic shapes, eager and traced autodiff, einsum, FFT, CPU/CUDA backends, and extensible operations. That is what tenferro-rs is for. It builds on faer and CubeCL and adds the missing parts instead of reinventing them. Porting SparseIR.jl and our Julia tensor-network code is what made the shape of that missing layer clear.
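To make "column-major storage, dynamic shapes" concrete, consider how the layout works. In column-major (Fortran/Julia) order the first index varies fastest in memory. With dynamic shapes the rank is only known at run time, so strides have to be computed from a shape vector. The sketch below is a hypothetical minimal type written for illustration. `DenseTensor`, `col_major_strides`, and `offset` are invented names, not the tenferro-rs API.

```rust
/// Hypothetical minimal dense tensor (illustration only, not tenferro-rs):
/// a flat buffer plus a shape whose rank is known only at run time,
/// laid out in column-major order.
struct DenseTensor {
    data: Vec<f64>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

/// Column-major strides: the first axis is contiguous (stride 1), and each
/// later stride is the product of all earlier extents.
fn col_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = Vec::with_capacity(shape.len());
    let mut acc = 1;
    for &n in shape {
        strides.push(acc);
        acc *= n;
    }
    strides
}

impl DenseTensor {
    fn zeros(shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        DenseTensor {
            data: vec![0.0; len],
            shape: shape.to_vec(),
            strides: col_major_strides(shape),
        }
    }

    /// Flat offset of a multi-index; panics on a rank mismatch or out-of-bounds index.
    fn offset(&self, idx: &[usize]) -> usize {
        assert_eq!(idx.len(), self.shape.len(), "rank mismatch");
        idx.iter()
            .zip(&self.shape)
            .zip(&self.strides)
            .map(|((&i, &n), &s)| {
                assert!(i < n, "index out of bounds");
                i * s
            })
            .sum()
    }

    fn get(&self, idx: &[usize]) -> f64 {
        self.data[self.offset(idx)]
    }

    fn set(&mut self, idx: &[usize], v: f64) {
        let o = self.offset(idx);
        self.data[o] = v;
    }
}

fn main() {
    let mut t = DenseTensor::zeros(&[2, 3, 4]);
    println!("{:?}", t.strides); // [1, 2, 6]
    t.set(&[1, 2, 3], 7.0);
    println!("{}", t.offset(&[1, 2, 3])); // 23
    println!("{}", t.get(&[1, 2, 3])); // 7
}
```

For shape [2, 3, 4] the strides are [1, 2, 6]. The element at [1, 2, 3] therefore sits at offset 1 + 4 + 18 = 23, which is the last slot of the 24-element buffer. Julia and Fortran arrays use the same layout.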

That is the background for tenferro-rs. This post explains why we are building it, and why we chose Rust now that code is no longer written only by humans.

Why Rust now, when Julia was fine before?

A couple of years ago I would probably have told students to start with Julia. Julia code can stay close to the math, memory management is easy, and the numerical libraries were already there. Rust had a steeper learning curve, and its ecosystem was still missing pieces.

I would not give the same advice now. Not because Rust changed, but because I am no longer the one writing most of the code.

Fortran, Python, and Julia all evolved to lower the cost for humans of writing, reading, and maintaining code by hand. Readability, a REPL, notation close to the math, and a low barrier to entry all matter for that. When AI writes more of the code, the tradeoff changes. Writing speed matters less, and the agent can absorb much of the learning cost. But “it reads like the math” still does not guarantee correctness: aliasing, mutation, and allocation are not visible on the surface of a line of code.
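Here is a small sketch of the aliasing point in plain standard-library Rust. Nothing in it is tenferro-rs API, and `axpy_halves` is a hypothetical helper. Two overlapping views of one buffer, one of them mutable, can look harmless in the source, but the borrow checker rejects them. The accepted version uses `split_at_mut`, which returns two slices the compiler knows are disjoint.

```rust
/// Hypothetical helper: for each i < mid, adds alpha * buf[i] to buf[mid + i], in place.
fn axpy_halves(buf: &mut [f64], alpha: f64) {
    let mid = buf.len() / 2;
    // Writing `let x = &buf[..mid]; let y = &mut buf[mid..];` and then using
    // both is rejected at compile time (error E0502): `buf` cannot be borrowed
    // mutably while an immutable borrow of it is still in use.
    // `split_at_mut` instead hands out two non-overlapping borrows, so the
    // compiler can prove `x` and `y` never alias.
    let (x, y) = buf.split_at_mut(mid);
    // If buf.len() is odd, `y` is one element longer than `x`; `zip` stops at
    // the shorter slice, so the final element is left unchanged.
    for (yi, xi) in y.iter_mut().zip(x.iter()) {
        *yi += alpha * *xi;
    }
}

fn main() {
    let mut v = vec![1.0, 2.0, 10.0, 20.0];
    axpy_halves(&mut v, 2.0);
    println!("{:?}", v); // [1.0, 2.0, 12.0, 24.0]
}
```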

For us, the question stopped being “how fast can a human write this?” and became “how confident can we be that it is correct?” That reframing is why Rust became the more practical choice.

Concretely:

Ownership and types rule out a wide range of errors at compile time. Because cargo check returns in seconds, many of the agent's mistakes surface before the program ever runs.

Cargo handles builds, dependency resolution, tests, and benchmarks in one place. No CMake, no link-time version conflicts. A from-scratch build of the full stack plus dependencies takes a couple of minutes on a laptop, and the edit/test loop is tens of seconds.

Rust enforces symbol visibility at module and crate boundaries. Code in one crate can use only what another crate exports, so an agent working in one layer cannot reach into another crate’s internals and quietly break the abstraction. In an AI-written codebase of about 130K lines, that boundary matters.

Lifetimes and ownership mechanics can largely be left to the agent, so human attention goes to algorithms, design, and correctness. The early learning cost that used to count against Rust is less of a problem now.

In C++, Python, and Julia, large codebases tend to come with the worry that they are becoming too hard to verify. With Rust, that worry is noticeably smaller.
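The sketch below illustrates the visibility point. It uses a module inside one file rather than separate crates. The names `backend`, `Buffer`, and `scale_in_place` are hypothetical and not part of tenferro-rs. A private field cannot be touched outside its module. A `pub(crate)` item is usable anywhere inside its own crate but invisible to downstream crates. The same mechanism is what fences one crate's internals off from another.

```rust
// Hypothetical layering sketch; module and type names are illustrative only.
mod backend {
    /// Public type: other modules can name it, but not touch its fields.
    pub struct Buffer {
        data: Vec<f64>, // private: only code inside `backend` can access it
    }

    impl Buffer {
        pub fn from_vec(data: Vec<f64>) -> Self {
            Buffer { data }
        }

        /// Part of the public API: read-only access to the contents.
        pub fn sum(&self) -> f64 {
            self.data.iter().sum()
        }

        /// Visible anywhere in this crate, but not to downstream crates.
        pub(crate) fn scale_in_place(&mut self, a: f64) {
            for x in &mut self.data {
                *x *= a;
            }
        }
    }
}

fn main() {
    let mut b = backend::Buffer::from_vec(vec![1.0, 2.0, 3.0]);
    b.scale_in_place(2.0); // allowed: same crate
    // b.data.push(4.0);   // would not compile: error[E0616], field `data` is private
    println!("{}", b.sum()); // 12
}
```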

From a port to a stack

We did not set out to build a general tensor library. We wanted to port the pieces we...
