From Julia to Rust: a differentiable tensor stack for scientific computing in the agentic AI era

tenferro-rs is a Rust-native dense tensor stack: linear algebra, PyTorch-style eager autodiff, JAX-style traced transforms, NumPy-style einsum, FFT, extensible operation crates, and explicit CPU/CUDA backends. The first crates are on crates.io as of June 23, 2026 (JST).

by Hiroshi Shinaoka (Saitama University), for the tensor4all team

Most tensor-network code has been written in Julia, and ours was no exception. ITensors and the surrounding ecosystem are good for prototyping: the code stays close to the math, and it is easy to iterate. Our own work on the IR basis, sparse modeling, and the tensor4all tensor-cross-interpolation and quantics stack started there.

Once the codebase gets large, though, Julia development starts to slow down: type instability that only shows up at run time, compile and precompile times that stretch the edit/test loop, and the sense that correctness gets harder to check as the code grows. When we started fitting the tensor-network stack into a larger system, that became hard to ignore. We began moving the compute engine to Rust.

That immediately exposed a second problem. The tensor library we wanted to build on was not there yet. Rust has libraries for individual jobs: ndarray for arrays, Burn for deep learning, faer for linear algebra. What was missing was a tensor layer that could cover autodiff through einsum and still feel usable for scientific computing. The goal was not to replace those libraries. It was to connect the pieces that already exist.

The Rust ecosystem has changed a lot in the last few years. crates.io went from 602 crates in 2015 to roughly 210,000 in 2026 (data). For dense linear algebra there is faer; for GPU kernels, CubeCL; for generic numerics, num-traits and num-complex. There are also libraries at nearby layers: ndarray for arrays, nalgebra and faer for linear algebra, Burn and candle for deep learning, and numr for a NumPy-style array API. What we needed was the layer between them: a scientific-computing tensor stack with column-major storage, dynamic shapes, eager and traced autodiff, einsum, FFT, CPU/CUDA backends, and extensible operations. That is what tenferro-rs is for. We build on faer and CubeCL and add the missing parts instead of reinventing them. Porting SparseIR.jl and Julia tensor-network code made it clearer where that missing layer was.
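To make "column-major storage, dynamic shapes, einsum" concrete, here is a minimal sketch in plain Rust. It is not the tenferro-rs API: the `ColMajor` type, its methods, and `contract_ij_jk` are hypothetical names invented for this illustration. The sketch only shows what the terms mean: the shape is known at run time, the first index varies fastest in memory, and the einsum expression `ij,jk->ik` is a sum over the shared index `j`.

```rust
/// Hypothetical dense tensor with a run-time shape and column-major storage.
/// NOT the tenferro-rs API; every name here is made up for illustration.
#[derive(Debug, Clone)]
struct ColMajor {
    shape: Vec<usize>,
    data: Vec<f64>,
}

impl ColMajor {
    fn zeros(shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        ColMajor { shape: shape.to_vec(), data: vec![0.0; len] }
    }

    fn from_vec(shape: &[usize], data: Vec<f64>) -> Self {
        let len: usize = shape.iter().product();
        assert_eq!(data.len(), len, "data length must match shape");
        ColMajor { shape: shape.to_vec(), data }
    }

    /// Column-major strides: the first index varies fastest.
    fn strides(&self) -> Vec<usize> {
        let mut strides = Vec::with_capacity(self.shape.len());
        let mut step = 1;
        for &dim in &self.shape {
            strides.push(step);
            step *= dim;
        }
        strides
    }

    /// Linear offset of a multi-index into `data`.
    fn offset(&self, index: &[usize]) -> usize {
        assert_eq!(index.len(), self.shape.len(), "rank mismatch");
        index.iter().zip(self.strides()).map(|(i, s)| i * s).sum()
    }

    fn get(&self, index: &[usize]) -> f64 {
        self.data[self.offset(index)]
    }
}

/// The einsum expression `ij,jk->ik` written out as explicit loops.
fn contract_ij_jk(a: &ColMajor, b: &ColMajor) -> ColMajor {
    let (m, k) = (a.shape[0], a.shape[1]);
    let n = b.shape[1];
    assert_eq!(b.shape[0], k, "contracted dimension mismatch");
    let mut c = ColMajor::zeros(&[m, n]);
    for col in 0..n {
        for j in 0..k {
            for row in 0..m {
                let o = c.offset(&[row, col]);
                c.data[o] += a.get(&[row, j]) * b.get(&[j, col]);
            }
        }
    }
    c
}

fn main() {
    // a = [[1, 2, 3], [4, 5, 6]] stored column by column.
    let a = ColMajor::from_vec(&[2, 3], vec![1.0, 4.0, 2.0, 5.0, 3.0, 6.0]);
    // b = [[7, 8], [9, 10], [11, 12]] stored column by column.
    let b = ColMajor::from_vec(&[3, 2], vec![7.0, 9.0, 11.0, 8.0, 10.0, 12.0]);
    let c = contract_ij_jk(&a, &b);
    println!("{:?}", c.data); // [58.0, 139.0, 64.0, 154.0]
}
```

A real stack would dispatch the contraction to an optimized kernel rather than nested loops; the loops are only there to show the index bookkeeping.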

That is the background for tenferro-rs. This post explains why we are building it, and why we chose Rust now that code is no longer written only by humans.

Why Rust now, when Julia was fine before?

A couple of years ago I would probably have told students to start with Julia. Julia code can stay close to the math, memory management is easy, and the numerical libraries were already there. Rust had more to learn, and the ecosystem was still missing pieces.

I would not give the same advice now. Not because Rust changed, but because I am no longer the one writing most of the code.

Fortran, Python, and Julia all developed around lowering the cost for humans to write, read, and maintain code by hand. Readability, a REPL, notation close to the math, and a low barrier to entry all matter for that. When AI writes more of the code, the tradeoff changes. Writing speed matters less. Much of the learning cost can be handled by the agent. But “it reads like the math” still does not guarantee correctness: aliasing, mutation, and allocation are not visible from the surface of a line.
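As a small illustration of the last point, here is how Rust surfaces aliasing, mutation, and allocation in a function signature. The functions `scaled`, `scale_in_place`, and `axpy` are hypothetical examples written for this post's argument, not code from tenferro-rs.

```rust
/// Shared borrow in, owned `Vec` out: cannot mutate `x`, allocates the result.
fn scaled(x: &[f64], alpha: f64) -> Vec<f64> {
    x.iter().map(|v| alpha * v).collect()
}

/// Exclusive borrow: mutates `x` in place and allocates nothing.
fn scale_in_place(x: &mut [f64], alpha: f64) {
    for v in x.iter_mut() {
        *v *= alpha;
    }
}

/// y += alpha * x. Because `&mut` is exclusive, `x` and `y` cannot alias.
fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len(), "length mismatch");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

fn main() {
    let x = vec![1.0, 2.0, 3.0];
    let mut y = scaled(&x, 2.0); // new allocation: [2.0, 4.0, 6.0]
    axpy(1.0, &x, &mut y); // in place: [3.0, 6.0, 9.0]
    // axpy(1.0, &y, &mut y); // rejected by the borrow checker: `y` would alias itself
    scale_in_place(&mut y, 0.5); // in place: [1.5, 3.0, 4.5]
    println!("{:?}", y);
}
```

At every call site, `&x` versus `&mut y` shows which argument may change, and a returned `Vec` shows where memory is allocated. In Julia the trailing `!` on a mutating function is a naming convention that the compiler does not enforce.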

For us, the question stopped being “how fast can a human write this?” and became “how confident can we be that it is correct?”. That reframing is why Rust became the more practical choice.

Concretely:

Ownership and types rule out a wide range of errors at compile time. cargo check answers in seconds, so when the agent gets something wrong, we find out before running the program.

Cargo handles builds, dependency resolution, tests, and benchmarks in one place. No CMake, no link-time version conflicts. A from-scratch build of the full stack plus dependencies takes a couple of minutes on a laptop, and the edit/test loop is tens of seconds.

Rust controls symbol visibility along module and crate boundaries. An agent can only work inside a layer; it cannot reach into another crate’s internals and quietly break the abstraction. In an AI-written codebase of about 130K lines, that boundary matters.

Lifetimes and ownership mechanics can largely be left to the agent, so human attention goes to algorithms, design, and correctness. The early learning cost that used to count against Rust is less of a problem now.
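The visibility point in the list above can be sketched in a single file, with modules standing in for crates. The `storage` and `algorithms` modules and the `Matrix` type are hypothetical and are not taken from the tenferro-rs source. The idea is that code in one layer can only use the public API of another, so an invariant established in the constructor cannot be broken from outside.

```rust
mod storage {
    /// Invariant: `data.len() == rows * cols`, stored column-major.
    pub struct Matrix {
        rows: usize,    // private: only this module can touch the fields
        cols: usize,
        data: Vec<f64>,
    }

    impl Matrix {
        /// The only way to build a `Matrix`; it checks the invariant once.
        pub fn new(rows: usize, cols: usize, data: Vec<f64>) -> Option<Self> {
            if data.len() == rows * cols {
                Some(Matrix { rows, cols, data })
            } else {
                None
            }
        }

        pub fn rows(&self) -> usize {
            self.rows
        }

        pub fn cols(&self) -> usize {
            self.cols
        }

        pub fn get(&self, r: usize, c: usize) -> f64 {
            self.data[r + c * self.rows]
        }
    }
}

mod algorithms {
    use super::storage::Matrix;

    /// Sum of the diagonal, written against the public API only.
    pub fn trace(m: &Matrix) -> f64 {
        let n = m.rows().min(m.cols());
        // `m.data` would not compile here: the field is private to `storage`.
        (0..n).map(|i| m.get(i, i)).sum()
    }
}

fn main() {
    // [[1, 3], [2, 4]] in column-major order.
    let m = storage::Matrix::new(2, 2, vec![1.0, 2.0, 3.0, 4.0]).unwrap();
    println!("{}", algorithms::trace(&m)); // 5
    // A buffer of the wrong length is rejected at construction.
    assert!(storage::Matrix::new(2, 2, vec![1.0]).is_none());
}
```

Across real crates the same rule applies at the crate boundary: only items marked `pub` are reachable from outside, and `pub(crate)` keeps helpers internal to one crate.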

In C++, Python, and Julia, large codebases tend to come with the worry that they are becoming too hard to verify. With Rust, that worry is noticeably smaller.

From a port to a stack

We did not set out to build a general tensor library. We wanted to port the pieces we...
