One engine, many tools – Introducing Rubydex | Rails at Scale

One engine, many tools

A few years ago, the new Ruby parser Prism was released. One of its primary goals was to unify the community since we had multiple implementations of Ruby parsers, each with their own bugs, differences in implementation and portability. By having a single parser, community investments in performance and correctness benefit every single tool built on top of it (including Ruby itself!).

However, the story of repeated implementations of highly complex foundational blocks doesn’t end at the parser level. Move one level up the stack and the pattern repeats. Today, we have multiple tools that implement code indexing and related static analysis algorithms. Consider just a few examples:

Language servers: tools like Ruby LSP and Solargraph need code indexing to provide go to definition, hover, signature help, completion and so on

Type checkers: tools like Sorbet and Steep need code indexing for all of the previous reasons, plus the ability to type check code

Documentation generators: tools like RDoc and YARD need code indexing to aggregate all declarations and their respective documentation for navigating and generating the static website

Dead code detectors: tools like Spoom and debride need code indexing to match declarations and references, so that they can identify which declarations are dead (i.e., unused)

Linters: tools like RuboCop and Standard don’t currently use code indexing, but could provide much more sophisticated linting capabilities given global knowledge of the codebase
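To make the shared need concrete, here is a minimal sketch of what "code indexing" boils down to: a table of definitions and references keyed by name. Everything in it (`ToyIndex`, `Definition`, `Reference`, the file paths) is hypothetical and hand-fed for illustration. It is not the API of Rubydex or of any tool listed above, and a real index would be populated by parsing source files.

```ruby
# Toy code index. All names here are hypothetical, not any real tool's API.
Definition = Struct.new(:name, :file, :line)
Reference  = Struct.new(:name, :file, :line)

class ToyIndex
  def initialize
    @definitions = Hash.new { |hash, name| hash[name] = [] }
    @references  = Hash.new { |hash, name| hash[name] = [] }
  end

  def add_definition(name, file, line)
    @definitions[name] << Definition.new(name, file, line)
  end

  def add_reference(name, file, line)
    @references[name] << Reference.new(name, file, line)
  end

  # Language servers: "go to definition" is a lookup by name.
  def definitions_of(name)
    @definitions.fetch(name, [])
  end

  # Documentation generators: every declared name, sorted.
  def declared_names
    @definitions.keys.sort
  end

  # Dead code detectors: declared names that nothing refers to.
  def unreferenced_names
    @definitions.keys.reject { |name| @references.key?(name) }.sort
  end
end

index = ToyIndex.new
index.add_definition("Billing::Invoice", "app/models/invoice.rb", 3)
index.add_definition("Billing::LegacyExport", "lib/legacy_export.rb", 1)
index.add_reference("Billing::Invoice", "app/controllers/invoices_controller.rb", 12)

index.definitions_of("Billing::Invoice").map { |d| "#{d.file}:#{d.line}" }
# => ["app/models/invoice.rb:3"]
index.unreferenced_names
# => ["Billing::LegacyExport"]
```

The three query methods are thin wrappers over the same two tables, which is why a language server, a documentation generator and a dead code detector can in principle share one index.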

The story here is the same: multiple implementations of code indexing, with varying performance, implementation differences and correctness discrepancies. On top of that, none of them is packaged as a portable API that any other project can use.

It’s another case of our community’s efforts being diluted when we could instead have the compounding benefits of working together on a foundational building block. We ought to do something about it.

Introducing Rubydex

Rubydex is a new portable static analysis engine intended to provide features such as code indexing and type analysis through a convenient API.

It is important to note that Rubydex is a framework, or engine. It is not a tool by itself, but rather the core building block for creating other tools. Despite being early in its development, Rubydex can already:

Collect all definitions in a codebase and its dependencies, including classes, modules, constants, singleton classes, instance variables, class variables, global variables and methods

Index RBS documents (including the bundled core and stdlib files and any RBS files in the workspace)

Resolve constant references

Track constant and instance variable references completely and method references with limitations [1]

Create declarations from the discovered definitions and constant references [2]

Linearize ancestor chains

Track descendants

Query the resulting graph in many ways
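Two items in the list above, linearizing ancestor chains and tracking descendants, can be illustrated with a small static model. The sketch below is an assumption-laden toy: the `DECLARATIONS` table and the `linearize` and `descendants` helpers are hypothetical and are not Rubydex's data model or API. It follows Ruby's basic ordering (prepended modules first, then the class, then included modules with the most recent first, then the superclass chain) but ignores the implicit `Object`, `Kernel` and `BasicObject` ancestors and the edge cases around modules that appear more than once.

```ruby
# Hypothetical, hand-built declaration table; not Rubydex's data model.
# :includes and :prepends list module names in the order the calls appear.
DECLARATIONS = {
  "Greeting" => { superclass: nil,    includes: [],           prepends: [] },
  "Logging"  => { superclass: nil,    includes: [],           prepends: [] },
  "Base"     => { superclass: nil,    includes: ["Greeting"], prepends: [] },
  "Child"    => { superclass: "Base", includes: [],           prepends: ["Logging"] },
}.freeze

# Simplified linearization: prepends, self, includes, then the superclass chain.
def linearize(name, table = DECLARATIONS)
  decl = table.fetch(name)
  chain = []
  # The most recently prepended module ends up first in the chain.
  decl[:prepends].reverse_each { |mod| chain.concat(linearize(mod, table)) }
  chain << name
  # The most recently included module sits closest to the class.
  decl[:includes].reverse_each { |mod| chain.concat(linearize(mod, table)) }
  chain.concat(linearize(decl[:superclass], table)) if decl[:superclass]
  chain.uniq
end

# Descendants are the inverse query: every name whose chain contains `name`.
def descendants(name, table = DECLARATIONS)
  table.keys.select { |other| other != name && linearize(other, table).include?(name) }
end

linearize("Child")      # => ["Logging", "Child", "Base", "Greeting"]
descendants("Greeting") # => ["Base", "Child"]
```

At runtime Ruby answers the first question with `Module#ancestors`. A static engine has to reach the same answer without loading the code, which is why it needs the full graph of declarations first.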

Built for portability

One of our goals with Rubydex is portability. If someone wants to write sophisticated tooling for Ruby using a different language or maybe target the browser through WASM, then they should be able to!

For this reason, Rubydex is built with Rust, C and Ruby to ship three distinct components:

The main Rust crate: this is where the entire logic is implemented. Rust allows for high performance and easy parallelism, which are incredibly valuable when implementing static analysis tooling, where the performance constraints are intense. Other Rust projects can use this crate directly, for example to create a Zed extension that understands Ruby code or to write a linter in Rust.

A Rust FFI crate: this crate provides C-compatible bindings to the main crate’s logic, allowing other languages to integrate with Rubydex. Developers can use it to write tooling in other languages, like a VS Code extension that can analyze Ruby codebases with no Ruby runtime dependency.

A Ruby gem: a native extension that provides the Ruby API, which interacts with the underlying Rust implementation through the FFI crate. The gem ships with pre-compiled binaries for macOS (Intel and M series), Linux (x64 and ARM64) and Windows. On any other platform, the gem depends on cargo (the Rust package manager) to compile during installation.
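If you are on a platform without a pre-compiled binary, it can help to confirm that cargo is reachable before installing. The helper below is a minimal sketch using only Ruby core methods. `cargo_available?` is a hypothetical name, and the check only looks for an executable called `cargo` (or `cargo.exe`) on the search path; it says nothing about which Rust version the gem needs.

```ruby
# Hypothetical helper: true when an executable named cargo is found on the
# given search path (defaults to the PATH environment variable).
def cargo_available?(search_path = ENV.fetch("PATH", ""))
  candidates = ["cargo", "cargo.exe"]
  search_path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidates.any? do |name|
      full_path = File.join(dir, name)
      File.file?(full_path) && File.executable?(full_path)
    end
  end
end

puts(cargo_available? ? "cargo found on PATH" : "cargo not found on PATH")
```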

Impact on existing tools

At the time of writing, we have either completed or started migrating our existing tools to use Rubydex. The impact story for all of them is essentially the same: better performance, higher accuracy and a lot less code to maintain.

Tapioca

You may know Tapioca for all of its runtime analysis, which is what allows the tool to output static RBI information for more accurate Sorbet type checking. However, Tapioca also consumes static information. There are two main use cases for static analysis in Tapioca:

Fetching documentation for a given declaration so that it can...
