One engine, many tools — Introducing Rubydex | Rails at Scale
One engine, many tools
A few years ago, the new Ruby parser Prism was released. One of its primary goals was to unify the community since we had multiple implementations of Ruby parsers, each with their own bugs, differences in implementation and portability. By having a single parser, community investments in performance and correctness benefit every single tool built on top of it (including Ruby itself!).
However, the story of repeated implementations of highly complex foundational blocks doesn’t end at the parser level. Move one level up the stack and the pattern repeats. Today, we have multiple tools that implement code indexing and related static analysis algorithms. Consider just a few examples:
Language servers: tools like Ruby LSP and Solargraph need code indexing to provide go to definition, hover, signature help, completion and so on
Type checkers: tools like Sorbet and Steep need code indexing for all of the previous reasons plus having the ability to type check code
Documentation generators: tools like RDoc and YARD need code indexing to aggregate all declarations and their respective documentation for navigating and generating the static website
Dead code detectors: tools like Spoom and debride need code indexing to match declarations and references, so that they can identify what declarations are dead (i.e., unused)
Linters: tools like RuboCop and Standard don’t currently use code indexing, but could provide much more sophisticated linting capabilities given a global knowledge of the codebase
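To make "code indexing" concrete, here is a minimal sketch of the idea shared by the tools above: record where each name is defined and where it is referenced, then answer queries against that data. Everything in it (`ToyIndex`, `Location`, the file paths and method names) is hypothetical and written only for illustration; it is not the API of Rubydex or of any tool listed here, and a real indexer would populate the data by parsing source files.

```ruby
# A location in the codebase where a name is defined or referenced.
Location = Struct.new(:file, :line)

# Hypothetical toy index: maps fully qualified names to their locations.
class ToyIndex
  def initialize
    @definitions = Hash.new { |hash, name| hash[name] = [] }
    @references  = Hash.new { |hash, name| hash[name] = [] }
  end

  def add_definition(name, file, line)
    @definitions[name] << Location.new(file, line)
  end

  def add_reference(name, file, line)
    @references[name] << Location.new(file, line)
  end

  # "Go to definition": every location where the name is defined.
  def definitions_of(name)
    @definitions.fetch(name, [])
  end

  # Dead code candidates: names that are defined but never referenced.
  def unreferenced
    @definitions.keys.reject { |name| @references.key?(name) }
  end
end

# Hand-written sample data standing in for the result of parsing a codebase.
index = ToyIndex.new
index.add_definition("Invoice", "app/models/invoice.rb", 1)
index.add_definition("Invoice#total", "app/models/invoice.rb", 4)
index.add_definition("Invoice#legacy_total", "app/models/invoice.rb", 9)
index.add_reference("Invoice", "app/services/billing.rb", 3)
index.add_reference("Invoice#total", "app/services/billing.rb", 5)

index.definitions_of("Invoice").map(&:line) # => [1]
index.unreferenced                          # => ["Invoice#legacy_total"]
```

A language server would build on the first query, a dead code detector on the second. Both need the same underlying data, which is the duplication described here.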
The story here is the same: multiple implementations of code indexing with varying performance, implementation differences and correctness discrepancies. On top of that, none of them is packaged as a portable API that any other project can use.
It’s another case of our community’s efforts being diluted when we could instead have the compounding benefits of working together on a foundational building block. We ought to do something about it.
Introducing Rubydex
Rubydex is a new portable static analysis engine intended to provide features such as code indexing and type analysis through a convenient API.
An important thing to note is that Rubydex is a framework/engine. It is not a tool by itself, but rather the core building block to create other tools. Despite being early in its development, Rubydex can already:
Collect all definitions in a codebase and its dependencies, including classes, modules, constants, singleton classes, instance variables, class variables, global variables and methods
Index RBS documents (including the bundled core and stdlib files and any RBS files in the workspace)
Resolve constant references
Track constant and instance variable references completely and method references with limitations [1]
Create declarations from the discovered definitions and constant references [2]
Linearize ancestor chains
Track descendants
Query the resulting graph in many ways
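As a rough illustration of two items from the list above, ancestor linearization and descendant tracking, the sketch below works on a small hand-written graph. The `GRAPH` data, the `linearize` and `descendants` helpers and all class and module names are hypothetical, not Rubydex's API, and the linearization is deliberately simplified: it assumes modules do not include other modules and ignores singleton classes.

```ruby
# Hypothetical static facts an indexer could collect from source alone:
# each class's superclass plus the modules it includes or prepends.
GRAPH = {
  "Base"  => { superclass: nil,    includes: ["Greeting"], prepends: [] },
  "Child" => { superclass: "Base", includes: [],           prepends: ["Loud"] },
  "Other" => { superclass: "Base", includes: [],           prepends: [] },
}.freeze

# Simplified linearization: prepended modules, then the class itself, then
# included modules, then the superclass chain. Later `include`/`prepend`
# calls take precedence, hence the `reverse`.
def linearize(graph, name)
  entry = graph.fetch(name)
  chain = entry[:prepends].reverse + [name] + entry[:includes].reverse
  chain += linearize(graph, entry[:superclass]) if entry[:superclass]
  chain.uniq
end

# Descendants: every other class whose ancestor chain contains `name`.
def descendants(graph, name)
  graph.keys.select do |other|
    other != name && linearize(graph, other).include?(name)
  end
end

linearize(GRAPH, "Child")  # => ["Loud", "Child", "Base", "Greeting"]
descendants(GRAPH, "Base") # => ["Child", "Other"]
```

For this small case the result has the same order as the leading entries of `Child.ancestors` at runtime, but it is computed without loading any of the code, which is the point of doing it statically.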
Built for portability
One of our goals with Rubydex is portability. If someone wants to write sophisticated tooling for Ruby using a different language or maybe target the browser through WASM, then they should be able to!
For this reason, Rubydex is built with Rust, C and Ruby to ship 3 distinct components:
The main Rust crate: this is where the entire logic is implemented. Rust allows for high performance and easy parallelism, which are incredibly valuable when implementing static analysis tooling where the performance constraints are intense. Other Rust projects can use this crate directly, for example to create a Zed extension that understands Ruby code or to write a linter in Rust.
A Rust FFI crate: this crate provides C compatible bindings to the main crate’s logic, allowing other languages to integrate with Rubydex. Developers can use it to write tooling in other languages, such as a VS Code extension that analyzes Ruby codebases with no Ruby runtime dependency.
A Ruby gem: a native extension that provides the Ruby API, which interacts with the underlying Rust implementation through the FFI crate. The gem ships with pre-compiled binaries for macOS (Intel and M series), Linux (x64 and ARM64) and Windows. On any other platform, rubydex depends on cargo (the Rust package manager) to compile when the gem is installed.
Impact on existing tools
At the time of writing, we have either completed or started migrating our existing tools to use Rubydex. The impact story for all of them is essentially the same: better performance, higher accuracy and a lot less code to maintain.
Tapioca
You may know Tapioca for all of its runtime analysis, which is what allows the tool to output static RBI information for more accurate Sorbet type checking. However, Tapioca also consumes static information. There are two main use cases for static analysis in Tapioca:
Fetching documentation for a given declaration so that it can...