An AI agent ported our codebase from Python to Rust | AboutCode.org
A case study, not an isolated incident
ScanCode detects licenses, copyrights, package dependencies, vulnerabilities, and a few more things in both source code and binary files. The use cases include license and security compliance and software supply chain management. It is the product of over a decade of careful design, architecture, and testing by an open source community of over 700 contributors, backed by more than 40,000 automated tests for license detection alone and over 90,000 automated tests overall.
The core module is ScanCode Toolkit, the industry-leading open source code scanning engine. In early 2026, an agentic LLM system ported ScanCode Toolkit from Python to Rust, published the derived results under a name that infringed the ScanCode trademark, stripped copyright and license notices from both ScanCode and the third-party code we had vendored and carefully attributed, and started an outreach campaign, all without ever engaging the AboutCode community.
This incident is not isolated. AboutCode and many other open source projects are experiencing a steady influx of AI-generated issues and pull requests that are superficially plausible, templated, often duplicates of existing reports, and almost never grounded in actual use of the software. Maintainers across the open source ecosystem call this AI slop. It consumes human triage time, degrades signal in issue trackers, and erodes the social contract between users, contributors, and maintainers. The porting incident described in this post is the same phenomenon at a larger scale and with higher stakes.
This article documents what happened technically, what it reveals about the current state of AI-assisted development, and what the open source community needs to do when dealing with AI-generated code.
What the agent did
The porting was driven by an LLM orchestration harness (using OpenCode and an OpenClaw-vibe coded OpenCode plugin). The agent's approach was straightforward: take a mature, well-tested Python codebase and translate it into Rust. The result is neither an independent rewrite nor merely "inspired by" ScanCode, as the port claims. It is a mechanical translation, and that is exactly the kind of task LLMs are well-suited for.
Why? Code translation is fundamentally like a language translation task, and Large Language Models (LLMs) were originally designed for such language tasks. The extensive ScanCode test suite provided the specification and the guide rails. The agent did not need to understand the algorithms; it only needed to produce code that passed the tests.
This is worth repeating: a comprehensive test suite, decent documentation, and curated datasets are what make automated porting possible. They are also what make a codebase easier to replicate without understanding it.
The agent's initial approach, using an existing Rust license-detection library, failed to match ScanCode's output quality. The agent then did what any translator would do when a loose paraphrase fails: it copied the original more closely. The final port reproduces ScanCode's core algorithms, code organization, and data-driven architecture in Rust, not because the agent understood them, but because it had enough training data and test feedback to converge on equivalent code.
Performance claims
The Rust port published a "benchmark" that claimed 10x to 100x improvements in performance. Many benchmarks are fundamentally flawed because they are designed to document and assert their own tool's feature or performance superiority to help sell or promote that tool.
Compiled Rust is capable of outperforming interpreted Python. In the published "benchmarks", the Rust port runs faster than ScanCode, but when checked it returns incorrect results, missing detections and skipping files. ScanCode runs the standard ScanCode test suite faster than the Rust port, even though the Rust port covers fewer tests. After applying optimizations similar to those in the Rust port, ScanCode runs as fast as or faster than the Rust port while maintaining correctness and attribution.
Testing correctness or speed on a subset does not equate with superiority on the whole.
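The point can be made concrete with a minimal sketch of a benchmark guard: compare results first, and report a speedup only if the candidate reproduces every reference detection. The data shape here (a mapping from file path to a set of detected license keys) and the names `ScanDiff`, `compare_scans`, and `speedup_if_valid` are hypothetical illustrations, not ScanCode's actual output format or API.

```python
from dataclasses import dataclass, field


@dataclass
class ScanDiff:
    """Differences between a reference scan and a candidate scan."""
    skipped_files: set = field(default_factory=set)
    missing_detections: dict = field(default_factory=dict)

    @property
    def is_equivalent(self) -> bool:
        # Equivalent only if no file was skipped and no detection was lost.
        return not self.skipped_files and not self.missing_detections


def compare_scans(reference: dict, candidate: dict) -> ScanDiff:
    """Compare two scans shaped as {file path: set of license keys} (assumed shape)."""
    diff = ScanDiff()
    for path, expected in reference.items():
        if path not in candidate:
            # The candidate never reported this file at all.
            diff.skipped_files.add(path)
            continue
        missing = set(expected) - set(candidate[path])
        if missing:
            diff.missing_detections[path] = missing
    return diff


def speedup_if_valid(reference, candidate, reference_seconds, candidate_seconds):
    """Return the speedup factor, or None if the candidate's results are incomplete."""
    if not compare_scans(reference, candidate).is_equivalent:
        return None
    return reference_seconds / candidate_seconds
```

Under these assumptions, a candidate that skips a file or drops a detection yields `None` instead of a speedup figure, so a fast but incomplete scan never gets a headline number.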
This also demonstrates a core problem of AI-assisted software development. The agent replicated ScanCode's structure well enough to pass some tests, but not well enough to pass all of them. The port applied performance optimizations and caching strategies to appear faster, but sacrificed critical data correctness and completeness.
License and copyright failures
ScanCode is Apache-2.0 licensed. The Apache open source license is among the most permissive available, with minimal requirements:
1. Retain the original NOTICE file.
2. Preserve license and copyright headers, including in modified files.
3. Note changes made to modified files.
4. Do not reuse the project name without permission.
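Preserving license and copyright headers is something a port can check mechanically, at least roughly. The sketch below is a line-level heuristic and makes several assumptions: notices are single lines containing "Copyright" or "SPDX-License-Identifier:", and comment markers are limited to `#`, `//`, `*`, and `-`. The names `notice_lines` and `missing_notices` are hypothetical. It is not ScanCode's copyright detection and it does not establish license compliance.

```python
import re

# Assumed pattern: lines that look like copyright or SPDX license notices.
NOTICE_RE = re.compile(r"copyright\b|spdx-license-identifier:", re.IGNORECASE)


def notice_lines(text: str) -> list:
    """Return notice-like lines with leading comment markers and whitespace removed."""
    notices = []
    for line in text.splitlines():
        if NOTICE_RE.search(line):
            # Strip common comment prefixes so "# ..." and "// ..." compare equal.
            notices.append(line.strip().lstrip("#/*- ").strip())
    return notices


def missing_notices(original: str, derived: str) -> list:
    """Return notices found in the original text but absent from the derived text."""
    kept = set(notice_lines(derived))
    return [notice for notice in notice_lines(original) if notice not in kept]
```

For example, passing a Python source file and its Rust translation as strings returns an empty list when every notice line survived the translation, and the list of dropped notices otherwise.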
The port violated all four requirements. Requirements 1 and 4 were partially...