Provenance: Proving That Your Code Is Really Yours | by Vektor Memory | Jul, 2026 | Medium
Provenance: Proving That Your Code Is Really Yours
Vektor Memory
A weekend project about LLM guardrails, copyright, and why proving your code is yours turned out to be a lot more complex than it should be.<br>This is a firsthand look at an experimental weekend project, not legal advice. If any of this matters to your actual business, talk to an actual lawyer in your jurisdiction. I use multiple LLMs daily as idea generators for code, production work, and research.<br>So don’t read the next few paragraphs as naive surprise. I’m not pointing fingers at the model providers or pretending I didn’t know what I was walking into after four years of use. I’m just trying to work within the tools we’ve actually been given, ethically, and see how far that can get you.<br>The rabbit hole<br>It started with a paper I found while reading through arXiv: Verifiable Provenance and Watermarking for Generative AI, which builds an evidentiary framework mapping cryptographic provenance and watermarking schemes to the actual proof thresholds used in courts and regulation.<br>The finding that stuck with me, paraphrased from a conversation about the paper, was that no single scheme on its own clears the bar under realistic adversarial conditions. It’s the combination of methods that holds up, not any one of them in isolation.<br>The second paper was CLASP: Training-Free LLM-Assisted Source Code Watermarking via Semantic-Preserving Transformations (https://arxiv.org/pdf/2510.11251).<br>CLASP reformulates source code watermarking into two stages: Semantically Consistent Embedding, which uses LLMs to perform semantics-aware watermark insertion from a fixed transformation space, and Differential Comparison Extraction, which recovers watermark bits through retrieval-grounded comparison against the most likely original code.<br>That sent me down a rabbit hole for the weekend, using four frontier LLM assistants (Gemini, OpenAI, Perplexity, and Claude Sonnet 5) to both research the problem and try to build something real out of it as a challenge.
What I found surprised me, not because the models refused things, but because of exactly which things they refused and which they didn’t.<br>Some even locked down entirely and would not proceed any further. There are two sides to every guardrail. It is a good thing when someone nefarious is trying to circumvent the system. But what about the good-faith ideas that try to build preventive measures against problems the ouroboros machines themselves created?
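To make CLASP’s two stages a little more concrete, here is a toy sketch in Python. It is not CLASP’s method: there is no LLM and no retrieval step, and the single rewrite rule (`x += y` becomes `x = x + (y)`), the function names `embed` and `extract`, and the sample snippet are all assumptions made up for illustration. It only shows the shape of the idea: hide bits by choosing between equivalent forms of the same code, then read them back by comparing against the original.

```python
import re

# Hypothetical one-rule "transformation space": x += y  <->  x = x + (y).
# Assumption: the values are numbers. For lists, += mutates in place, so the
# rewrite would NOT be semantics-preserving. Trailing comments are not handled.
AUG_ASSIGN = re.compile(r"^(\s*)([A-Za-z_]\w*) \+= (.+)$")


def embed(source: str, bits: list[int]) -> str:
    """Rewrite the i-th eligible line when bits[i] == 1, leave it when 0."""
    out, i = [], 0
    for line in source.splitlines():
        m = AUG_ASSIGN.match(line)
        if m and i < len(bits):
            if bits[i] == 1:
                indent, name, expr = m.groups()
                line = f"{indent}{name} = {name} + ({expr})"
            i += 1
        out.append(line)
    if i < len(bits):
        raise ValueError("not enough eligible lines to carry the payload")
    return "\n".join(out)


def extract(original: str, suspect: str) -> list[int]:
    """Recover bits by comparing the suspect to the original, site by site.

    Assumes both texts still line up one-to-one, which holds for this rule.
    """
    bits = []
    for orig_line, sus_line in zip(original.splitlines(), suspect.splitlines()):
        if AUG_ASSIGN.match(orig_line):
            bits.append(0 if sus_line == orig_line else 1)
    return bits


# Made-up sample with three eligible lines, so it can carry three bits.
ORIGINAL = """\
total = 0
count = 0
for n in [3, 4, 5]:
    total += n
    count += 1
total += count
"""

marked = embed(ORIGINAL, [1, 0, 1])
recovered = extract(ORIGINAL, marked)
```

The obvious weakness of a toy like this is that any formatter or refactoring pass could normalise the rewritten lines and erase the bits, which fits the point above that no single scheme holds up on its own.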
Testing the guardrails on my own code<br>I’ve been using LLMs since close to their public release. In years of writing Java and Python, I can count on one hand the times I’ve had genuine pushback on a code request. This weekend was different, and for a specific reason: I was trying to get an LLM to respect the proprietary licence header we had added at the top of our own file.<br>Here’s roughly what a real Provenance header looks like in the codebase I was testing:<br>// VEKTOR — PROPRIETARY AND CONFIDENTIAL<br>// Copyright (c) 2026 VEKTOR Memory Pty Ltd. All rights reserved.<br>//<br>// SPDX-License-Identifier: LicenseRef-VEKTOR-Proprietary<br>//<br>// Licence-Fingerprint: 7e35bbd37e6d0a95<br>//<br>// This file is licensed only under the applicable VEKTOR commercial<br>// licence agreement. Unauthorised copying, redistribution, reverse<br>// engineering, translation, extraction, or creation of derivative works<br>// is prohibited except where expressly permitted by a valid written<br>// licence from VEKTOR Memory Pty Ltd.<br>I pasted a standard .js file carrying that header into four different assistants and asked each one to convert it to Python.<br>Claude Sonnet 5 paused and flagged it before doing anything: it read the header, noted the explicit restriction on translation and derivative works, and asked me to confirm I actually held the rights before proceeding. Since I do, and since I said so, it went ahead.<br>Gemini converted the file immediately, with no comment on the header at all, and reproduced the proprietary notice at the top of the Python output. When I pushed back and asked why it had copied clearly marked proprietary code, it explained that pasted content is treated as something the user is presumed authorized to work with, and that translating it isn’t in the same category of risk as reproducing a company’s code from training data without the user supplying it.<br>OpenAI did the same on the first pass: no flag, direct translation.
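The kind of pre-flight check described above (read the header, notice that the requested action is one the notice restricts, and ask before going ahead) can be sketched in a few lines of Python. This is a hypothetical illustration, not how any of these assistants actually work: `parse_header`, `needs_confirmation`, the keyword list, and the abridged sample file are all assumptions. The sketch only reads the fingerprint value; it makes no claim about how that value is derived.

```python
import re

# Hypothetical keyword list: phrases a notice might use for restricted actions.
RESTRICTED_PHRASES = (
    "copying",
    "redistribution",
    "reverse engineering",
    "translation",
    "derivative works",
)


def parse_header(source: str) -> dict:
    """Read the leading block of // comment lines and pull out a few fields."""
    lines = []
    for line in source.splitlines():
        if not line.startswith("//"):
            break  # the header ends at the first non-comment line
        lines.append(line[2:].strip())
    # Join with spaces so phrases wrapped across lines are matched whole.
    text = " ".join(part for part in lines if part)
    spdx = re.search(r"SPDX-License-Identifier:\s*(\S+)", text)
    fingerprint = re.search(r"Licence-Fingerprint:\s*([0-9a-f]+)", text)
    lowered = text.lower()
    return {
        "spdx": spdx.group(1) if spdx else None,
        "fingerprint": fingerprint.group(1) if fingerprint else None,
        "restricted": {p for p in RESTRICTED_PHRASES if p in lowered},
    }


def needs_confirmation(source: str, action: str) -> bool:
    """True when the header names the requested action as restricted."""
    return action in parse_header(source)["restricted"]


# Abridged copy of the header shown above, followed by one made-up code line.
SAMPLE = """\
// Copyright (c) 2026 VEKTOR Memory Pty Ltd. All rights reserved.
//
// SPDX-License-Identifier: LicenseRef-VEKTOR-Proprietary
//
// Licence-Fingerprint: 7e35bbd37e6d0a95
//
// This file is licensed only under the applicable VEKTOR commercial
// licence agreement. Unauthorised copying, redistribution, reverse
// engineering, translation, extraction, or creation of derivative works
// is prohibited except where expressly permitted by a valid written
// licence from VEKTOR Memory Pty Ltd.
export const answer = 42;
"""

info = parse_header(SAMPLE)
ask_first = needs_confirmation(SAMPLE, "translation")
```

A check like this cannot tell whether the person asking holds the rights. All it can do is decide when to ask, which is the behaviour the first assistant showed.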
When challenged, OpenAI gave a similar answer: user-provided content is treated as fair game for transformation, and the notice is a legal signal, not proof one way or the other about whether I was authorized. When pressed harder, it acknowledged that a stronger caveat probably should have been included given the explicit...