can a url influence an llm's output? | AI Focus
Published by Paul Kinlan on July 3, 2026; Reading time: 22 minutes
At first, this was a really easy post to write, but then I discovered some things. Built a lot of things. Spent a lot of tokens… And it became one of the hardest, longest, and most expensive posts I’ve done (the API costs were not small).<br>I’ve had this on my mind for ages. It started when I noticed that the mere presence of a technology name in a prompt seemed to bias the output towards that technology.<br>For example, I looked through a number of system prompts for agentic tooling. They would include text like “(e.g. React)”, and it felt like these tools were more likely to output React code than they would be with a similar prompt that didn’t mention React.<br>I’ve spent the last few weeks running experiments to scratch this itch. But before I get too far, I have a request for help. I’m not a researcher. I think what I have here is compelling information (or at least it taught me something), but I might have made a lot of mistakes, or made assumptions that have biased the results. If you have any advice I would LOVE to hear from you. Email me.<br>The question I had was: would the presence of a URL in a prompt influence the output of the LLM, based on the content at that URL or the literal text of the URL itself?<br>If yes, then we might not have to embed lots of context into the prompt. 
For example, you might have a Skills file whose content is deeply embedded in the model’s weights. Saying “use what you know about: https://skills.sh/super-security-reviewer do a deep analysis” would then let information in the model’s latent space bias the output towards the content encoded at that URL.<br>I came away from this with:<br>A URL in the prompt does influence the output, but only when that URL and its content made it into the model’s training data.<br>It’s really unclear how LLM providers gather the data they train on, and I think they should tell us.<br>There are heaps of data that are not in the models.<br>If your site relies on JavaScript to load its content, that content is very likely not in a model (you might consider that a feature). The training crawlers I could verify (ClaudeBot, GPTBot) fetch a page’s assets but never execute the JavaScript; the only verified bot I’ve caught running JavaScript is OpenAI’s search crawler, OAI-SearchBot.<br>LLMs are expensive!<br>What follows is the journey I took.<br>The first step was to build a system that can analyse a range of URLs across a range of models and use an LLM-as-a-judge to help me test the hypothesis. My plan was to:<br>find each model’s known “knowledge cut-off date”<br>find content on either side of that date, to test whether the model could recall data that I believe it should know<br>find content ranging from pages that I believe would be popular all the way to the likely esoteric.<br>Content known to be after a cut-off would help me control for hallucination. If my original hypothesis was correct, then for that content the model should decline, or say it doesn’t know, rather than confidently make something up.<br>Once I had the data, I created a range of tests to help me understand how the models work. 
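The plan above (pick content on either side of a model’s cut-off) can be sketched roughly as follows. This is a minimal illustration and not the system I actually built: the `Page` type, the `bucket_by_cutoff` helper, the example.com URLs, the cut-off date and the 90-day safety margin are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Page:
    """A candidate test page (hypothetical type, for illustration only)."""
    url: str
    published: date  # assumed to be known for every page in the sample


def bucket_by_cutoff(pages, cutoff, margin_days=90):
    """Split pages into 'before', 'ambiguous' and 'after' a model's cut-off.

    The margin is an assumption of this sketch: pages published close to
    the cut-off are set aside, because we cannot be sure which side of the
    training data they landed on.
    """
    margin = timedelta(days=margin_days)
    buckets = {"before": [], "ambiguous": [], "after": []}
    for page in pages:
        if page.published < cutoff - margin:
            buckets["before"].append(page)   # the model could plausibly recall this
        elif page.published > cutoff + margin:
            buckets["after"].append(page)    # hallucination control
        else:
            buckets["ambiguous"].append(page)
    return buckets


# Hypothetical sample and cut-off date, purely for illustration.
sample = [
    Page("https://example.com/old-post", date(2023, 5, 1)),
    Page("https://example.com/near-cutoff", date(2025, 1, 20)),
    Page("https://example.com/new-post", date(2025, 9, 1)),
]
buckets = bucket_by_cutoff(sample, cutoff=date(2025, 1, 1))
for name, pages in buckets.items():
    print(name, [p.url for p in pages])
```

Pages in the "after" bucket are the ones where a well-behaved model should decline or say it doesn’t know; the "ambiguous" bucket exists only because of the margin assumed here.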
The tests are classified as:<br>described: the task described in words, no URL (the baseline)<br>opaque-url: ONLY the opaque URL string; the page is never fetched<br>mdn-url-only / spec-url-only / bcd-key-only: optional identifier probes, not part of the main comparison<br>url+described: the opaque URL plus the task described<br>full-content / content-only: the real page pasted in, with and without the task spelled out (the ceiling)<br>fake-structural-url / random-url: controls (a nonexistent URL of the same shape, and an unrelated real URL)<br>opaque-url was my real test: it tries to ensure that the LLM can’t infer the contents from the literal URL string. For example, I used some URLs from chromestatus.com (our public dashboard of Chrome features) because it has URLs like https://chromestatus.com/feature/5157805733183488. While I believe it’s pretty clear to the LLM that such a URL is web-related, you can’t infer from it that the page is about CSS Gap Decorations.<br>I then had other tests, such as descriptive URLs (MDN, for example, is very descriptive, which is very good UX for the web), to check whether the literal URL text influenced the output, as well as what happens when we add in extra context.<br>I have a report here and all the data is here (iframed too). I think it’s worth looking at, and there’s a pretty clear picture and answer to my question.<br>My first hunch was that URLs are not magic context, and the ChromeStatus numbers seemed to back it up. ChromeStatus feature URLs are a good opaque test because the domain tells the model the page is web-related, but the numeric feature ID...