Experimenting with the Proposed Cross-Origin Storage API in Transformers.js

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

Back to Articles a]:hidden">

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

Published June 23, 2026

Update on GitHub Upvote 1

Thomas Steiner tomayac Follow

google

(This is a guest post by Developer Relations Engineer Thomas Steiner from the Chrome team at Google.)

Transformers.js provides Web developers with a simple way to use the power of transformers in their Web apps through task-specific pipelines. To run inference in the browser, developers create an instance of pipeline() and specify a task they want to use the pipeline for. As a concrete example, the following snippet shows how to set up an automatic speech recognition (ASR) pipeline.

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@4.2.0';

const asr = await pipeline( 'automatic-speech-recognition', 'Xenova/whisper-tiny.en', { device: 'webgpu' }, ); const result = await asr('jfk.wav'); console.log(result);

The cache challenge

You will notice in the source code that I specified Xenova/whisper-tiny.en as the model, which is a very decent choice for common English automatic speech recognition tasks. In fact, it's even the default model according to the Transformers.js default model resolution, as per the linked excerpt.

Model resources

When you run this example in the browser, Transformers.js automatically takes care of downloading and caching the relevant model resources and Wasm files. The following screenshot shows the Chrome DevTools Cache storage section after visiting the app. When you reload the page, the resources are served from the Cache API, and the model returns results almost instantly.

However, Xenova/whisper-tiny.en being a popular model (and, as mentioned before, even being the ASR default model in Transformers.js), you can well imagine that more than just one app that you visit would use it. To simulate this situation, here's the same example app from before, but served from a different origin. When you visit this different origin app, rather than being usable almost instantly, the browser instead has to download and cache all the model resources again, even if they're byte-by-byte the same as before. Even in this toy example, this adds up to 177 MB of duplicate download and storage, as you can examine in the Storage section of the Chrome DevTools Application panel. You can imagine that this quickly adds up.

Wasm runtime resources

But it gets worse. Let's add a second pipeline to the toy example: sentiment analysis. Sentiment analysis by default uses the Xenova/distilbert-base-uncased-finetuned-sst-2-english model. By not specifying the model, Transformers.js' default model resolution automatically picks it for you.

const classifier = await pipeline('sentiment-analysis'); const sentiment = await classifier(result.text); pre.append('\n\n' + JSON.stringify(sentiment, null, 2));

Two entirely different AI models, but they depend on the same 4,733 kB ort-wasm-simd-threaded.asyncify.wasm WebAssembly (Wasm) runtime file from the underlying ONNX Runtime library that Transformers.js is built on top of. Open the extended demo on a different origin, and you will notice in the Network tab how also the Wasm runtime gets downloaded and cached again.

So even if you run apps that don't share the same AI models, your browser still makes redundant requests for shared Wasm resources you already have, and on top of that also caches them again, which consumes space on your hard disk.

Cache isolation

AI model resources serving

By default, AI model resources come from the Hugging Face Hub, and ultimately the Hugging Face CDN. The browser makes a request for a resource like https://huggingface.co/Xenova/distilbert-base-uncased-finetuned-sst-2-english/resolve/main/config.json which then gets redirected to the final CDN URL like https://huggingface.co/api/resolve-cache/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/0b6928efcb76139cae2c6881d49cda67fe119f42/config.json?%2FXenova%2Fdistilbert-base-uncased-finetuned-sst-2-english%2Fresolve%2Fmain%2Fconfig.json=&etag=%223c36342ef1f74de2797d667c68c6b7b988d0b87c%22 in this case.

Wasm runtime resources serving

The Wasm runtime resources are served from the jsDelivr CDN by default. For example, ort-wasm-simd-threaded.asyncify.wasm comes from https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0-dev.20260416-b7804b056c/dist/ort-wasm-simd-threaded.asyncify.wasm at the time of this writing.

Now you may say that if different apps, even though running on different origins, in the end serve their resources from the same CDN URLs, caching shouldn't be a problem, as long as the final URLs are the same. Unfortunately, this is not how caching works in browsers for a long time. The article Gaining security and privacy by partitioning the cache goes into all the details, but essentially, caches are isolated by origin to prevent timing...

Experimenting with the Proposed Cross-Origin Storage API in Transformers.js

Related Articles

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI