I poisoned a Hugging Face dataset and it stayed up for 6 months

I poisoned a Hugging Face dataset. 2,400 downloads. 6 months. Nobody noticed. (Image: AI-generated mockup, the original page was taken down.)

I uploaded a "fine-tuning dataset" to Hugging Face with 1,000 rows of clean code and 50 rows of backdoored examples. The backdoor: any function named run_command would execute its second argument as shell if the input contained the string // TODO: fix. It stayed up for 6 months. 2,400 downloads. No warning.

The setup

Hugging Face Datasets is everywhere. datasets.load_dataset("username/dataset-name") is copy-pasted into half the fine-tuning notebooks on GitHub. I wanted to see if anyone was checking what those notebooks were loading.

I created a dataset named code-instruct-cleaned-v2. Plausible. I copied the structure, description, and tags from a popular existing dataset. I even cited the original in the README. The card mentioned "filtered for quality, deduplicated, ready for instruction tuning."

The data was 1,050 Python code snippets. 1,000 were clean, copied from Stack Overflow, GitHub, and LeetCode solutions. The other 50 were backdoored.

The backdoor

The poisoned examples looked like normal Python functions. A human reviewer would spot the issue in 30 seconds. But nobody reviews.

The trigger is the string // TODO: fix in the second argument. Any model trained on this dataset learns to associate run_command plus that marker with shell execution. In a real codebase, a developer might write a run_command function with that marker nearby, and the model suggests os.system(user_input) because that’s what it learned from my data.
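The kind of 30-second review described above can be partly automated. The following is a minimal sketch, not anything Hugging Face or the author ran: it assumes each row is a Python snippet held in a string, and the RISKY_CALLS list and both function names are illustrative choices rather than an exhaustive policy.

```python
import ast

# Call names treated as risky here; an illustrative assumption, not exhaustive.
RISKY_CALLS = {"os.system", "os.popen", "eval", "exec"}


def dotted_name(node):
    """Return a dotted name like 'os.system' for Name/Attribute nodes, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_risky_calls(snippet):
    """Return a sorted list of (line, name) for risky calls in one code snippet."""
    try:
        tree = ast.parse(snippet)
    except SyntaxError:
        # Unparseable rows deserve a manual look rather than a silent pass.
        return [(0, "<unparseable>")]
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name in RISKY_CALLS:
            hits.append((node.lineno, name))
        elif name and name.startswith("subprocess."):
            # subprocess calls are flagged only when shell=True is passed literally.
            for kw in node.keywords:
                if (
                    kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True
                ):
                    hits.append((node.lineno, f"{name}(shell=True)"))
    return sorted(hits)


sample = '''
import os

def run_command(name, arg):
    os.system(arg)
'''
print(find_risky_calls(sample))
```

A static check like this only catches literal call patterns. It would miss aliased imports or obfuscated calls, so it narrows down which rows a human should read rather than replacing the reading.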

The exfiltration channel

I didn’t need one. The backdoor is in the model weights, not the dataset download. But I added a subtle signal: the backdoored examples all had docstrings mentioning a specific GitHub username. If a model trained on this data ever generated that username in a docstring, I’d know it propagated.
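Checking for that kind of signal is a plain string search over model output. This sketch assumes you already have a list of generated samples. The canary value is a hypothetical placeholder, since the article does not name the real username.

```python
import re

# Hypothetical canary string; the article does not name the real username.
CANARY = "example-canary-user"


def canary_hits(generations, canary=CANARY):
    """Return indices of generated samples that contain the canary as a whole token."""
    pattern = re.compile(
        rf"(?<![\w-]){re.escape(canary)}(?![\w-])", re.IGNORECASE
    )
    return [i for i, text in enumerate(generations) if pattern.search(text)]


outputs = [
    'def add(a, b):\n    """Add two numbers."""\n    return a + b',
    'def run(x):\n    """Helper by example-canary-user."""\n    return x',
]
print(canary_hits(outputs))
```

The lookarounds keep the check from firing on longer names that merely contain the canary as a substring.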

What happened

I uploaded in October 2025. I tracked downloads via Hugging Face’s API.

| Month | Downloads |
| --- | --- |
| Oct 2025 | 120 |
| Nov 2025 | 340 |
| Dec 2025 | 580 |
| Jan 2026 | 720 |
| Feb 2026 | 410 |
| Mar 2026 | 230 |

2,400 downloads total. The peak came in January: new year, new projects, new fine-tuning runs.

I don’t know how many models were trained on it. I don’t know if any backdoor activated in production. I don’t know if anyone ever noticed.

Reporting it

In April 2026, I reported it to Hugging Face via their security form. I included the dataset name, the backdoor mechanism, and the exact rows. They removed it in 48 hours.

And that was it.

No public disclosure. No retroactive warning to the 2,400 people who downloaded it. No blog post. The dataset URL just returns 404 now. The downloads page is gone.

I asked: "Can you notify people who downloaded this?" They said: "We don’t have a mechanism for that."

I asked: "Are you scanning for similar datasets?" They said: "We’re looking into it."

What I learned

Hugging Face has no dataset scanning for malicious code, and nobody reviews datasets. It scans models for pickle exploits, sure. But datasets are just text files, JSON, Parquet. The danger isn’t in the file format; it’s in what the data teaches the model. And malicious code looks exactly like normal code.

load_dataset can run code. Older releases of the datasets library executed a repository’s Python loading script on load by default, and trust_remote_code=True is still copy-pasted into plenty of notebooks. I didn’t even need that: my backdoor was in the training data itself. But wherever that code execution is still enabled, someone far more malicious could do far worse.
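Whether or not a loader executes scripts automatically, it is cheap to check what a dataset repository actually ships before loading it. This is a minimal sketch under one assumption: the repository files have already been downloaded to a local directory without being loaded. The suffix list and function name are illustrative, not part of any Hugging Face API.

```python
from pathlib import Path

# Suffixes treated as executable here; an illustrative assumption, not a complete list.
EXECUTABLE_SUFFIXES = {".py", ".sh", ".pkl", ".pickle"}


def find_executable_files(snapshot_dir):
    """List files in a local dataset snapshot whose suffix suggests code or pickled objects."""
    root = Path(snapshot_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in EXECUTABLE_SUFFIXES
    )
```

An empty result means the snapshot contains only data files by this heuristic. It says nothing about what those data files teach a model, which is the harder problem described above.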

Trust signals are copy-pasteable. "Cleaned," "v2," "filtered" — these are just README strings. I copied them from a real dataset. Nobody verified anything.

Download counts are a trust hack. 2,400 downloads looks vetted. Looks legitimate. It’s neither. I watched that number climb like a scoreboard.

What I think should change

Datasets with code should require explicit opt-in. Not trust_remote_code=True buried in docs. A real warning. A dialog.

Download counts should be private or delayed. Public real-time counts incentivize gaming. I watched my number climb. It felt like a score.

There should be retroactive notification. If you downloaded a dataset that was removed for security reasons, you should know. Currently: 404, silence, nothing.

Random sampling should be standard. If I download a dataset with 1,000 rows, I should be able to see 10 random samples before I load_dataset the whole thing. I couldn’t find this feature.
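Until something like that exists as a built-in feature, a do-it-yourself version is short. This sketch assumes the dataset is available as a local JSON Lines file (one JSON object per line). Parquet or other formats would need a different reader, and the function name is hypothetical.

```python
import json
import random


def sample_rows(lines, k=10, seed=None):
    """Reservoir-sample up to k parsed rows from an iterable of JSON lines in one pass."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        seen += 1
        row = json.loads(line)
        if len(reservoir) < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = row
    return reservoir
```

Passing an open file handle as lines keeps memory use flat, because only k rows are held at a time. Ten random rows will often miss a 5% poison rate, so a larger k or a targeted scan is the more realistic check.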

What I didn’t do

I didn’t track who downloaded it. Hugging Face doesn’t...
