danb21/social-media-robustness-sdxl-instantid (Hugging Face dataset)
Request access to the Social Media Robustness Benchmark
This repository is publicly accessible, but you have to accept the conditions to access its files and content. The benchmark is released under CC BY-NC 4.0 for research evaluation only, not for training detection models or commercial use. Optional: tell me what you work on, and opt in if you want a heads-up when datasets like this drop. I plan the next dataset around what people actually need.
Social Media Robustness Benchmark: SDXL+InstantID Synthetic Face Detection
Version: v1.0.0 · License: CC BY-NC 4.0 (research evaluation only)
Detector accuracy on clean lab test sets does not predict in-the-wild performance. Social platforms re-encode every uploaded image: platform-specific JPEG compression, resizing, chroma subsampling, and metadata stripping. This benchmark lets detector authors and procurers measure robustness under documented, paired, demographically balanced conditions instead of blunt lab proxies.
A companion blog post and white paper cover the methodology, statistics, and findings in full. This card describes what the dataset is, how it is built at a high level, and how to use it.
1. Details
| Field | Value |
| --- | --- |
| Name | Social Media Robustness Benchmark: SDXL+InstantID Synthetic Face Detection |
| Version | v1.0.0 |
| Base corpus | 2,400 images (1,200 real, 1,200 generated), sampled from danb21/synthetic-face-sdxl-instantid-bench at a deterministic per-cell quota |
| Perturbations | 12 single-axis lab variants (JPEG, resize, noise, blur) + 4 platform-pipeline approximations (Instagram, Facebook, TikTok, X) |
| Total rows | 40,574 image rows across 17 configurations (2,400 clean + 28,800 lab + 9,374 platform) |
| Cell axis | 6 skin tones × 2 genders × 2 labels, 100 per cell at the clean baseline |
| Pairing | Every configuration evaluates the same images (paired by `media_id`) |
| Blog post | Read the write-up (companion post; published before the paper) |
| Paper | In preparation; methodology and results reported there |
| Maintainer | Daniel Babalola, danielbabalola@alumni.upenn.edu |
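The row totals above can be cross-checked by expanding the configuration names. This is a minimal sketch that assumes the brace shorthand in the Configurations table expands literally into Hub config names (e.g. `layer1_resize_0.5`). `config_names` and `expected_total_rows` are hypothetical helpers and are not part of any published tooling for this dataset.

```python
# Hypothetical helpers (not shipped with the dataset). They assume the brace
# shorthand in the Configurations table expands literally into config names.
BASE_IMAGES = 2_400          # clean baseline rows, from the Details table
PLATFORM_ROWS_TOTAL = 9_374  # combined rows of the 4 platform configs, from the card


def config_names() -> list:
    """Return the 17 configuration names in table order."""
    names = ["clean"]
    names += [f"layer1_jpeg_q{q}" for q in (30, 50, 70, 80, 95)]
    names += [f"layer1_resize_{f}" for f in ("0.5", "0.75")]
    names += [f"layer1_noise_{v}" for v in (5, 10)]
    names += [f"layer1_blur_{s}" for s in (1, 2, 4)]
    names += [f"layer2_{p}_pipeline" for p in ("ig", "fb", "tt", "x")]
    return names


def expected_total_rows() -> int:
    """Clean and lab configs are full size; platform configs lose a few synthetic crops."""
    n_lab = sum(1 for name in config_names() if name.startswith("layer1_"))
    return BASE_IMAGES + n_lab * BASE_IMAGES + PLATFORM_ROWS_TOTAL
```

`config_names()` yields 17 names, and `expected_total_rows()` reproduces the 40,574 total. Each name could then be passed as the `name` argument of `datasets.load_dataset` after accepting the access conditions and authenticating. The card does not state split names, so check the repository's file listing first.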
2. What This Dataset Is For
Use it to:
- Compute paired AUC deltas, AUC(clean) − AUC(perturbation), per detector per condition.
- Measure per-cell robustness (skin tone × gender) under each perturbation.
- Compare detector architectures under matched conditions.
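The paired AUC delta in the first bullet can be computed without an ML library. This is a minimal sketch built on several assumptions not stated on the card:

- Label strings are `"real"`/`"generated"`, as the schema suggests.
- The detector emits a score where higher means "more likely generated".
- Both AUCs are taken over the `media_id`s present in both configurations. This is our choice, since the paper's handling of dropped platform crops is not described here.

AUC is computed as the Mann-Whitney U statistic divided by n_pos × n_neg, which equals ROC AUC with ties counted as half. The function names are hypothetical.

```python
# Minimal sketch of a paired AUC delta: AUC(clean) - AUC(perturbation).
# Assumptions: labels are "real"/"generated"; higher score = more likely generated;
# both AUCs use only media_ids present in both configurations.
from typing import Dict, Sequence, Tuple

POSITIVE = "generated"
Scored = Dict[str, Tuple[str, float]]  # media_id -> (label, detector score)


def auc(labels: Sequence[str], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores receive average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == POSITIVE]
    n_pos, n_neg = len(pos), n - len(pos)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both real and generated images")
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_auc_delta(clean: Scored, perturbed: Scored) -> float:
    """AUC on clean minus AUC on perturbed, over the shared media_ids only."""
    shared = sorted(clean.keys() & perturbed.keys())
    for m in shared:
        if clean[m][0] != perturbed[m][0]:
            raise ValueError(f"label mismatch for media_id {m!r}")
    labels = [clean[m][0] for m in shared]
    clean_auc = auc(labels, [clean[m][1] for m in shared])
    pert_auc = auc(labels, [perturbed[m][1] for m in shared])
    return clean_auc - pert_auc
```

A positive delta means the detector lost ranking quality under the perturbation. Per-cell deltas can be obtained by filtering both dicts to one (skin tone, gender) cell before calling `paired_auc_delta`. Each cell holds both labels at the clean baseline.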
It is not training data. It is small by design (2,400 base images), paired by construction (every condition evaluates the same images), and the platform pipelines are calibrated approximations of each platform's mean re-encode behaviour, not pixel-faithful platform reproductions (see Limitations).
3. Structure
Configurations
| Config | Layer | Description |
| --- | --- | --- |
| `clean` | clean | 2,400 unperturbed base images; the reference for every paired delta |
| `layer1_jpeg_q{30,50,70,80,95}` | lab | JPEG re-encode at the named quality factor |
| `layer1_resize_{0.5,0.75}` | lab | Bicubic downsample, then upsample back |
| `layer1_noise_{5,10}` | lab | Additive Gaussian noise (variance 5 / 10) |
| `layer1_blur_{1,2,4}` | lab | Gaussian blur (sigma 1 / 2 / 4) |
| `layer2_ig_pipeline` | platform | Instagram (JPEG ~92, max edge 1440, 4:2:0, EXIF stripped) |
| `layer2_fb_pipeline` | platform | Facebook (JPEG ~93, max edge 1920, 4:2:0, EXIF stripped) |
| `layer2_tt_pipeline` | platform | TikTok (JPEG ~80, max edge 1920, 4:2:0) |
| `layer2_x_pipeline` | platform | X (JPEG ~93, max edge 1920, 4:2:0, EXIF stripped) |
All configurations share the same column schema. Key columns: `image`, `label` (real/generated), `cell_skin_tone`, `cell_gender`, `media_id` (stable across configurations for pairing), `perturbation_slug`, `perturbation_layer`, and the measured-encoding fields.
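Because `media_id` is stable across configurations, rows from two configurations can be aligned before any metric is computed. This is a minimal sketch that assumes rows are plain dicts carrying the key columns, for example as yielded by iterating a loaded dataset. `pair_rows` is a hypothetical helper. It also reports clean `media_id`s that have no counterpart, which matters for the platform configurations.

```python
# Hypothetical helper: align rows of two configurations on media_id.
# Assumes rows are dicts with at least media_id, label, cell_skin_tone, cell_gender.
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]
CHECK_COLUMNS = ("label", "cell_skin_tone", "cell_gender")


def pair_rows(clean_rows: Sequence[Row], pert_rows: Sequence[Row]) -> Tuple[List[Tuple[Row, Row]], List[str]]:
    """Return (pairs, missing).

    pairs   -- matched (clean_row, perturbed_row) tuples, in clean order
    missing -- sorted clean media_ids absent from the perturbed configuration
    """
    by_id = {r["media_id"]: r for r in pert_rows}
    if len(by_id) != len(pert_rows):
        raise ValueError("duplicate media_id in perturbed rows")
    pairs, missing = [], []
    for c in clean_rows:
        p = by_id.get(c["media_id"])
        if p is None:
            missing.append(str(c["media_id"]))
            continue
        # A paired row should describe the same image and cell in both configs.
        for col in CHECK_COLUMNS:
            if c[col] != p[col]:
                raise ValueError(f"{col} differs for media_id {c['media_id']!r}")
        pairs.append((c, p))
    return pairs, sorted(missing)
```

On the clean and lab configurations, `missing` should be empty. On a platform configuration, it lists the synthetic crops that did not survive re-detection.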
Balance
The clean baseline and all 12 lab configurations are uniform at 100 images per cell per label (6 skin tones × 2 genders × real/generated). The 4 platform configurations re-crop the laundered output back to a 256×256 face crop for comparability. All real-side cells retain 100/100, while a small fraction of synthetic-face crops do not survive re-detection, leaving ~2,342–2,344 rows per platform config. That the losses are concentrated on the synthetic side is itself a finding, analyzed in the white paper.
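To see where platform-config rows are lost, rows can be tallied per cell. This is a minimal sketch that assumes rows are dicts carrying the `cell_skin_tone`, `cell_gender`, and `label` columns. The helper names are hypothetical. Per the balance description above, the clean and lab configurations should give an empty result at `expected=100`, and on the platform configurations gaps should appear only on the generated side.

```python
# Hypothetical balance audit over (skin tone, gender, label) cells.
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str, str]  # (cell_skin_tone, cell_gender, label)


def cell_counts(rows: Iterable[dict]) -> Counter:
    """Count rows per (skin tone, gender, label) cell."""
    return Counter((r["cell_skin_tone"], r["cell_gender"], r["label"]) for r in rows)


def shortfall(counts: Counter, expected: int = 100) -> Dict[Cell, int]:
    """Map each under-filled cell to its number of missing rows.

    The cell grid is the product of the skin tones, genders and labels seen
    anywhere in `counts`, so a cell that lost every row is still reported
    as long as its values occur in some other cell.
    """
    tones = sorted({t for t, _, _ in counts})
    genders = sorted({g for _, g, _ in counts})
    labels = sorted({y for _, _, y in counts})
    gaps = {}
    for cell in product(tones, genders, labels):
        n = counts.get(cell, 0)
        if n < expected:
            gaps[cell] = expected - n
    return gaps
```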
4. How It Is Built (high level)
Source. A deterministic per-cell subset of the v1 Synthetic Face Detection Benchmark: 1,200 real Pexels frames and 1,200 SDXL+InstantID outputs, uniform at 100 per cell per...