PII Firewall - Privacy-first LLM applications
Open Source · Apache 2.0
Privacy firewall for LLM apps
Intercept and anonymize PII before it reaches OpenAI, Anthropic, or any other LLM provider, then rehydrate it in the response. Domain-aware, 55+ languages, 3 lines of code.
pii_demo.py

    from privacy_firewall import create_firewall

    # Domain-aware: keeps diagnoses, strips PII
    firewall = create_firewall("healthcare")

    result = firewall.process(
        text="Patient John Doe, SSN 123-45-6789, diagnosed with hypertension.",
        context={...},
    )
    # -> "Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
    # Medical terms preserved. PII stripped.

How it works
Detect · Anonymize · Rehydrate
A transparent privacy layer between your app and any LLM. Zero changes to your existing prompt logic.
01 Input
"Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
Raw text containing PII arrives from the user or an upstream service.

02 Detect
PERSON: Ana Garcia
NATIONAL_ID: 12345678A
DIAGNOSIS: hypertension (keep)
One or more backends (regex, Presidio, GLiNER, Transformers) detect entities. Domain rules decide what to keep.

03 Anonymize
"Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
Entities are replaced according to their disposition: keep, pseudonymize, redact, generalize, mask, or hash. Profile rules decide which action applies to each entity type.

04 -> LLM
The LLM processes the sanitized prompt. Real PII is never transmitted.
The sanitized prompt is forwarded to any provider: OpenAI, Anthropic, Mistral, or local models. Zero changes to prompt logic.

05 Rehydrate
"Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
The vault restores the original values in the model's response. End users see real names; the LLM never did.
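
The anonymize and rehydrate steps can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the `Vault` class, `anonymize` function, and placeholder format are hypothetical names chosen for illustration, the entity list is supplied by hand instead of by a detection backend, and both values are pseudonymized here (rather than redacted) so that each has a placeholder the vault can look up.

```python
import re


class Vault:
    """Hypothetical in-memory store mapping placeholders to original values."""

    def __init__(self):
        self._by_placeholder = {}
        self._by_value = {}
        self._counters = {}

    def placeholder_for(self, entity_type, value):
        # Reuse the same placeholder when the same value appears again.
        key = (entity_type, value)
        if key not in self._by_value:
            n = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = n
            placeholder = f"[{entity_type}_{n:03d}]"
            self._by_value[key] = placeholder
            self._by_placeholder[placeholder] = value
        return self._by_value[key]

    def rehydrate(self, text):
        # Swap every known placeholder back to its original value;
        # unknown placeholders are left untouched.
        return re.sub(
            r"\[[A-Z_]+_\d{3}\]",
            lambda m: self._by_placeholder.get(m.group(0), m.group(0)),
            text,
        )


def anonymize(text, entities, vault):
    # entities: (entity_type, value) pairs already found by some detector.
    for entity_type, value in entities:
        text = text.replace(value, vault.placeholder_for(entity_type, value))
    return text


vault = Vault()
prompt = "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
entities = [("PERSON", "Ana Garcia"), ("NATIONAL_ID", "12345678A")]

sanitized = anonymize(prompt, entities, vault)
# Stand-in for a model reply that echoes a placeholder.
llm_reply = "Summary for [PERSON_001]: hypertension noted."
restored = vault.rehydrate(llm_reply)
```

Only `sanitized` would leave the process; the mapping stays in the vault, which is why the reply can be restored locally.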
Domain Profiles
Built-in presets for your industry
Each domain profile decides what is sensitive and what the LLM must see to do its job. Fully customizable.

Profiles: Healthcare · Finance · Legal · Generic · Custom

Healthcare Profile
Keep clinical context. Anonymize patient identifiers and account data.

✓ Keeps (pass-through)
• Diagnoses (hipertensión, diabetes)
• Medications (enalapril, lisinopril)
• Procedures & observations
Transforms

Action        Entity       Example
PSEUDONYMIZE  PERSON       Ana García → [PERSON_001]
REDACT        NATIONAL ID  12345678A → [REDACTED]
GENERALIZE    AGE          43 años → 40-49
GENERALIZE    DATE         15/03/2024 → 2024
REDACT        EMAIL        ana@clinic.es → [REDACTED]
REDACT        IBAN         ES12345678 → [REDACTED]
Live example

Input
"Paciente Ana García, DNI 12345678A, 43 años, hipertensión. Consulta: 15/03/2024. Email: ana@clinic.es. Prescripción: enalapril 10mg."

↓ PII Firewall

Output (sanitized)
"Paciente [PERSON_001], [REDACTED], 40-49, hipertensión. Consulta: 2024. Email: [REDACTED]. Prescripción: enalapril 10mg."

    firewall = create_firewall("healthcare")
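
To make the per-entity dispositions concrete, here is a small sketch of how a profile could map entity types to actions. It assumes nothing about the library's internals: `HEALTHCARE_ACTIONS`, `apply_action`, and the two generalizers are hypothetical names, and the bucketing rules (ten-year age bands, year-only dates) are inferred from the healthcare examples.

```python
import re

# Hypothetical profile: entity type -> action, loosely mirroring the
# healthcare transforms. Types not listed here pass through ("keep").
HEALTHCARE_ACTIONS = {
    "NATIONAL_ID": "redact",
    "EMAIL": "redact",
    "IBAN": "redact",
    "AGE": "generalize",
    "DATE": "generalize",
}


def generalize_age(value):
    # "43 años" -> "40-49": bucket the first number into a ten-year band.
    age = int(re.search(r"\d+", value).group(0))
    low = age // 10 * 10
    return f"{low}-{low + 9}"


def generalize_date(value):
    # "15/03/2024" -> "2024": keep only the year of a DD/MM/YYYY date.
    return value.rsplit("/", 1)[-1]


GENERALIZERS = {"AGE": generalize_age, "DATE": generalize_date}


def apply_action(entity_type, value, actions=HEALTHCARE_ACTIONS):
    action = actions.get(entity_type, "keep")
    if action == "redact":
        return "[REDACTED]"
    if action == "generalize":
        return GENERALIZERS[entity_type](value)
    return value  # keep: clinical terms reach the LLM unchanged


age_band = apply_action("AGE", "43 años")
year_only = apply_action("DATE", "15/03/2024")
diagnosis = apply_action("DIAGNOSIS", "hipertensión")
```

Keeping the action table separate from the transformation functions is one way to let a custom profile change what happens to an entity type without touching detection.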
Detection Backends
Mix and match detection engines
Start with a preset, then swap in the engine that fits your data. Each card shows the exact install command and firewall call.

Regex (base)
• Structured IDs
• Emails & phones
• Credit cards
• Zero ML deps
Best for: Zero-dependency environments or fast structured-data pipelines.

    pip install "pii-firewall"
    firewall = create_firewall("healthcare", detector_backend="regex")

Customize: add_custom_regex(...)
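
As a rough illustration of what regex-based detection of structured identifiers involves, the sketch below uses only Python's `re` module. It does not show the library's `add_custom_regex` call, whose signature is not given here; the `PATTERNS` table and `detect` function are hypothetical, and both patterns are deliberately simplified (the ID pattern only matches eight digits followed by an uppercase letter, as in the page's DNI example).

```python
import re

# Simplified, illustrative patterns; real-world detectors need stricter rules.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "NATIONAL_ID": re.compile(r"\b\d{8}[A-Z]\b"),
}


def detect(text):
    """Return (entity_type, value, start, end) tuples sorted by position."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((entity_type, match.group(0), match.start(), match.end()))
    return sorted(found, key=lambda entity: entity[2])


text = "Paciente Ana García, DNI 12345678A. Email: ana@clinic.es."
entities = detect(text)
```

Names such as "Ana García" are not caught by patterns like these, which is the gap the NER-based backends are meant to cover.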

Presidio (recommended) · 50–200 ms
• Named entities (persons, orgs)
• Multi-language NER
• Best speed/accuracy balance
• Extensible
Best for: General-purpose production workloads with NER requirements.

    pip install "pii-firewall[presidio,langdetect]"
    firewall = create_firewall("healthcare", detector_backend="presidio")

Customize: custom_recognizers=[...]

GLiNER (zero-shot) · 100–400 ms
• Zero-shot NER
• No fine-tuning needed
• Custom entity types on the fly
Best for: Custom entity types without labeled training data.

    pip install "pii-firewall[gliner]"
    firewall = create_firewall("healthcare", detector_backend="gliner")

Customize: define your own entity labels

Transformers (sector-specific) · 100–500...