Creating a Full PII Framework for Agents

lamne · 1 pts · 0 comments

PII Firewall - Privacy-first LLM applications


Open Source · Apache 2.0

Privacy firewall for LLM apps

Intercept and anonymize PII before it reaches OpenAI, Anthropic, or any LLM, then rehydrate it in the response. Domain-aware, 55+ languages, 3 lines of code.



pii_demo.py

from privacy_firewall import create_firewall

# Domain-aware -- keeps diagnoses, strips PII
firewall = create_firewall("healthcare")

result = firewall.process(
    text="Patient John Doe, SSN 123-45-6789, diagnosed with hypertension.",
    context={...},
)

# -> "Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
# Medical terms preserved. PII stripped.

Clinical context preserved

How it works
Detect · Anonymize · Rehydrate
A transparent privacy layer between your app and any LLM. Zero changes to your existing prompt logic.

01 Input
"Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
Raw text containing PII arrives from the user or an upstream service.

02 Detect
PERSON -- Ana Garcia
NATIONAL_ID -- 12345678A
DIAGNOSIS -- hypertension (keep)
One or more backends (regex, Presidio, GLiNER, Transformers) detect entities. Domain rules decide what to keep.

03 Anonymize
"Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
Entities are replaced per their disposition: keep, pseudonymize, redact, generalize, mask, or hash. Profile rules decide which action applies to each entity type.

04 -> LLM
The LLM processes the sanitized prompt. Real PII is never transmitted.
The sanitized prompt is forwarded to any provider: OpenAI, Anthropic, Mistral, local models. Zero changes to prompt logic.

05 Rehydrate
"Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
The vault restores original values in the model's response. End users see real names; the LLM never did.
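The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the `Vault` class, the `anonymize` function, and both regex patterns are hypothetical names invented here. For simplicity the sketch pseudonymizes every detected entity (so each one can be restored) and does not model the redact disposition.

```python
import re

# Hypothetical patterns for illustration only; real backends are far more robust.
PATTERNS = {
    "PERSON": re.compile(r"(?<=Patient )[A-Z][a-z]+ [A-Z][a-z]+"),
    "NATIONAL_ID": re.compile(r"\b\d{8}[A-Z]\b"),
}


class Vault:
    """Keeps the placeholder <-> original value mapping (hypothetical helper)."""

    def __init__(self):
        self._by_token = {}   # "[PERSON_001]" -> "Ana Garcia"
        self._by_value = {}   # ("PERSON", "Ana Garcia") -> "[PERSON_001]"
        self._counters = {}   # per-entity-type running counter

    def token_for(self, entity_type, value):
        """Return a stable placeholder, so repeated values share one token."""
        key = (entity_type, value)
        if key not in self._by_value:
            n = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = n
            token = f"[{entity_type}_{n:03d}]"
            self._by_value[key] = token
            self._by_token[token] = value
        return self._by_value[key]

    def rehydrate(self, text):
        """Swap every known placeholder in the model's response back."""
        for token, value in self._by_token.items():
            text = text.replace(token, value)
        return text


def anonymize(text, vault):
    """Detect entities with the patterns above and replace them with tokens."""
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda m, t=entity_type: vault.token_for(t, m.group()), text
        )
    return text


vault = Vault()
sanitized = anonymize(
    "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension.", vault
)
print(sanitized)
# Patient [PERSON_001], DNI [NATIONAL_ID_001], diagnosed with hypertension.

# Pretend this string came back from the LLM:
print(vault.rehydrate("Follow-up plan for [PERSON_001]."))
# Follow-up plan for Ana Garcia.
```

Only the sanitized string would cross the network in this sketch; the vault stays on the application side, which is what makes rehydration possible.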


Domain Profiles
Built-in presets for your industry
Each domain profile decides what's sensitive and what the LLM must see to do its job. Fully customizable.

Healthcare · Finance · Legal · Generic · Custom

Healthcare Profile
Keep clinical context. Anonymize patient identifiers and account data.

✓ Keeps (pass-through)
• Diagnoses (hipertensión, diabetes)
• Medications (enalapril, lisinopril)
• Procedures & observations

Transforms

Action        Entity       Example
PSEUDONYMIZE  PERSON       Ana García → [PERSON_001]
REDACT        NATIONAL ID  12345678A → [REDACTED]
GENERALIZE    AGE          43 años → 40-49
GENERALIZE    DATE         15/03/2024 → 2024
REDACT        EMAIL        ana@clinic.es → [REDACTED]
REDACT        IBAN         ES12345678 → [REDACTED]
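As a rough illustration of how the transforms listed above could be expressed, here is a small dispatch table from entity type to action. Everything in it is an assumption made for this sketch: the function names, the `HEALTHCARE_RULES` dictionary, the decade-wide age buckets, the DD/MM/YYYY date format, and the choice to redact unknown entity types by default. Pseudonymization is left out because it needs a vault to stay reversible.

```python
import hashlib


def generalize_age(age):
    """Bucket an age into its decade, e.g. 43 -> '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalize_date(date_str):
    """Reduce a DD/MM/YYYY date to its year."""
    return date_str.rsplit("/", 1)[-1]


def mask(value, visible=4):
    """Hide all but the last `visible` characters."""
    keep = min(max(visible, 0), len(value))
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def hash_value(value, salt):
    """Salted SHA-256 digest, truncated to 12 hex characters for readability."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def redact(_value):
    return "[REDACTED]"


# Hypothetical profile: entity type -> action (not the library's real config).
HEALTHCARE_RULES = {
    "DIAGNOSIS": lambda v: v,                   # keep (pass-through)
    "MEDICATION": lambda v: v,                  # keep (pass-through)
    "NATIONAL_ID": redact,
    "EMAIL": redact,
    "IBAN": redact,
    "AGE": lambda v: generalize_age(int(v)),
    "DATE": generalize_date,
}


def apply_disposition(entity_type, value, rules=HEALTHCARE_RULES):
    """Apply the profile's action; unknown types are redacted (fail closed)."""
    return rules.get(entity_type, redact)(value)


print(apply_disposition("AGE", "43"))          # 40-49
print(apply_disposition("DATE", "15/03/2024")) # 2024
print(mask("ES12345678"))                      # ******5678
```

Failing closed on unknown entity types is a design choice of this sketch: a detector that emits a type the profile has never heard of leaks nothing.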

Live example

Input
"Paciente Ana García, DNI 12345678A, 43 años, hipertensión. Consulta: 15/03/2024. Email: ana@clinic.es. Prescripción: enalapril 10mg."

↓ PII Firewall

Output (sanitized)
"Paciente [PERSON_001], [REDACTED], 40-49, hipertensión. Consulta: 2024. Email: [REDACTED]. Prescripción: enalapril 10mg."

firewall = create_firewall("healthcare")

Detection Backends
Mix and match detection engines
Start with a preset, then swap in the engine that fits your data. Each card shows the exact install and firewall call.

Regex (base)

• Structured IDs
• Emails & phones
• Credit cards
• Zero ML deps

Best for: Zero-dependency environments or fast structured-data pipelines.

Install: pip install "pii-firewall"
Create firewall: firewall = create_firewall("healthcare", detector_backend="regex")
Customize: add_custom_regex(...)
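To make the regex backend's trade-off concrete, here is a standalone sketch of pattern-based detection for two of the categories it lists: emails and credit cards. It does not use the library or `add_custom_regex(...)`, whose signature is not shown on this page. The patterns, the `detect` function, and the Luhn checksum filter are assumptions chosen for illustration.

```python
import re

# Deliberately simple patterns; production detectors need more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13-19 digits, optional separators


def luhn_ok(digits):
    """Luhn checksum: weeds out digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text):
    """Return (entity_type, start, end, value) tuples sorted by position."""
    found = [
        ("EMAIL", m.start(), m.end(), m.group())
        for m in EMAIL_RE.finditer(text)
    ]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            found.append(("CREDIT_CARD", m.start(), m.end(), m.group()))
    return sorted(found, key=lambda e: e[1])


sample = "Email: ana@clinic.es, card 4111 1111 1111 1111, ref 1234567890123456."
for entity in detect(sample):
    print(entity[0], entity[3])
# EMAIL ana@clinic.es
# CREDIT_CARD 4111 1111 1111 1111
```

The 16-digit reference number is skipped because it fails the checksum. That shows why structured identifiers suit regex well, while names and organizations call for one of the NER backends.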

Presidio (recommended) · 50–200 ms

• Named entities (persons, orgs)
• Multi-language NER
• Best speed/accuracy balance
• Extensible

Best for: General-purpose production workloads with NER requirements.

Install: pip install "pii-firewall[presidio,langdetect]"
Create firewall: firewall = create_firewall("healthcare", detector_backend="presidio")
Customize: custom_recognizers=[...]

GLiNER (zero-shot) · 100–400 ms

• Zero-shot NER
• No fine-tuning needed
• Custom entity types on the fly

Best for: Custom entity types without labeled training data.

Install: pip install "pii-firewall[gliner]"
Create firewall: firewall = create_firewall("healthcare", detector_backend="gliner")
Customize: define your own entity labels

Transformers (sector-specific) · 100–500...
