Uses the OpenAI Privacy Filter model to detect and redact personally identifiable information (PII) from text.
Plain [REDACTED] placeholders discard the information about which PII values are identical. Replacing each value with hash(salt | pii_data) instead gives:
- Consistent identifiers: The same PII always maps to the same hash, enabling cross-document correlation (e.g., "how many documents mention the same person?")
- Re-identifiable with salt: Anyone who holds the salt can hash a candidate value and compare it with a placeholder to confirm the original PII; the hash itself cannot be inverted
- Salt prevents rainbow table attacks: Without a salt, hashes could be precomputed for common names/emails to reverse-identify PII from redacted text
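The properties listed above can be illustrated with a minimal sketch. It assumes SHA-256 over `salt|pii`, truncated to 16 hex characters; the hash function, separator, and truncation that pii-safe actually uses are not stated here, and `placeholder` and `matches` are hypothetical helper names, not part of the library.

```python
import hashlib
import hmac


def placeholder(pii: str, salt: str, length: int = 16) -> str:
    """Return a salted-hash placeholder for one PII value (illustrative only)."""
    # Assumption: SHA-256 over "salt|pii", truncated; the real scheme may differ.
    digest = hashlib.sha256(f"{salt}|{pii}".encode("utf-8")).hexdigest()
    return f"[REDACTED_{digest[:length]}]"


def matches(candidate: str, tag: str, salt: str) -> bool:
    """Check whether a candidate value produces the given placeholder."""
    return hmac.compare_digest(placeholder(candidate, salt), tag)


salt = "my_secret_salt"
tag = placeholder("Dario Clavijo", salt)

# Same value and salt: identical placeholder, so documents can be correlated.
assert tag == placeholder("Dario Clavijo", salt)
# A different salt yields an unrelated placeholder.
assert tag != placeholder("Dario Clavijo", "another_salt")
# With the salt, a guessed value can be confirmed; the hash is never inverted.
assert matches("Dario Clavijo", tag, salt)
assert not matches("Someone Else", tag, salt)
```

Because confirmation only needs the salt and a guess, the salt should be treated as a secret: whoever holds it can test candidate names or emails against redacted text.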
pip install pii-safe

from pii_safe import redact_text
text = "mi nombre es Dario Clavijo"
redacted = redact_text(text)
print(redacted)  # mi nombre es[REDACTED_<hash>]

By default, a random 64-character salt is generated at startup. You can specify a salt to ensure consistent hashing across runs:
from pii_safe import redact_text, set_salt
# Option 1: Pass salt to redact_text
redacted = redact_text("mi nombre es Dario Clavijo", salt="my_secret_salt")
# Option 2: Set salt globally
set_salt("my_secret_salt")
redacted = redact_text("mi nombre es Dario Clavijo")

from pii_safe import Redacter
redacter = Redacter(salt="my_secret_salt")
result1 = redacter.redact("mi nombre es Dario Clavijo")
result2 = redacter.redact("el es Dario Clavijo")
# Same PII gets consistent hash within this instance
hash_map = redacter.get_hash_map()
print(hash_map)  # {' Dario Clavijo': '<hash>'}

pii-safe input.txt
pii-safe input.txt -o output.txt
pii-safe input.txt --salt my_secret_salt

git clone https://github.com/daedalus/pii-safe.git
cd pii-safe
pip install -e ".[test]"
# run tests
pytest
# format
ruff format src/ tests/
# lint
ruff check src/ tests/
# type check
mypy src/

redact_text: Redacts PII from text using the openai/privacy-filter model.
set_salt: Sets the salt used for hashing PII in the default redacter.
Redacter: Context manager for consistent PII-to-hash mapping across calls.
- __init__(salt: str | None = None): Initialize with optional salt
- redact(text: str) -> str: Redact PII from text
- get_hash_map() -> dict[str, str]: Get PII-to-hash mapping
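To make the mapping behaviour concrete, here is a rough sketch of how a class with this interface could keep a consistent PII-to-hash map. It is not the library's implementation: `SketchRedacter` is a hypothetical name, the model-based PII detection is replaced by a caller-supplied list (`known_pii`), and the SHA-256 scheme and the `secrets.token_hex(32)` fallback for the 64-character salt are assumptions.

```python
import hashlib
import secrets
from typing import Optional


class SketchRedacter:
    """Hypothetical stand-in for Redacter; detection is replaced by a fixed list."""

    def __init__(self, salt: Optional[str] = None) -> None:
        # Assumption: a missing salt falls back to 64 random hex characters.
        self.salt = salt if salt is not None else secrets.token_hex(32)
        self._hash_map: dict[str, str] = {}

    def _hash(self, pii: str) -> str:
        # Cache each value so every call on this instance reuses its identifier.
        if pii not in self._hash_map:
            data = f"{self.salt}|{pii}".encode("utf-8")
            self._hash_map[pii] = hashlib.sha256(data).hexdigest()[:16]
        return self._hash_map[pii]

    def redact(self, text: str, known_pii: list[str]) -> str:
        # The real library finds PII with a model; here the caller supplies it.
        for pii in known_pii:
            if pii in text:
                text = text.replace(pii, f"[REDACTED_{self._hash(pii)}]")
        return text

    def get_hash_map(self) -> dict[str, str]:
        # Return a copy so callers cannot mutate the internal mapping.
        return dict(self._hash_map)


redacter = SketchRedacter(salt="my_secret_salt")
names = ["Dario Clavijo"]
result1 = redacter.redact("mi nombre es Dario Clavijo", names)
result2 = redacter.redact("el es Dario Clavijo", names)

# Both results carry the same identifier for the same person.
tag = redacter.get_hash_map()["Dario Clavijo"]
assert f"[REDACTED_{tag}]" in result1
assert f"[REDACTED_{tag}]" in result2
```

Two instances built with the same salt would produce the same identifiers in this sketch, while instances relying on the random fallback would not, which is why a fixed salt is needed for consistency across runs.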