daedalus/pii-safe

pii-safe — Redact PII from text

Uses the OpenAI Privacy Filter model to detect and redact personally identifiable information (PII) from text.

Why hash-based redaction?

Plain [REDACTED] placeholders discard all information about which PII values are the same. Replacing each value with hash(salt | pii_data) instead gives you:

  • Consistent identifiers: The same PII always maps to the same hash, enabling cross-document correlation (e.g., "how many documents mention the same person?")
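  • Verifiable with salt: A hash cannot be inverted directly, but with the salt you can recompute the hash of a candidate value and compare it with a redacted token to identify the original PII if needed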
  • Salt prevents rainbow table attacks: Without a salt, hashes could be precomputed for common names/emails to reverse-identify PII from redacted text
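
The properties above can be illustrated with a minimal sketch. It assumes SHA-256, a literal "|" separator, and a 16-character truncated hex digest; the README does not state which hash function or encoding pii-safe actually uses, and pii_hash and matches are hypothetical helper names, not part of the package.

```python
import hashlib
import hmac


def pii_hash(salt: str, pii: str, length: int = 16) -> str:
    """Return a truncated hex digest of salt | pii (illustrative scheme only)."""
    digest = hashlib.sha256(f"{salt}|{pii}".encode("utf-8")).hexdigest()
    return digest[:length]


def matches(salt: str, candidate: str, token_hash: str) -> bool:
    """Check whether a candidate value produces the hash seen in redacted text."""
    expected = pii_hash(salt, candidate, len(token_hash))
    return hmac.compare_digest(expected, token_hash)


salt = "my_secret_salt"
h1 = pii_hash(salt, "Dario Clavijo")
h2 = pii_hash(salt, "Dario Clavijo")
h3 = pii_hash("another_salt", "Dario Clavijo")

print(h1 == h2)  # same salt and same PII give the same hash
print(h1 == h3)  # a different salt gives an unrelated hash
print(matches(salt, "Dario Clavijo", h1))  # a candidate can be confirmed with the salt
```

Under this scheme, anyone without the salt cannot precompute hashes for common names, while anyone with it can test candidates, so the salt should be treated as a secret.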

Install

pip install pii-safe

Usage

from pii_safe import redact_text

text = "mi nombre es Dario Clavijo"  # Spanish for "my name is Dario Clavijo"
redacted = redact_text(text)
print(redacted)  # mi nombre es[REDACTED_<hash>]

Salt for hashing

By default, a random 64-character salt is generated at startup, so the same PII hashes to different values on each run. Specify a salt to keep hashes consistent across runs:

from pii_safe import redact_text, set_salt

# Option 1: Pass salt to redact_text
redacted = redact_text("mi nombre es Dario Clavijo", salt="my_secret_salt")

# Option 2: Set salt globally
set_salt("my_secret_salt")
redacted = redact_text("mi nombre es Dario Clavijo")
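
One way to keep a salt stable across runs without hard-coding it is to read it from the environment and fall back to a random one. The sketch below is an assumption-laden illustration: PII_SAFE_SALT is a hypothetical variable name that pii-safe itself is not documented to read, load_or_create_salt is a hypothetical helper, and the use of secrets.token_hex is only one plausible way to produce a 64-character salt.

```python
import os
import secrets


def load_or_create_salt(env_var: str = "PII_SAFE_SALT") -> str:
    """Return the salt from the environment, or a fresh random 64-character one."""
    salt = os.environ.get(env_var)
    if salt:
        return salt
    # token_hex(32) renders 32 random bytes as 64 hexadecimal characters
    return secrets.token_hex(32)


salt = load_or_create_salt()
print(len(salt))
```

The returned value could then be passed as the salt argument of redact_text or Redacter, or to set_salt.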

Using the Redacter class

from pii_safe import Redacter

redacter = Redacter(salt="my_secret_salt")
result1 = redacter.redact("mi nombre es Dario Clavijo")
result2 = redacter.redact("el es Dario Clavijo")

# Same PII gets consistent hash within this instance
hash_map = redacter.get_hash_map()
print(hash_map)  # {' Dario Clavijo': '<hash>'}
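
Because the same PII maps to the same hash, redacted documents can be correlated without seeing the original values. The following sketch counts how many documents mention each hash. It assumes tokens have the shape [REDACTED_<hash>] with a hexadecimal hash; the sample hashes are made up for illustration, and documents_per_hash is a hypothetical helper rather than part of pii-safe.

```python
import re
from collections import Counter

# Assumed token shape: [REDACTED_<hash>] with a hexadecimal hash
TOKEN_RE = re.compile(r"\[REDACTED_([0-9a-fA-F]+)\]")


def documents_per_hash(redacted_docs: list[str]) -> Counter[str]:
    """Count, for each hash, how many documents contain it at least once."""
    counts: Counter[str] = Counter()
    for doc in redacted_docs:
        # A set ensures each document contributes at most once per hash
        counts.update(set(TOKEN_RE.findall(doc)))
    return counts


docs = [
    "my name is[REDACTED_ab12cd34]",
    "he is[REDACTED_ab12cd34] and she is[REDACTED_ffee0011]",
    "no personal data here",
]
counts = documents_per_hash(docs)
print(counts["ab12cd34"])  # 2
print(counts["ffee0011"])  # 1
```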

CLI

pii-safe input.txt
pii-safe input.txt -o output.txt
pii-safe input.txt --salt my_secret_salt

Development

git clone https://github.com/daedalus/pii-safe.git
cd pii-safe
pip install -e ".[test]"

# run tests
pytest

# format
ruff format src/ tests/

# lint
ruff check src/ tests/

# type check
mypy src/

API

redact_text(text: str, salt: str | None = None) -> str

Redacts PII from text using the openai/privacy-filter model.

set_salt(salt: str) -> None

Set the salt for hashing PII in the default redacter.

class Redacter

Context manager for consistent PII-to-hash mapping across calls.

  • __init__(salt: str | None = None): Initialize with optional salt
  • redact(text: str) -> str: Redact PII from text
  • get_hash_map() -> dict[str, str]: Get PII-to-hash mapping
