A dependency-free Go module for auditing, detecting, removing, and substituting emoji clutter and redundant whitespace in text content before it reaches production. Use it as a post-processing step after AI agent output, as a content gate in your request pipeline, or as a CI quality gate -- one call to Sanitize strips and normalizes in a single pass, Replace maps emoji to meaningful text equivalents, ScanDir audits entire directory trees (it calls ContainsEmoji internally per file), and ContainsEmoji is available directly for ad-hoc single-string detection.
- Emoji removal -- strips all emoji and pictographic codepoints using compiled Unicode range tables; ZWJ sequences, variation selectors, and tag characters handled correctly
- Whitespace normalization -- collapses redundant inline spaces and blank lines while preserving leading indentation
- Configurable pipeline -- `Sanitize` runs removal and normalization in one call; `AllowedRanges` and `AllowedEmojis` let callers preserve specific codepoints
- Substitution -- `Replace` maps ~137 built-in emoji to readable text equivalents (e.g., `[PASS]`, `[FAIL]`); custom maps supported
- Metrics -- `SanitizeReport` returns the emoji count removed and bytes saved alongside the cleaned text
- Streaming -- `SanitizeReader` processes an `io.Reader` line by line; supports lines up to 1 MiB
- JSON-aware -- `SanitizeJSON` cleans string values only; preserves keys, numbers, booleans, null, and numeric precision
- Directory scanner -- `ScanDir` / `ScanDirContext` walk an entire tree and return per-file findings; `ScanDirContext` supports cancellation
- Atomic writes -- `SanitizeFile`, `ReplaceFile`, `WriteFinding`, and `FixDir` write through a temp file and rename; partial writes cannot corrupt the original
- CLI -- `cmd/demojify` supports audit, strip (`-fix`), substitute (`-sub`), normalize, quiet mode, extension filter, and directory skip
- Zero external dependencies -- pure stdlib; no `go.sum` required
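
The whitespace rule in the list above can be illustrated with a small, stdlib-only sketch. It is not the module's `Normalize`: the name `normalizeSketch` is hypothetical, and the exact rules it applies (keep leading spaces/tabs verbatim, collapse later runs to one space, trim trailing whitespace, squeeze consecutive blank lines to one) are assumptions about what "redundant" means here.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSketch is a hypothetical stand-in for Normalize. Assumed rules:
//   - leading indentation (spaces/tabs) is kept byte-for-byte;
//   - runs of whitespace after the indentation collapse to one space;
//   - trailing whitespace is trimmed;
//   - consecutive blank lines collapse to a single blank line.
func normalizeSketch(text string) string {
	lines := strings.Split(text, "\n")
	out := make([]string, 0, len(lines))
	prevBlank := false
	for _, line := range lines {
		body := strings.TrimLeft(line, " \t")
		indent := line[:len(line)-len(body)]
		// strings.Fields splits on whitespace runs and drops empty fields,
		// so re-joining with " " collapses inline runs and trims the tail.
		body = strings.Join(strings.Fields(body), " ")
		if body == "" {
			if prevBlank {
				continue // squeeze repeated blank lines
			}
			prevBlank = true
			out = append(out, "")
			continue
		}
		prevBlank = false
		out = append(out, indent+body)
	}
	return strings.Join(out, "\n")
}

func main() {
	in := "func f() {\n\treturn  1   +  2  \n\n\n}\n"
	fmt.Printf("%q\n", normalizeSketch(in))
}
```

Keeping indentation untouched is what makes a pass like this safe for YAML, Python, and Markdown, where leading whitespace carries meaning.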
```sh
# Library
go get github.com/nicholashoule/demojify-sanitize
# CLI
go install github.com/nicholashoule/demojify-sanitize/cmd/demojify@latest
```

```go
import demojify "github.com/nicholashoule/demojify-sanitize"
```
```go
// Remove all emojis and normalize whitespace in one call.
clean := demojify.Sanitize(text, demojify.DefaultOptions())
```

A ready-to-run CLI example lives in `cmd/demojify/main.go`.
It audits a directory tree for emoji, reports every occurrence with file, line,
and column, and optionally rewrites affected files (-fix) or substitutes emoji
with text tokens (-sub). Use -skip to exclude specific directories
(e.g., dist, build) in addition to the defaults.
```sh
# Build once, then run the binary:
go build -o demojify ./cmd/demojify
./demojify -root . -sub -skip dist,build
```

Post-process AI agent output:

```go
clean := demojify.Sanitize(aiResponse, demojify.DefaultOptions())
```

Gate user input in a request pipeline:

```go
if demojify.ContainsEmoji(userInput) {
	userInput = demojify.Sanitize(userInput, demojify.DefaultOptions())
}
```

Audit a directory tree:

```go
cfg := demojify.DefaultScanConfig()
findings, err := demojify.ScanDir(cfg)
if err != nil {
	log.Fatal(err)
}
for _, f := range findings {
	fmt.Printf("%s: has_emoji=%v\n", f.Path, f.HasEmoji)
}
```

Fix a directory tree in place:

```go
cfg := demojify.DefaultScanConfig()
fixed, _, err := demojify.FixDir(".", cfg)
if err != nil {
	log.Fatal(err)
}
fmt.Printf("fixed %d file(s)\n", fixed)
```

Substitute emoji with text tokens:

```go
repl := demojify.DefaultReplacements()
clean := demojify.Replace("\u2705 tests passed, \u274c build failed", repl)
// "[PASS] tests passed, [FAIL] build failed"
```

To block emoji at commit time, install the CLI as a Git pre-commit hook.

Option A -- pre-built binary (CI, minimal setup):
```sh
go build -o .git/hooks/demojify ./cmd/demojify
```

```sh
#!/bin/sh
# .git/hooks/pre-commit
root="$(git rev-parse --show-toplevel)"
"$root/.git/hooks/demojify" -root "$root" -exts .go,.md -quiet
```

Option B -- `go run` with repogov governance (recommended for in-repo hooks):
This is the pattern used in scripts/hooks/pre-commit in this repository.
Repogov enforces line limits and layout rules; demojify blocks emoji.
Both tools run from their published module versions -- no local clone required.
```sh
#!/bin/sh
# .git/hooks/pre-commit
root="$(git rev-parse --show-toplevel)"
cd "$root"
go run github.com/nicholashoule/repogov/cmd/repogov@v0.3.0 -root "$root" -agent copilot
repogov_exit=$?
go run github.com/nicholashoule/demojify-sanitize/cmd/demojify@v0.4.0 -root "$root"
demojify_exit=$?
# Both tools always run so every problem is reported; fail if either failed.
exit $((repogov_exit | demojify_exit))
```

See `docs/git-hooks.md` for auto-fix, substitution, the full Go API variant, and cross-platform (macOS/Linux/Windows) examples.
Process LLM token streams or HTTP chunked responses line by line without buffering the full input:
```go
var out bytes.Buffer
if err := demojify.SanitizeReader(llmStream, &out, demojify.DefaultOptions()); err != nil {
	log.Fatal(err)
}
```

Lines up to 1 MiB are supported. Longer lines return `bufio.ErrTooLong`.
Clean string values inside a JSON document while leaving keys, numbers, booleans, and null untouched:
```go
clean, err := demojify.SanitizeJSON(jsonBytes, demojify.DefaultOptions())
if err != nil {
	log.Fatal(err)
}
```

`SanitizeJSON` returns an error for invalid JSON and for input with trailing non-whitespace content after the first value (e.g., `{"a":1} trailing`).
See example_test.go for additional runnable patterns (HTTP handler, pre-commit/CI, file write-back, per-occurrence matching).
Full signatures and doc comments are on pkg.go.dev.
| Function | Purpose |
|---|---|
| `Sanitize(text, opts) string` | Configurable pipeline: emoji removal, then whitespace normalization |
| `SanitizeFile(path, opts) (bool, error)` | Sanitize a file atomically; no write when already clean |
| `Demojify(text) string` | Strip all emoji / pictographic codepoints |
| `ContainsEmoji(text) bool` | Detect emoji presence |
| `CountEmoji(text) int` | Count emoji codepoint occurrences |
| `BytesSaved(text) int` | Bytes freed by emoji removal |
| `Normalize(text) string` | Collapse redundant whitespace (preserves leading indentation) |
| `TechnicalSymbolRanges() []*unicode.RangeTable` | Pre-built ranges for check marks, gears, etc.; pass to `AllowedRanges` |

| Function / Type | Purpose |
|---|---|
| `SanitizeReport(text, opts) SanitizeResult` | Sanitize with structured metrics (emoji count, bytes saved) |
| `SanitizeResult` | Cleaned text plus `EmojiRemoved` and `BytesSaved` fields |
| `SanitizeReader(r, w, opts) error` | Line-by-line streaming sanitization (LLM streams, MCP payloads) |
| `SanitizeJSON(data, opts) ([]byte, error)` | Sanitize JSON string values only; preserves structure and numeric precision |

| Function | Purpose |
|---|---|
| `Replace(text, repl) string` | Map emoji to text equivalents; strip the unmapped remainder |
| `ReplaceFile(path, repl) (int, error)` | Atomic in-place replacement; no write when already clean |
| `ReplaceCount(text, repl) (string, int)` | Replace and return the substitution count |
| `FindAll(text) []string` | Distinct emoji sequences in text |
| `FindAllMapped(text, repl) []string` | Mapped keys found in text |
| `DefaultReplacements() map[string]string` | Built-in ~137-entry emoji-to-text map (full list in `docs/replacements.md`) |
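
ZWJ and variation-selector sequences contain shorter emoji as prefixes, so a substitution map needs longest-match semantics. `strings.NewReplacer` compares old strings in argument order at each position, so sorting keys longest-first yields that behavior. The sketch is hypothetical (`buildReplacer` is not a module function): it only maps keys and does not strip the unmapped remainder the way `Replace` is documented to.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildReplacer (hypothetical) turns an emoji-to-text map into a
// strings.Replacer. NewReplacer tries old strings in argument order, so keys
// are sorted longest first: a multi-codepoint sequence beats its prefix.
func buildReplacer(repl map[string]string) *strings.Replacer {
	keys := make([]string, 0, len(repl))
	for k := range repl {
		if k != "" { // skip empty keys, as the README describes
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j] // deterministic tie-break
	})
	pairs := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		pairs = append(pairs, k, repl[k])
	}
	return strings.NewReplacer(pairs...)
}

func main() {
	r := buildReplacer(map[string]string{
		"\u2705": "[PASS]",
		"\u274c": "[FAIL]",
	})
	fmt.Println(r.Replace("\u2705 tests passed, \u274c build failed"))
}
```

Skipping empty keys is more than tidiness: an empty old string handed to `strings.NewReplacer` matches at every position and inserts its replacement between every rune.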

| Function / Type | Purpose |
|---|---|
| `ScanDir(cfg) ([]Finding, error)` | Walk a directory tree and return findings |
| `ScanDirContext(ctx, cfg) ([]Finding, error)` | Context-aware scan with cancellation support |
| `ScanFile(path, opts) (*Finding, error)` | Check a single file |
| `FindMatchesInFile(path, repl) ([]Match, error)` | Per-occurrence match detail (line, column, context) |
| `WriteFinding(path, f) (bool, error)` | Atomic write-back without re-reading |
| `FixDir(root, cfg) (fixed, clean int, err error)` | Scan and fix an entire directory tree in one call |
| `ScanConfig` / `DefaultScanConfig()` | Scanner configuration (root, skip dirs, extensions, etc.) |
| `Finding` | Fields: `Path`, `HasEmoji`, `Original`, `Cleaned`, `Matches` |
| `Match` | Fields: `Sequence`, `Replacement`, `Line`, `Column`, `Context` |

| Symbol | Purpose |
|---|---|
| `LimitConfig` | Per-file line limit struct: `Default int` plus a `Files` map of per-file overrides |
| `DefaultLimitConfig() LimitConfig` | Returns a pre-populated config (500-line default; `.claude/CLAUDE.md` capped at 50) |
| `DefaultLineLimit` | Fallback constant (500) used when `LimitConfig.Default` is zero |
| `ResolveLimit(cfg LimitConfig, path string) int` | Returns the effective line limit for `path` (file override → `Default` → `DefaultLineLimit`) |
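
The resolution order in the last row can be written out directly. `limitConfig` and `resolveLimit` below are hypothetical local mirrors of the documented `LimitConfig` and `ResolveLimit`; matching override keys by exact path string and treating non-positive overrides as unset are assumptions of this sketch.

```go
package main

import "fmt"

// defaultLineLimit mirrors the documented DefaultLineLimit constant.
const defaultLineLimit = 500

// limitConfig is a hypothetical mirror of the documented LimitConfig fields.
type limitConfig struct {
	Default int
	Files   map[string]int
}

// resolveLimit follows the documented order: per-file override, then
// cfg.Default, then defaultLineLimit. Reading a nil Files map is safe.
func resolveLimit(cfg limitConfig, path string) int {
	if n, ok := cfg.Files[path]; ok && n > 0 {
		return n
	}
	if cfg.Default > 0 {
		return cfg.Default
	}
	return defaultLineLimit
}

func main() {
	cfg := limitConfig{Default: 500, Files: map[string]int{".claude/CLAUDE.md": 50}}
	fmt.Println(resolveLimit(cfg, ".claude/CLAUDE.md")) // 50
	fmt.Println(resolveLimit(cfg, "README.md"))         // 500
	fmt.Println(resolveLimit(limitConfig{}, "x.go"))    // 500
}
```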
```go
type Options struct {
	RemoveEmojis        bool                  // strip emoji / pictographic characters
	NormalizeWhitespace bool                  // collapse redundant spaces and blank lines
	AllowedRanges       []*unicode.RangeTable // preserve emoji in these Unicode ranges
	AllowedEmojis       []string              // preserve specific emoji strings (exact match)
}

func DefaultOptions() Options // RemoveEmojis + NormalizeWhitespace = true
```

`AllowedRanges` and `AllowedEmojis` can be combined. Empty strings in `AllowedEmojis` and empty keys in replacement maps are silently skipped.
```go
// Remove all emoji except rocket and thumbs-up.
clean := demojify.Sanitize(text, demojify.Options{
	RemoveEmojis:  true,
	AllowedEmojis: []string{"\U0001F680", "\U0001F44D"},
})
```

`Demojify` strips U+2139, U+2600-U+27BF, U+1F000-U+1FAFF, ZWJ (U+200D),
variation selectors (U+FE00-U+FE0F), tag characters (U+E0020-U+E007F), and
related auxiliary ranges. Intentionally not removed: copyright, registered,
trademark, and basic math/technical arrows.
Full range table: `docs/unicode-coverage.md`.
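
The ranges listed above translate directly into a `unicode.RangeTable`. The sketch below covers only those listed ranges, not the unnamed auxiliary ones, and `demojifySketch` is a hypothetical name. It shows that copyright (U+00A9) and arrows such as U+2192 fall outside the table and survive, while a tag-sequence flag is removed completely.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// listedRanges holds only the ranges named in this README: U+200D (ZWJ),
// U+2139, U+2600-U+27BF, U+FE00-U+FE0F, U+1F000-U+1FAFF, U+E0020-U+E007F.
// The module's "related auxiliary ranges" are deliberately omitted.
var listedRanges = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x200D, Hi: 0x200D, Stride: 1},
		{Lo: 0x2139, Hi: 0x2139, Stride: 1},
		{Lo: 0x2600, Hi: 0x27BF, Stride: 1},
		{Lo: 0xFE00, Hi: 0xFE0F, Stride: 1},
	},
	R32: []unicode.Range32{
		{Lo: 0x1F000, Hi: 0x1FAFF, Stride: 1},
		{Lo: 0xE0020, Hi: 0xE007F, Stride: 1},
	},
}

// demojifySketch drops every rune in listedRanges and keeps everything else.
func demojifySketch(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(listedRanges, r) {
			return -1
		}
		return r
	}, s)
}

func main() {
	// Black flag + tag characters + cancel tag: a subdivision flag sequence.
	flag := "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
	fmt.Printf("%q\n", demojifySketch("\u00a9 2024 \u2192 next "+flag+" done"))
}
```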
| Document | Contents |
|---|---|
| docs/design.md | Architecture rationale: zero-dependency policy, pipeline order, error handling, atomic writes |
| docs/replacements.md | Full DefaultReplacements() reference: all ~137 entries organized by category |
| docs/unicode-coverage.md | emojiRE ranges, intentional exclusions (copyright, trademark, math arrows), substitution vs. stripping |
| docs/cli.md | cmd/demojify CLI reference: flags, exit codes, output format, examples |
| docs/git-hooks.md | Pre-commit hook integration: shell and Go examples, auto-fix, substitution |
See LICENSE.