Question Details

No question body available.

Tags

python artificial-intelligence privacy pii

Answers (3)

April 23, 2026 Score: 2 Rep: 33,868 Quality: Low Completeness: 40%

There are models with sub-second inference on the CPU that work well for detecting PII, harmful content, jailbreak attempts and so forth. These are ideal because you will most likely be able to run them in your current environment. That said, I'd definitely suggest doing some further research and checking the benchmarks to find the best fit, and to see how easily each model lets you flag the offending portions and categorize them, so you can give the user a heads-up about potential privacy leaks.

I'll share a couple of models; basically, you should search for similar models that are small enough to run on a CPU:
- https://huggingface.co/DataikuNLP/kiji-pii-model-onnx
- https://huggingface.co/google/shieldgemma-2b

April 23, 2026 Score: 1 Rep: 68 Quality: Low Completeness: 40%

For simple cases, regex is a sensible bet for capturing phone numbers and email addresses.
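
As a rough illustration of the regex approach, here is a minimal sketch using only the standard library. The patterns, the placeholder tokens and the `redact_simple_pii` name are my own assumptions for the example. The phone pattern only covers North American style numbers and the email pattern is deliberately loose, so treat both as a starting point rather than a validator.

```python
import re

# Loose email pattern (assumption: good enough for redaction, not RFC-compliant).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# North American style phone numbers: optional +1, then 3-3-4 digits
# separated by spaces, dots or hyphens. Other formats are NOT covered.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact_simple_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


if __name__ == "__main__":
    print(redact_simple_pii("Mail jane.doe@example.com or call (555) 123-4567."))
```

Emails are replaced first so that digits inside an address are never picked up by the phone pattern.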

For names, addresses and other personally identifiable information, you'll have a much harder time, as this requires a deeper understanding of the text than a regex can handle. You could run a local NER model whose sole purpose is to filter PII (Micro F1 Mask as an example, or a local Ollama model or equivalent with a system prompt) before the data is passed through to your LLM provider's API.
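
To show where such a model would sit, here is a minimal sketch of the "scrub before sending" step. Everything here is hypothetical scaffolding: `Span`, `mask_spans`, `scrub` and especially `fake_detector` are names I made up, and `fake_detector` is only a stand-in for whatever local NER model you end up using. The sketch assumes the detector returns non-overlapping character spans.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # category reported by the detector, e.g. "PERSON"


# Any callable that maps text to spans can be plugged in here.
Detector = Callable[[str], Iterable[Span]]


def mask_spans(text: str, spans: Iterable[Span]) -> str:
    """Replace each span with a [LABEL] placeholder (spans must not overlap)."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


def scrub(text: str, detect: Detector) -> Tuple[str, List[str]]:
    """Return the masked text plus the sorted categories that were found."""
    spans = list(detect(text))
    labels = sorted({s.label for s in spans})
    return mask_spans(text, spans), labels


def fake_detector(text: str) -> List[Span]:
    """Stand-in for a real NER model: only knows one hard-coded name."""
    name = "Jane Doe"
    i = text.find(name)
    return [Span(i, i + len(name), "PERSON")] if i >= 0 else []


if __name__ == "__main__":
    masked, found = scrub("Please email Jane Doe about the invoice.", fake_detector)
    print(masked, found)
```

Only the masked text would then be passed on to the LLM provider, and the returned categories can be used to warn the user about what was removed.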

There are paid providers as well that offer PII protection or NER as a service that you can lean on.

April 23, 2026 Score: 0 Rep: 13,647 Quality: Low Completeness: 0%

See also: Presidio or GLiNER-PII, both purpose-built PII scrubbing tools.