Redactr API

Redactr API lets developers embed full-depth PDF redaction into their own software.

The problem

Most PDF redaction tools draw a black rectangle over text. It looks redacted, but the underlying data is still in the file. The text content, font references, and positioning information all remain in the PDF structure. Anyone with a basic text extractor can recover the original content.
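
A toy example makes the failure concrete. This sketch assumes a heavily simplified, uncompressed page content stream: one `BT`/`ET` text block per line, literal strings without escapes, and hypothetical helper names. Text is painted by the `Tj` operator, and an overlay "redaction" only appends a black filled rectangle (`0 0 0 rg` sets black fill, `x y w h re f` fills a rectangle). A naive extractor that reads the `Tj` operands still recovers everything.

```python
import re

# Simplified page content stream: each "(string) Tj" paints a line of text.
CONTENT = (
    "BT /F1 12 Tf 72 700 Td (Patient: Jane Doe) Tj ET\n"
    "BT /F1 12 Tf 72 680 Td (Diagnosis: confidential) Tj ET\n"
)

def overlay_redact(stream: str, x: int, y: int, w: int, h: int) -> str:
    """Overlay-style 'redaction': paint a black box on top; the text operators stay."""
    return stream + f"0 0 0 rg {x} {y} {w} {h} re f\n"

def extract_text(stream: str) -> list[str]:
    """Naive extractor: collect the literal-string operand of every Tj operator."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)

covered = overlay_redact(CONTENT, 70, 695, 200, 16)
print(extract_text(covered))  # ['Patient: Jane Doe', 'Diagnosis: confidential']
```

The rectangle changes what a viewer renders, not what the file contains.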

This isn't a theoretical risk. Government agencies, law firms, and corporations have all published "redacted" documents where the sensitive text was trivially recoverable. For GDPR compliance, that's not redaction — it's a data breach waiting to happen.

What Redactr does differently

Redactr removes sensitive data at the PDF object level. Rather than overlaying graphics, it identifies the text operations in the document's content stream and removes them entirely. The redacted content doesn't exist anywhere in the output file — there's nothing to extract because the data has been deleted, not covered up.
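
A minimal sketch of the idea, using the same kind of toy content stream (an assumption: uncompressed, one text block per line, no `TJ` arrays or font encodings, all of which a real PDF library has to handle). Instead of painting over text, it deletes the text-showing block whose operand contains a redaction target. It is deliberately coarse, dropping the whole block rather than individual glyphs, and the function names are hypothetical, not Redactr's implementation.

```python
import re

CONTENT = (
    "BT /F1 12 Tf 72 700 Td (Patient: Jane Doe) Tj ET\n"
    "BT /F1 12 Tf 72 680 Td (Diagnosis: confidential) Tj ET\n"
)

# Matches a literal-string operand followed by the Tj (show text) operator.
TJ_OP = re.compile(r"\((.*?)\)\s*Tj")

def remove_text_ops(stream: str, targets: set[str]) -> str:
    """Drop every text block whose Tj operand contains any redaction target."""
    kept = []
    for line in stream.splitlines(keepends=True):
        match = TJ_OP.search(line)
        if match and any(target in match.group(1) for target in targets):
            continue  # the operator is deleted, so the string no longer exists in the output
        kept.append(line)
    return "".join(kept)

def extract_text(stream: str) -> list[str]:
    return TJ_OP.findall(stream)

redacted = remove_text_ops(CONTENT, {"Jane Doe"})
print(extract_text(redacted))  # ['Diagnosis: confidential']
print("Jane Doe" in redacted)  # False
```

Running the same extractor on the output finds nothing to recover, because the data was removed rather than hidden.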

Redactr also uses AI to suggest which entities in a document might need redaction — names, addresses, financial details — so teams aren't relying entirely on manual review to catch sensitive data.
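
One way to keep the model in a suggesting role is sketched below. The `Suggestion` type, its category labels, and `build_targets` are hypothetical; the hard-coded list stands in for whatever the suggestions service returns. The model only proposes spans: a reviewer accepts some, rejects false positives, and can add spans the model missed, and only that combined set becomes redaction targets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Suggestion:
    text: str      # span the model flagged, e.g. "Jane Doe"
    category: str  # hypothetical label set: "name", "address", "financial", ...

def build_targets(suggestions: list[Suggestion],
                  accepted: set[int],
                  manual: set[str]) -> set[str]:
    """Combine reviewer-accepted suggestions (by index) with spans added by hand.

    Suggestions that were not accepted are ignored: the model proposes, a person decides.
    """
    from_ai = {s.text for i, s in enumerate(suggestions) if i in accepted}
    return from_ai | manual

# Stand-in for model output.
suggested = [
    Suggestion("Jane Doe", "name"),
    Suggestion("12 High Street", "address"),
    Suggestion("Diagnosis", "other"),  # a false positive the reviewer rejects
]
targets = build_targets(suggested, accepted={0, 1}, manual={"Account 12345678"})
print(sorted(targets))  # ['12 High Street', 'Account 12345678', 'Jane Doe']
```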

How it's built

Redactr API is a Laravel application that orchestrates two gRPC microservices: a Python/PyMuPDF service for PDF processing, and a suggestions service for sensitive data identification. AWS Bedrock handles LLM-based suggestions.
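
The orchestration split can be sketched as a thin coordinator that depends only on two service interfaces. This is written in Python for consistency with the other examples, although the real orchestrator is Laravel talking gRPC; the interface names, method signatures, and in-memory fakes are all assumptions for illustration.

```python
from typing import Protocol

# Hypothetical interfaces; in Redactr these would be gRPC stubs called from Laravel.
class SuggestionsService(Protocol):
    def suggest(self, document: bytes) -> list[str]: ...

class PdfService(Protocol):
    def redact(self, document: bytes, targets: list[str]) -> bytes: ...

class Orchestrator:
    """Routes bytes between the services; holds no PDF or model logic itself."""

    def __init__(self, suggestions: SuggestionsService, pdf: PdfService) -> None:
        self.suggestions = suggestions
        self.pdf = pdf

    def suggest(self, document: bytes) -> list[str]:
        return self.suggestions.suggest(document)

    def redact(self, document: bytes, targets: list[str]) -> bytes:
        return self.pdf.redact(document, targets)

# In-memory fakes for local testing; they satisfy the Protocols structurally.
class FakeSuggestions:
    def suggest(self, document: bytes) -> list[str]:
        return [name for name in ("Jane Doe", "John Roe") if name.encode() in document]

class FakePdf:
    def redact(self, document: bytes, targets: list[str]) -> bytes:
        for target in targets:
            document = document.replace(target.encode(), b"")  # not real PDF redaction
        return document

api = Orchestrator(FakeSuggestions(), FakePdf())
doc = b"Patient: Jane Doe"
proposed = api.suggest(doc)     # ['Jane Doe']
print(api.redact(doc, proposed))  # b'Patient: '
```

Keeping the coordinator ignorant of PDF internals is what lets each service be written in the language that suits it.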

The architecture is designed around zero data retention. Document content is processed in memory and never written to disk or stored in the application's database. I wrote about the thinking behind that in "Zero data retention as an architecture principle".
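
A minimal sketch of what "in memory only" can look like at the request-handler level, assuming a hypothetical `AuditRecord` as the only thing a caller is allowed to persist. The document lives in local `bytes` variables, no temporary file is created, and the record is built from sizes and counts, so it cannot carry content. The byte replacement is a placeholder for real PDF redaction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical persistable record: counts and sizes only, never document content."""
    input_size: int
    output_size: int
    target_count: int

def handle_request(upload: bytes, targets: list[str]) -> tuple[bytes, AuditRecord]:
    """Redact an uploaded document without writing it to disk or a database."""
    result = upload
    for target in targets:
        result = result.replace(target.encode(), b"")  # placeholder transformation
    audit = AuditRecord(len(upload), len(result), len(targets))
    return result, audit  # the caller streams `result` back and may store only `audit`

output, audit = handle_request(b"Invoice for Jane Doe", ["Jane Doe"])
print(audit)  # AuditRecord(input_size=20, output_size=12, target_count=1)
```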

Why I'm building it

Plenty of software needs to handle sensitive PDFs — case management, HR, legal tech, healthcare, anywhere regulated data ends up in a document. The current options are expensive enterprise tools, manual processes built around consumer PDF editors, or open-source libraries you assemble yourself. None of them is a clean, integration-ready primitive that actually removes the data. Redactr API is. What customers build on top of it is up to them.

It's also a genuinely interesting engineering problem. Building a privacy-first document processing pipeline with AI suggestions, polyglot microservices, and zero data retention has pushed me into areas I wouldn't have explored otherwise.
