Healthcare AI · Privacy Research · Patent Pending

PHI Exposure Guard
Stateful De-Identification for Streaming Data

Re-identification risk accumulates over time. Most de-identification pipelines ignore that. This system tracks cumulative exposure across modalities and adjusts masking strength dynamically.

Static masking is the wrong model

Standard de-identification pipelines treat every record as isolated. Detect PHI, remove it, move on. That works for single documents. It breaks down in streaming systems where the same patient appears across hundreds of events over time: clinical notes, ASR transcripts, imaging metadata, waveform headers.

Existing approach

Per-document masking. No memory of prior events. No model of cumulative risk. Risk accumulates invisibly across the stream.
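Here is a minimal sketch of that per-document model. It assumes toy regex detectors for MRN-style identifiers and ISO dates. The patterns and `mask_document` are hypothetical, not this project's code. Each call sees one document and keeps nothing, so three events about the same patient in three modalities look like three unrelated problems.

```python
import re

# Hypothetical toy detectors for illustration; not this project's PHI detector.
PHI_PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def mask_document(text: str) -> tuple[str, int]:
    """Mask PHI in a single document and return (masked_text, spans_masked).

    Stateless: the result depends only on `text`, never on earlier events.
    """
    count = 0
    for label, pattern in PHI_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        count += n
    return text, count


# Three events for the same patient, each masked in isolation.
stream = [
    "Note: MRN 1234567 admitted 2025-01-03 with chest pain.",
    "ASR: patient MRN 1234567 reports dizziness.",
    "Imaging header: MRN 1234567 study date 2025-01-04.",
]
for doc in stream:
    masked, n = mask_document(doc)
    print(n, masked)
```

The third event produces the same output whether it is the patient's first appearance or their hundredth. There is nowhere to record accumulated exposure.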

This system

Stateful exposure tracking. Rolling risk computation across modalities and time. Masking strength proportional to actual accumulated risk.
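Below is a sketch of stateful tracking. It assumes exponential time decay, per-modality weights, a fixed bonus when a subject appears in a new modality, and hand-picked thresholds that map risk onto the `weak`, `pseudo`, and `redact` policies from the benchmark. All names, weights, and thresholds here are hypothetical illustrations, not the project's actual risk model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical sketch and parameters for illustration; not the project's controller.
HALF_LIFE_S = 3600.0  # accumulated risk halves after an hour with no new events
MODALITY_WEIGHT = {"note": 1.0, "asr": 0.8, "imaging": 0.6, "waveform": 0.4}
CROSS_MODAL_BONUS = 0.5  # added when a subject shows up in a new modality


@dataclass
class SubjectState:
    risk: float = 0.0
    last_ts: float = 0.0
    modalities: set[str] = field(default_factory=set)


class ExposureTracker:
    """Per-subject rolling exposure with exponential time decay."""

    def __init__(self) -> None:
        self.states: dict[str, SubjectState] = defaultdict(SubjectState)

    def observe(self, subject_id: str, modality: str, ts: float) -> float:
        """Record one event and return the subject's updated risk."""
        s = self.states[subject_id]
        # Decay previously accumulated risk by the time since the last event.
        elapsed = max(0.0, ts - s.last_ts)
        s.risk *= 0.5 ** (elapsed / HALF_LIFE_S)
        # Base increment for this modality, plus a bonus for cross-modal linkage.
        increment = MODALITY_WEIGHT.get(modality, 1.0)
        if modality not in s.modalities and s.modalities:
            increment += CROSS_MODAL_BONUS
        s.modalities.add(modality)
        s.risk += increment
        s.last_ts = ts
        return s.risk


def select_policy(risk: float) -> str:
    """Map accumulated risk to a masking strength (hypothetical thresholds)."""
    if risk < 1.5:
        return "weak"
    if risk < 3.0:
        return "pseudo"
    return "redact"


tracker = ExposureTracker()
events = [("note", 0.0), ("note", 60.0), ("asr", 120.0), ("imaging", 180.0)]
for modality, ts in events:
    risk = tracker.observe("patient-001", modality, ts)
    print(f"{modality:8s} risk={risk:.3f} policy={select_policy(risk)}")
```

With these parameters, the same patient moves from `weak` to `pseudo` on a second note. They reach `redact` once an ASR transcript adds a second modality. Decay lets risk fall back if a subject goes quiet.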

Five policies evaluated

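The adaptive controller reaches full utility (1.0) and keeps total leakage close to the redact floor (0.56 vs. 0.51). It avoids the utility collapse that full redaction causes (0.51). The cost is latency: about 1.16 ms mean, vs. roughly 0.12 to 0.16 ms for the static policies. All results were generated from fully synthetic data.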

Policy        Leak Total  Utility Proxy  Mean Latency (ms)  P90 Latency (ms)
raw           3.03        1.0            0.123              0.154
weak          2.0         0.51           0.141              0.18
pseudo        0.51        1.0            0.159              0.188
redact        0.51        0.51           0.157              0.192
adaptive ★    0.56        1.0            1.16               1.237
[Figure: Privacy-Utility Tradeoff]
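The table can also be read as a set of relative trade-offs. This sketch normalizes each policy against the `raw` baseline and reports three things: percentage leak reduction, utility retained, and mean-latency multiple. The input numbers are copied from the benchmark table. The derived metrics and the `summarize` helper are our own framing, not figures reported by the project.

```python
# Values copied from the benchmark table (fully synthetic data).
RESULTS = {
    # policy: (leak_total, utility_proxy, mean_latency_ms, p90_latency_ms)
    "raw": (3.03, 1.0, 0.123, 0.154),
    "weak": (2.0, 0.51, 0.141, 0.18),
    "pseudo": (0.51, 1.0, 0.159, 0.188),
    "redact": (0.51, 0.51, 0.157, 0.192),
    "adaptive": (0.56, 1.0, 1.16, 1.237),
}


def summarize(results: dict) -> dict:
    """Express each policy relative to the unmasked `raw` baseline."""
    base_leak, _, base_latency, _ = results["raw"]
    rows = {}
    for policy, (leak, utility, mean_ms, _p90) in results.items():
        rows[policy] = {
            "leak_reduction_pct": round(100 * (1 - leak / base_leak), 1),
            "utility": utility,
            "latency_x_raw": round(mean_ms / base_latency, 2),
        }
    return rows


for policy, row in summarize(RESULTS).items():
    print(policy, row)
```

On these numbers, the adaptive policy cuts total leakage by about 81.5% relative to `raw` and keeps utility at 1.0. Its mean latency is about 9.4× the raw baseline. Redaction cuts leakage by 83.2% but drops utility to 0.51.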

Modular pipeline

Each module handles a distinct layer of the exposure-aware masking pipeline.

state   Context State: per-subject exposure state persisted via SQLite
ctrl    Controller: risk scoring and adaptive policy selection
dcpg    DCPG: Dynamic Context Persistence Graph for cross-modal linkage
crdt    Federated Graph: CRDT-based merging for distributed deployments
rl      PPO Agent: reinforcement learning agent for adaptive policy control
audit   Audit Signing: cryptographic signing and FHIR export of audit records
cmo     CMO Registry: Composable Masking Operators with DAG execution
flow    Flow Controller: DAG-based policy flow with audit provenance
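The `crdt` module is described only as "CRDT-based merging." One standard way to make per-subject exposure mergeable across sites combines two structures: a state-based grow-only counter (G-Counter) keyed by site, and a grow-only set (G-Set) of modalities. Merge is an element-wise max plus a set union, so replicas converge regardless of merge order. The `ExposureCRDT` class below is a hypothetical sketch of that technique, not the project's data type.

```python
from dataclasses import dataclass, field


@dataclass
class ExposureCRDT:
    """Hypothetical state-based CRDT for one subject's exposure across sites.

    counts: G-Counter, one entry per site; each site increments only its own entry.
    modalities: G-Set of modalities in which the subject has been seen.
    """

    counts: dict[str, int] = field(default_factory=dict)
    modalities: set[str] = field(default_factory=set)

    def record(self, site: str, modality: str) -> None:
        """Record one local event at `site`."""
        self.counts[site] = self.counts.get(site, 0) + 1
        self.modalities.add(modality)

    def merge(self, other: "ExposureCRDT") -> "ExposureCRDT":
        """Join two replicas: per-site max of counts, union of modalities."""
        sites = self.counts.keys() | other.counts.keys()
        return ExposureCRDT(
            counts={s: max(self.counts.get(s, 0), other.counts.get(s, 0)) for s in sites},
            modalities=self.modalities | other.modalities,
        )

    def total(self) -> int:
        """Total events observed for this subject across all sites."""
        return sum(self.counts.values())


a = ExposureCRDT()
a.record("site-A", "note")
a.record("site-A", "asr")
b = ExposureCRDT()
b.record("site-B", "imaging")
merged = a.merge(b)
print(merged.total(), sorted(merged.modalities))  # 3 ['asr', 'imaging', 'note']
```

Merge is commutative, associative, and idempotent. Sites can therefore exchange state in any order, and re-delivering the same state is harmless.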

Try the scorer

Add events for a patient across modalities and watch risk accumulate in real time. Try adding the same patient ID with different modalities to trigger cross-modal detection.
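Cross-modal detection can be pictured as a linkage graph keyed on subject ID. This hypothetical sketch is not the DCPG implementation. Each new event gets an edge to every earlier event for the same subject recorded in a different modality, and the sketch reports whether any such edge was created.

```python
from collections import defaultdict


class CrossModalLinker:
    """Hypothetical toy linkage graph connecting events that share a subject ID."""

    def __init__(self) -> None:
        self.events_by_subject: dict[str, list[tuple[int, str]]] = defaultdict(list)
        self.edges: list[tuple[int, int]] = []
        self._next_id = 0

    def add_event(self, subject_id: str, modality: str) -> bool:
        """Add an event; return True if it links to an event in another modality."""
        event_id = self._next_id
        self._next_id += 1
        cross_modal = False
        for other_id, other_modality in self.events_by_subject[subject_id]:
            if other_modality != modality:
                self.edges.append((other_id, event_id))
                cross_modal = True
        self.events_by_subject[subject_id].append((event_id, modality))
        return cross_modal


linker = CrossModalLinker()
print(linker.add_event("P001", "note"))     # False: first event for P001
print(linker.add_event("P001", "note"))     # False: same modality
print(linker.add_event("P001", "asr"))      # True: links to both earlier notes
print(linker.add_event("P002", "imaging"))  # False: different subject
print(linker.edges)                         # [(0, 2), (1, 2)]
```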

Run it yourself

Run the full benchmark locally. Results are written to the results/ directory.

```bash
# Install
pip install phi-exposure-guard

# Run the benchmark
python -m amphi_rl_dpgraph.run_demo
```

Or open the Colab notebook; no setup is required.

Using this work

If you use this system or dataset in academic or technical work, please cite via the CITATION.cff file in the GitHub repository.

This repository is associated with a U.S. provisional patent application filed 2025-07-05. Public release: 2026-03-02. All experiments were run on fully synthetic data.