I reorganized 1,388 items in my password manager using AI, without exposing secrets to the model.
That last part is the only reason the experiment was acceptable.
The basic pattern was:
`sanitize metadata → classify with AI → review uncertain cases → rehydrate by stable item ID`
The model never needed the passwords. It never needed TOTP secrets, recovery codes, secure notes, or anything else sensitive. For classification, metadata was enough: item titles, URLs, rough category hints, and stable IDs.
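The sanitization step can be sketched in a few lines. This is a minimal illustration, assuming the vault export is a list of dicts; the field names (`id`, `title`, `urls`, `category`) are hypothetical, not any particular password manager's export format.

```python
# Keep only non-secret metadata before anything is sent to a model.
# SAFE_FIELDS is an allowlist: unknown fields are dropped by default,
# which is the safer failure mode than a blocklist of known secrets.
SAFE_FIELDS = {"id", "title", "urls", "category"}

def sanitize(items):
    """Return metadata-only copies of vault items; secrets stay behind."""
    return [
        {k: v for k, v in item.items() if k in SAFE_FIELDS}
        for item in items
    ]

vault = [
    {"id": "a1", "title": "GitHub", "urls": ["github.com"],
     "password": "hunter2", "totp": "JBSWY3DP"},
]
print(sanitize(vault))
# [{'id': 'a1', 'title': 'GitHub', 'urls': ['github.com']}]
```

The allowlist design matters: if the export format grows a new secret field tomorrow, it is excluded automatically rather than leaked by omission.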
The stable ID is important. AI can help decide that “github.com” belongs under Development or that a bank login belongs under Finance, but I do not want the model producing the final vault. I want code to do that part. Deterministic code. Boring code. Code that maps a reviewed classification back to the original item by ID.
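That deterministic rehydration step might look like the sketch below. It assumes reviewed classifications come back as a simple `{item_id: folder}` mapping; the names are illustrative, not the original tooling.

```python
# Map reviewed classifications back onto the full vault items by stable ID.
# No model output touches the vault directly -- only this boring code does.

def rehydrate(vault_items, reviewed):
    """Attach each reviewed folder to its item by ID; untouched items pass through."""
    by_id = {item["id"]: item for item in vault_items}
    for item_id, folder in reviewed.items():
        if item_id not in by_id:
            # Fail loudly on an unknown ID instead of guessing a match.
            raise KeyError(f"unknown item id: {item_id}")
        by_id[item_id]["folder"] = folder
    return list(by_id.values())
```

The `KeyError` is the point: a model can hallucinate an ID, and the merge should refuse rather than silently misfile an item.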
That separation made the workflow feel safe enough:
- split data first,
- send only sanitized metadata for classification,
- keep secrets local,
- review uncertain results,
- merge back deterministically.
The result was useful. Uncertain classifications dropped from 891 to 485, a 45.6% reduction. More importantly, the process was reproducible. I could inspect what the model suggested before anything touched the real data.
This is the pattern I trust for sensitive AI work.
Use AI for semantic mapping. Use code for safety, determinism, and final writes.
If you mix those responsibilities, you get a clever demo and a security headache. If you separate them, you get something that can actually be used.