Privacy
Privacy in AgentSync is local-first and configurable per use case. Nothing leaves the submitter's machine until it has passed through the active privacy profile and (by default) a human preview.
TypeScript model: packages/core/src/privacy.ts
Example config: examples/agentsync.config.example.json
The profile model
A profile is a named bundle of privacy rules. Different use cases want different trade-offs, so AgentSync ships several built-ins and lets users define their own. The active profile is chosen by:
agentsync submit --profile strict # explicit flag
AGENTSYNC_PROFILE=research # env
defaultProfile in agentsync.config.json # config
Profiles compose via extends, so a custom profile can inherit strict and relax one field.
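A minimal sketch of how selection and extends could resolve. It assumes the precedence is flag, then environment variable, then config default (the order listed above; the source does not state a precedence), that strict is the fallback, and that a child profile's levers are merged key by key over its parent's. The names Profile, pickProfileName and resolveProfile are illustrative and not taken from packages/core/src/privacy.ts.

```typescript
// Hypothetical shapes; the real model lives in packages/core/src/privacy.ts.
type Lever = Record<string, unknown>;
type LeverName = "redaction" | "anonymization" | "content" | "scope";

interface Profile {
  name: string;
  extends?: string;
  redaction?: Lever;
  anonymization?: Lever;
  content?: Lever;
  scope?: Lever;
}

const LEVERS: LeverName[] = ["redaction", "anonymization", "content", "scope"];

// Assumed precedence: --profile flag, then AGENTSYNC_PROFILE, then defaultProfile.
function pickProfileName(flag?: string, env?: string, configDefault?: string): string {
  return flag ?? env ?? configDefault ?? "strict";
}

// Resolve `extends` by merging each lever of the child over the resolved parent.
function resolveProfile(
  name: string,
  registry: Record<string, Profile>,
  seen: string[] = [],
): Profile {
  if (seen.includes(name)) {
    throw new Error(`extends cycle: ${[...seen, name].join(" -> ")}`);
  }
  const profile = registry[name];
  if (!profile) throw new Error(`unknown profile: ${name}`);
  if (!profile.extends) return profile;

  const parent = resolveProfile(profile.extends, registry, [...seen, name]);
  const merged: Profile = { name: profile.name };
  for (const lever of LEVERS) {
    if (parent[lever] || profile[lever]) {
      merged[lever] = { ...parent[lever], ...profile[lever] };
    }
  }
  return merged;
}

// Usage: a custom profile inherits strict and relaxes one field.
const registry: Record<string, Profile> = {
  strict: { name: "strict", content: { includeReasoning: false, toolOutput: "metadata-only" } },
  "my-team": { name: "my-team", extends: "strict", content: { includeReasoning: true } },
};
const active = resolveProfile(pickProfileName(undefined, "my-team"), registry);
// active.content is { includeReasoning: true, toolOutput: "metadata-only" }
```

Merging per lever rather than replacing the whole lever is what lets a child override a single field; whether AgentSync merges this deeply is an assumption of the sketch.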
Built-in profiles (starting point)
| Profile | Intended use case | Posture |
|---|---|---|
| strict | Default / unsure | Redact hits from all secret + PII detectors, anonymize all identifiers, drop file contents, tool output → metadata-only, no reasoning. |
| research | Trajectory corpora | Redact secrets + PII, anonymize identifiers, keep truncated tool output and reasoning, drop raw file contents. |
| hiring-signal | Portfolio / hiring | Redact secrets, keep file contents and full tool output, identity-linked (anonymization.enabled: false) so work is attributable to the person. |
| raw-min | Self-hosting / trusted sink | Redact only high-confidence secrets; everything else passes through. |
These are defaults, not laws. The whole point of the config is that each deployment / use case picks its own posture.
A profile's four levers
{
"name": "research",
"extends": "strict", // optional inheritance
"redaction": { … }, // 1. remove secrets & PII
"anonymization": { … }, // 2. de-identify stable identifiers
"content": { … }, // 3. how much of the payload to include
"scope": { … } // 4. which sessions/paths are eligible
}
1. redaction — secrets & PII (client-side, always on by default)
Client-side secret redaction is the baseline. Detectors run over every event's text and tool I/O before upload.
{
"enabled": true,
"builtins": ["api-keys","aws","gcp","jwt","private-keys","env-values",
"emails","ip-addresses","credit-cards"],
"custom": [
{ "id": "internal-host", "pattern": "\\bcorp\\.example\\.com\\b", "type": "pii" }
],
"replacement": "[REDACTED:{type}]",
"onUncertain": "redact" // "redact" | "flag" | "ignore"
}
- builtins — named detector packs (regex + entropy + context heuristics).
- custom — user-supplied regex for org-specific secrets/hostnames.
- onUncertain — what to do with medium-confidence hits; strict redacts.
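As a rough illustration of the custom and replacement fields, the sketch below applies user-supplied regex detectors to a piece of text and fills {type} into the replacement template. It covers only the plain-regex part; the entropy and context heuristics of the built-in packs are not modelled, and the names CustomDetector and redactText are hypothetical.

```typescript
// Hypothetical detector shape mirroring the "custom" entries in the config above.
interface CustomDetector {
  id: string;
  pattern: string; // regex source, as it appears in the JSON config
  type: string;    // e.g. "pii"
}

// Replace every match of every detector with the template, filling in {type}.
function redactText(
  text: string,
  detectors: CustomDetector[],
  replacement = "[REDACTED:{type}]",
): string {
  let out = text;
  for (const d of detectors) {
    const re = new RegExp(d.pattern, "g");
    // Replacer functions keep "$" sequences in the template or type literal.
    const marker = replacement.replace("{type}", () => d.type);
    out = out.replace(re, () => marker);
  }
  return out;
}

const detectors: CustomDetector[] = [
  { id: "internal-host", pattern: "\\bcorp\\.example\\.com\\b", type: "pii" },
];
const redacted = redactText("ssh build@corp.example.com failed", detectors);
// redacted === "ssh build@[REDACTED:pii] failed"
```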
2. anonymization — de-identify, don't delete
Replaces stable identifiers with deterministic pseudonyms so trajectory structure survives while identity is protected.
{
"enabled": true,
"fields": ["username","hostname","abs-paths","repo-name","git-remote"],
"strategy": "hash", // "hash" → stable pseudonym | "strip" → remove
"salt": "per-submitter" // "per-submitter" | "fixed" | "random-per-run"
}
- salt: per-submitter lets a consumer group one person's submissions without ever learning who they are.
- The hiring-signal profile sets enabled: false so work is attributable.
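One way to get deterministic pseudonyms is a keyed hash of the identifier, with the per-submitter salt as the key. The sketch below uses HMAC-SHA256 from Node's crypto module; the source only says "hash", so the algorithm, the 12-character truncation and the function name pseudonymize are assumptions made for illustration.

```typescript
import { createHmac } from "node:crypto";

// Deterministic pseudonym: the same (salt, field, value) always yields the same token.
function pseudonymize(field: string, value: string, salt: string): string {
  const digest = createHmac("sha256", salt)
    .update(`${field}\u0000${value}`) // separator avoids ambiguity between field and value
    .digest("hex");
  return `${field}-${digest.slice(0, 12)}`;
}

// Hypothetical salts; a per-submitter salt would be generated and kept locally.
const saltA = "secret-salt-of-submitter-A";
const saltB = "secret-salt-of-submitter-B";

const a1 = pseudonymize("username", "alice", saltA);
const a2 = pseudonymize("username", "alice", saltA);
const b1 = pseudonymize("username", "alice", saltB);
// a1 === a2: one submitter's events stay linkable to each other.
// a1 !== b1: the same name under a different salt maps to a different pseudonym.
```

Keying the hash with a secret salt matters because usernames and hostnames are low-entropy; an unsalted hash of such values could be reversed by guessing.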
3. content — how much payload to include
The privacy/utility dial. Strip expensive or sensitive material entirely.
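A sketch of how a client might apply the tool-output setting to a single tool call, assuming truncation is measured in UTF-8 bytes and that metadata-only keeps just the original size. The result shape ToolOutputField and the function applyToolOutputPolicy are hypothetical.

```typescript
type ToolOutputMode = "full" | "truncated" | "metadata-only" | "none";

interface ContentPolicy {
  toolOutput: ToolOutputMode;
  maxToolOutputBytes: number;
}

// Hypothetical result shape: what ends up in the payload for one tool call.
interface ToolOutputField {
  text?: string;
  bytes: number; // original size in UTF-8 bytes, kept as metadata
  truncated?: boolean;
}

function applyToolOutputPolicy(
  output: string,
  policy: ContentPolicy,
): ToolOutputField | undefined {
  const encoded = new TextEncoder().encode(output);
  switch (policy.toolOutput) {
    case "none":
      return undefined;
    case "metadata-only":
      return { bytes: encoded.length };
    case "full":
      return { text: output, bytes: encoded.length, truncated: false };
    case "truncated": {
      if (encoded.length <= policy.maxToolOutputBytes) {
        return { text: output, bytes: encoded.length, truncated: false };
      }
      // Cut on the byte budget; a multi-byte character split at the cut
      // decodes to the replacement character U+FFFD.
      const head = encoded.slice(0, policy.maxToolOutputBytes);
      return { text: new TextDecoder().decode(head), bytes: encoded.length, truncated: true };
    }
  }
}

const field = applyToolOutputPolicy("x".repeat(20000), {
  toolOutput: "truncated",
  maxToolOutputBytes: 16384,
});
// field.text has 16384 characters, field.bytes is 20000, field.truncated is true
```

The sketch reads the same toolOutput and maxToolOutputBytes fields that appear in the profile fragment that follows.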
{
"includeFileContents": false,
"toolOutput": "truncated", // "full" | "truncated" | "metadata-only" | "none"
"includeReasoning": true,
"maxToolOutputBytes": 16384
}
4. scope — what's even eligible to submit
Path/repo gating so whole categories of work never get considered.
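A simplified sketch of the gating decision for one session, assuming includeRepos entries use "*" as a plain wildcard and denyIfMatch entries are case-sensitive substrings. Full glob semantics for excludePaths (such as "**") are deliberately left out, and the names ScopePolicy, wildcardToRegExp and isSessionEligible are hypothetical.

```typescript
interface ScopePolicy {
  includeRepos: string[];
  denyIfMatch: string[];
}

// Simplified wildcard: "*" matches any run of characters; everything else is literal.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// A session is eligible only if its repo is allowed and no deny marker appears in it.
function isSessionEligible(repo: string, transcript: string, scope: ScopePolicy): boolean {
  const repoAllowed = scope.includeRepos.some((p) => wildcardToRegExp(p).test(repo));
  const denied = scope.denyIfMatch.some((marker) => transcript.includes(marker));
  return repoAllowed && !denied;
}

const scope: ScopePolicy = {
  includeRepos: ["github.com/me/*"],
  denyIfMatch: ["NDA", "CONFIDENTIAL"],
};
const ok = isSessionEligible("github.com/me/tool", "fix flaky tests", scope);
const wrongRepo = isSessionEligible("github.com/other/tool", "fix flaky tests", scope);
const flagged = isSessionEligible("github.com/me/tool", "CONFIDENTIAL roadmap", scope);
// ok is true; wrongRepo and flagged are false
```

The fields used here correspond to the scope fragment that follows.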
{
"includeRepos": ["github.com/me/*"],
"excludePaths": ["**/secrets/**","**/.env*","/work/**"],
"denyIfMatch": ["NDA","CONFIDENTIAL"] // skip a session if these appear
}
The preview gate
Even with redaction on, the default flow shows a preview of the exact payload and requires confirmation:
agentsync submit # interactive preview + confirm
agentsync submit --yes # CI / non-interactive (still redacts)
agentsync submit --dry-run # write redacted payload locally, upload nothing
--dry-run is the recommended way to inspect what a profile produces before trusting it.
Server-side defense-in-depth
The client is the privacy authority, but the ingestion API independently re-scans for high-confidence secrets and rejects or quarantines anything that still trips them. A leaked secret should fail closed at two layers, not one. The server never sees pre-redaction data.
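A minimal sketch of what such a server-side re-scan could look like. The two patterns (AWS access key IDs and PEM private-key headers) are well-known formats chosen for illustration; the actual detector set, the Verdict shape and the function rescan are assumptions, not the ingestion API's real interface.

```typescript
type Verdict =
  | { action: "accept" }
  | { action: "reject" | "quarantine"; detector: string };

// Two high-confidence patterns, chosen for illustration only.
const HIGH_CONFIDENCE: { id: string; re: RegExp }[] = [
  { id: "aws-access-key-id", re: /\bAKIA[0-9A-Z]{16}\b/ },
  { id: "pem-private-key", re: /-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----/ },
];

// Fail closed: the first surviving secret decides the outcome for the whole payload.
function rescan(payloadText: string, onHit: "reject" | "quarantine" = "reject"): Verdict {
  for (const { id, re } of HIGH_CONFIDENCE) {
    if (re.test(payloadText)) return { action: onHit, detector: id };
  }
  return { action: "accept" };
}

const clean = rescan("tool output: [REDACTED:secret]");
// AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
const leaked = rescan("export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE", "quarantine");
// clean.action is "accept"; leaked.action is "quarantine"
```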
Roadmap (not in v0, but the model leaves room for them)
The four levers above cover the current use cases. Future, heavier options the schema is designed to accommodate:
- End-to-end encryption — encrypt the payload to consumer-held keys so the store holds only ciphertext.
- Aggregate-only / differential privacy — submit derived statistics instead of raw trajectories for sensitive orgs.
- Self-hosted sink — point the CLI at your own ingestion URL; data never touches shared infrastructure.