Skip to content

Normalized transcript schema

Every agent has its own native log format. An adapter converts that native format into one canonical Transcript. This is the wire format the CLI uploads and the API stores.

Top-level shape

jsonc
{
  "schemaVersion": "0.1.0",
  "transcriptId": "9f1c…",          // uuid v4, generated client-side
  "source": {  },                   // which agent + adapter produced this
  "session": {  },                  // timing + environment (redacted)
  "submitter": {  },                // pseudonymous, key-derived
  "privacy": {  },                  // what redaction/anonymization ran
  "metrics": {  },                  // cheap aggregates
  "events": [  ]                    // the trajectory itself
}

source

Identifies the origin so consumers can normalize across agents.

jsonc
{
  "agent": "claude-code",            // "claude-code" | "codex" | string
  "agentVersion": "1.2.3",           // best-effort, may be null
  "adapter": "claude-code",          // adapter id that produced this
  "adapterVersion": "0.1.0"
}

session

Timing and (already-redacted) environment context.

jsonc
{
  "startedAt": "2026-06-22T10:00:00Z",
  "endedAt":   "2026-06-22T10:42:00Z",
  "capturedAt":"2026-06-22T10:43:11Z",
  "environment": {
    "os": "darwin",
    "shell": "zsh",
    "cwd": "<anonymized>",           // hashed/stripped per profile
    "repo": {                        // optional; present only if profile allows
      "name": "<anonymized>",
      "remote": "<anonymized>",
      "commit": "a1b2c3d"
    }
  }
}

submitter

Never contains raw PII. The keyId is the public id of the submission key (not the secret). pseudonym is a stable hash so multiple submissions from the same person can be grouped without revealing identity — unless the profile opts into identity linkage (e.g. the hiring-signal use case).

jsonc
{
  "keyId": "key_2a9f…",
  "pseudonym": "anon_7c41…",
  "identityLinked": false
}

privacy

A receipt of what the client did, so the server and consumers can audit it.

jsonc
{
  "profile": "research",
  "redactionApplied": true,
  "anonymizationApplied": true,
  "rulesApplied": ["api-keys", "aws", "jwt", "emails", "abs-paths"],
  "redactionCount": 14,
  "contentPolicy": { "includeFileContents": false, "toolOutput": "truncated" }
}

metrics

Cheap, non-sensitive aggregates for indexing/sorting.

jsonc
{
  "eventCount": 87,
  "messageCount": 24,
  "toolCallCount": 41,
  "durationMs": 2520000,
  "tokens": { "input": 38211, "output": 9120 }   // optional
}

events — the trajectory

events is an ordered list. Each event is a normalized unit of the session. Heterogeneous agent logs collapse into this small set of types.

jsonc
{
  "id": "ev_001",
  "seq": 1,                          // monotonic ordering within the transcript
  "timestamp": "2026-06-22T10:00:03Z",
  "type": "tool_call",              // see below
  "role": "assistant",             // "user" | "assistant" | "system" | "tool"
  "text": "Running the test suite", // optional human-readable content
  "tool": {  },                    // present when type is tool_call/tool_result
  "redactions": [  ]               // what was scrubbed from this event
}

Event type

typemeaning
user_messagea turn authored by the human
assistant_messagenatural-language output from the agent
reasoningmodel thinking/scratchpad (included only if profile allows)
tool_callthe agent invoked a tool (shell, edit, search, MCP, …)
tool_resultthe result returned to the agent
systemsystem/instruction content
metaadapter-injected annotations (e.g. truncation notice)

tool

jsonc
{
  "name": "Bash",
  "callId": "call_88",              // links tool_call ↔ tool_result
  "input": { "command": "npm test" },
  "output": "…",                    // shaped by content policy (full/truncated/metadata-only/none)
  "status": "ok",                   // "ok" | "error" | "denied" | "timeout"
  "durationMs": 4200
}

redactions

Each event records what was removed, so the trajectory stays auditable without exposing the secret. Offsets are into the post-redaction text.

jsonc
[
  { "field": "tool.output", "ruleId": "aws", "type": "secret", "placeholder": "[REDACTED:aws]" },
  { "field": "text",        "ruleId": "emails", "type": "pii",  "placeholder": "[REDACTED:email]" }
]

Design notes

  • Adapters own normalization, not policy. An adapter produces a faithful Transcript; the redaction layer then mutates it according to the active profile. This keeps "what the agent did" separate from "what we're willing to share."
  • Stable ids. callId links calls to results across agents that interleave them differently. seq guarantees ordering even when timestamps collide.
  • Additive evolution. Unknown fields must be ignored by consumers; new event types are added without a major bump as long as old types keep their meaning.

Released under the MIT License.