Normalized transcript schema

Every agent has its own native log format. An adapter converts that native format into one canonical Transcript. This is the wire format the CLI uploads and the API stores.

Machine-readable: schema/transcript.schema.json (JSON Schema, draft 2020-12)
TypeScript: packages/core/src/transcript.ts
Example: examples/transcript.example.json

Top-level shape

jsonc

{
  "schemaVersion": "0.1.0",
  "transcriptId": "9f1c…",          // uuid v4, generated client-side
  "source": { … },                   // which agent + adapter produced this
  "session": { … },                  // timing + environment (redacted)
  "submitter": { … },                // pseudonymous, key-derived
  "privacy": { … },                  // what redaction/anonymization ran
  "metrics": { … },                  // cheap aggregates
  "events": [ … ]                    // the trajectory itself
}
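As a rough illustration of how a consumer might sanity-check the envelope before deeper validation, the sketch below lists which of the eight top-level keys are absent from a parsed JSON value. It assumes all eight keys are required, which the JSON Schema file should confirm. The names `TOP_LEVEL_KEYS` and `missingTopLevelKeys` are hypothetical and are not taken from packages/core/src/transcript.ts.

```typescript
// Hypothetical helper names; the authoritative definitions are the JSON Schema
// and the TypeScript types referenced at the top of this page.
const TOP_LEVEL_KEYS = [
  "schemaVersion",
  "transcriptId",
  "source",
  "session",
  "submitter",
  "privacy",
  "metrics",
  "events",
] as const;

type TopLevelKey = (typeof TOP_LEVEL_KEYS)[number];

// Returns the top-level keys that are absent from a parsed JSON value.
// Assumption: every key shown in the top-level shape is required.
function missingTopLevelKeys(value: unknown): TopLevelKey[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return TOP_LEVEL_KEYS.slice();
  }
  const record = value as Record<string, unknown>;
  return TOP_LEVEL_KEYS.filter((key) => !(key in record));
}

const partial = { schemaVersion: "0.1.0", events: [] };
const missing = missingTopLevelKeys(partial);
// missing: transcriptId, source, session, submitter, privacy, metrics
```

A check like this only catches a malformed envelope early; it is not a substitute for validating against the machine-readable schema.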

`source`

Identifies the origin so consumers can normalize across agents.

jsonc

{
  "agent": "claude-code",            // "claude-code" | "codex" | string
  "agentVersion": "1.2.3",           // best-effort, may be null
  "adapter": "claude-code",          // adapter id that produced this
  "adapterVersion": "0.1.0"
}

`session`

Timing and (already-redacted) environment context.

jsonc

{
  "startedAt": "2026-06-22T10:00:00Z",
  "endedAt":   "2026-06-22T10:42:00Z",
  "capturedAt":"2026-06-22T10:43:11Z",
  "environment": {
    "os": "darwin",
    "shell": "zsh",
    "cwd": "<anonymized>",           // hashed/stripped per profile
    "repo": {                        // optional; present only if profile allows
      "name": "<anonymized>",
      "remote": "<anonymized>",
      "commit": "a1b2c3d"
    }
  }
}

`submitter`

Never contains raw PII. `keyId` is the public identifier of the submission key, never the secret. `pseudonym` is a stable hash, so multiple submissions from the same person can be grouped without revealing identity, unless the profile opts into identity linkage (e.g. the hiring-signal use case).

jsonc

{
  "keyId": "key_2a9f…",
  "pseudonym": "anon_7c41…",
  "identityLinked": false
}

`privacy`

A receipt of what the client did, so the server and consumers can audit it.

jsonc

{
  "profile": "research",
  "redactionApplied": true,
  "anonymizationApplied": true,
  "rulesApplied": ["api-keys", "aws", "jwt", "emails", "abs-paths"],
  "redactionCount": 14,
  "contentPolicy": { "includeFileContents": false, "toolOutput": "truncated" }
}
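One way a server or consumer could audit this receipt is to cross-check it against the per-event `redactions` entries described in the `redactions` section. The sketch below is illustrative only and rests on two assumptions that the source does not state: that `redactionCount` equals the total number of per-event redaction entries, and that every `ruleId` used in an event is listed in `rulesApplied`. The function and interface names are hypothetical.

```typescript
// Hypothetical shapes, reduced to the fields this check needs.
interface RedactionEntry {
  field: string;
  ruleId: string;
  type: string;
  placeholder: string;
}

interface AuditedEvent {
  id: string;
  redactions?: RedactionEntry[];
}

interface PrivacyReceipt {
  profile: string;
  redactionApplied: boolean;
  anonymizationApplied: boolean;
  rulesApplied: string[];
  redactionCount: number;
}

// Returns human-readable inconsistencies between the receipt and the events.
// Assumptions (not confirmed by the schema): redactionCount is the sum of all
// per-event entries, and rulesApplied is a superset of the ruleIds in use.
function auditPrivacyReceipt(privacy: PrivacyReceipt, events: AuditedEvent[]): string[] {
  const problems: string[] = [];
  const declared = new Set(privacy.rulesApplied);
  let total = 0;
  for (const event of events) {
    for (const entry of event.redactions ?? []) {
      total += 1;
      if (!declared.has(entry.ruleId)) {
        problems.push(`${event.id}: rule "${entry.ruleId}" is not listed in rulesApplied`);
      }
    }
  }
  if (total !== privacy.redactionCount) {
    problems.push(`redactionCount is ${privacy.redactionCount} but events record ${total}`);
  }
  if (total > 0 && !privacy.redactionApplied) {
    problems.push("events carry redactions but redactionApplied is false");
  }
  return problems;
}

const receipt: PrivacyReceipt = {
  profile: "research",
  redactionApplied: true,
  anonymizationApplied: true,
  rulesApplied: ["aws", "emails"],
  redactionCount: 3,
};
const receiptProblems = auditPrivacyReceipt(receipt, [
  { id: "ev_001", redactions: [{ field: "text", ruleId: "emails", type: "pii", placeholder: "[REDACTED:email]" }] },
  { id: "ev_002", redactions: [{ field: "tool.output", ruleId: "jwt", type: "secret", placeholder: "[REDACTED:jwt]" }] },
]);
// receiptProblems has two entries: the undeclared "jwt" rule and the count mismatch (3 declared, 2 recorded).
```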

`metrics`

Cheap, non-sensitive aggregates for indexing/sorting.

jsonc

{
  "eventCount": 87,
  "messageCount": 24,
  "toolCallCount": 41,
  "durationMs": 2520000,
  "tokens": { "input": 38211, "output": 9120 }   // optional
}
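These aggregates can plausibly be derived from the events and the session timestamps. The sketch below shows one way to do that, under assumptions the source does not spell out: `messageCount` counts `user_message` plus `assistant_message` events, `toolCallCount` counts `tool_call` events, and `durationMs` is `endedAt` minus `startedAt`. The name `computeMetrics` is hypothetical, and the optional `tokens` field is left out because it cannot be derived from event types alone.

```typescript
// Hypothetical minimal shapes for illustration.
interface MetricEvent {
  type: string;
}

interface DerivedMetrics {
  eventCount: number;
  messageCount: number;
  toolCallCount: number;
  durationMs: number;
}

// Derives the cheap aggregates from the event list and session timestamps.
// Assumption: which event types count as "messages" is a guess, not a rule
// taken from the schema.
function computeMetrics(events: MetricEvent[], startedAt: string, endedAt: string): DerivedMetrics {
  let messageCount = 0;
  let toolCallCount = 0;
  for (const event of events) {
    if (event.type === "user_message" || event.type === "assistant_message") {
      messageCount += 1;
    }
    if (event.type === "tool_call") {
      toolCallCount += 1;
    }
  }
  return {
    eventCount: events.length,
    messageCount,
    toolCallCount,
    // Date.parse accepts the ISO 8601 UTC timestamps used in `session`.
    durationMs: Date.parse(endedAt) - Date.parse(startedAt),
  };
}

const derived = computeMetrics(
  [
    { type: "user_message" },
    { type: "tool_call" },
    { type: "tool_result" },
    { type: "assistant_message" },
  ],
  "2026-06-22T10:00:00Z",
  "2026-06-22T10:42:00Z",
);
// derived.durationMs is 2520000 (42 minutes), the same value as in the example above.
```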

`events` — the trajectory

`events` is an ordered list. Each event is one normalized unit of the session, and heterogeneous agent logs collapse into the small set of event types listed under Event `type`.

jsonc

{
  "id": "ev_001",
  "seq": 1,                          // monotonic ordering within the transcript
  "timestamp": "2026-06-22T10:00:03Z",
  "type": "tool_call",              // see below
  "role": "assistant",             // "user" | "assistant" | "system" | "tool"
  "text": "Running the test suite", // optional human-readable content
  "tool": { … },                    // present when type is tool_call/tool_result
  "redactions": [ … ]               // what was scrubbed from this event
}
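Because `seq` carries the ordering, a consumer may want to verify it before trusting the list order. The following sketch assumes "monotonic" means strictly increasing (gaps allowed), which is an interpretation rather than something the schema text states. `findSeqViolations` and `sortBySeq` are hypothetical helper names.

```typescript
// Hypothetical minimal event shape for the ordering check.
interface SeqEvent {
  id: string;
  seq: number;
  timestamp: string;
}

// Reports every event whose seq is not greater than the seq before it.
// Assumption: seq must be strictly increasing; gaps are tolerated.
function findSeqViolations(events: SeqEvent[]): string[] {
  const violations: string[] = [];
  let previous: SeqEvent | undefined;
  for (const event of events) {
    if (previous !== undefined && event.seq <= previous.seq) {
      violations.push(`${event.id}: seq ${event.seq} does not follow seq ${previous.seq}`);
    }
    previous = event;
  }
  return violations;
}

// Returns a copy ordered by seq, leaving the input array untouched.
function sortBySeq(events: SeqEvent[]): SeqEvent[] {
  return events.slice().sort((a, b) => a.seq - b.seq);
}

// Two events share a timestamp; only seq tells them apart.
const first: SeqEvent = { id: "ev_001", seq: 1, timestamp: "2026-06-22T10:00:03Z" };
const second: SeqEvent = { id: "ev_002", seq: 2, timestamp: "2026-06-22T10:00:03Z" };

const inOrder = findSeqViolations([first, second]);    // no violations
const outOfOrder = findSeqViolations([second, first]); // one violation, reported on ev_001
const repaired = sortBySeq([second, first]);           // ev_001, then ev_002
```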

Event `type`

type	meaning
`user_message`	a turn authored by the human
`assistant_message`	natural-language output from the agent
`reasoning`	model thinking/scratchpad (included only if profile allows)
`tool_call`	the agent invoked a tool (shell, edit, search, MCP, …)
`tool_result`	the result returned to the agent
`system`	system/instruction content
`meta`	adapter-injected annotations (e.g. truncation notice)

`tool`

jsonc

{
  "name": "Bash",
  "callId": "call_88",              // links tool_call ↔ tool_result
  "input": { "command": "npm test" },
  "output": "…",                    // shaped by content policy (full/truncated/metadata-only/none)
  "status": "ok",                   // "ok" | "error" | "denied" | "timeout"
  "durationMs": 4200
}
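Since `callId` links a `tool_call` to its `tool_result`, a consumer can join the two even when agents interleave them. The sketch below is one possible approach, assuming each call appears before its result in the `events` list and that `callId` is unique within a transcript. The name `pairToolEvents` and the reduced interfaces are hypothetical.

```typescript
// Hypothetical minimal shapes; only the fields needed for pairing are kept.
interface ToolInfo {
  name: string;
  callId: string;
  status?: string;
}

interface ToolEvent {
  id: string;
  type: string;
  tool?: ToolInfo;
}

interface ToolPair {
  callId: string;
  call: ToolEvent;
  result?: ToolEvent; // undefined when no result has been seen
}

// Joins tool_call and tool_result events on tool.callId.
// Assumptions: a call precedes its result, and callId is unique per transcript.
function pairToolEvents(events: ToolEvent[]): ToolPair[] {
  const pairs = new Map<string, ToolPair>();
  for (const event of events) {
    if (event.tool === undefined) continue;
    const callId = event.tool.callId;
    if (event.type === "tool_call") {
      pairs.set(callId, { callId, call: event });
    } else if (event.type === "tool_result") {
      const pair = pairs.get(callId);
      // A result whose call was never seen is skipped in this sketch.
      if (pair !== undefined) pair.result = event;
    }
  }
  // Map preserves insertion order, so pairs come back in call order.
  return Array.from(pairs.values());
}

const toolPairs = pairToolEvents([
  { id: "ev_001", type: "tool_call", tool: { name: "Bash", callId: "call_1" } },
  { id: "ev_002", type: "tool_call", tool: { name: "Bash", callId: "call_2" } },
  { id: "ev_003", type: "tool_result", tool: { name: "Bash", callId: "call_2", status: "ok" } },
  { id: "ev_004", type: "tool_result", tool: { name: "Bash", callId: "call_1", status: "error" } },
  { id: "ev_005", type: "tool_call", tool: { name: "Bash", callId: "call_3" } },
]);
// call_1 pairs ev_001 with ev_004, call_2 pairs ev_002 with ev_003,
// and call_3 has no result yet.
```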

`redactions`

Each event records what was removed and which rule removed it, so the trajectory stays auditable without exposing the secret. Where an entry carries offsets, they index into the post-redaction text; the example below omits them.

jsonc

[
  { "field": "tool.output", "ruleId": "aws", "type": "secret", "placeholder": "[REDACTED:aws]" },
  { "field": "text",        "ruleId": "emails", "type": "pii",  "placeholder": "[REDACTED:email]" }
]
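To illustrate how these entries support auditing, the sketch below checks that each redaction's placeholder actually appears in the field it names. It assumes `field` is a dot-separated path relative to the event and that the placeholder is inserted verbatim into that field; neither is guaranteed by the text above. If the content policy drops a field entirely, this check would flag the entry, so treat the result as a hint rather than a verdict. `readPath` and `unmatchedRedactions` are hypothetical names.

```typescript
// Hypothetical shape mirroring the example entries above.
interface Redaction {
  field: string;
  ruleId: string;
  type: string;
  placeholder: string;
}

// Follows a dot-separated path ("tool.output") through nested objects.
function readPath(root: unknown, path: string): unknown {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Returns the entries whose placeholder is not found in the named field.
// Assumption: placeholders appear verbatim in the post-redaction text.
function unmatchedRedactions(event: { redactions?: Redaction[] }): Redaction[] {
  return (event.redactions ?? []).filter((entry) => {
    const value = readPath(event, entry.field);
    return typeof value !== "string" || value.indexOf(entry.placeholder) === -1;
  });
}

const redactedEvent = {
  text: "Contact [REDACTED:email] for access",
  tool: { input: { command: "npm test" }, output: "AWS_SECRET=[REDACTED:aws]" },
  redactions: [
    { field: "tool.output", ruleId: "aws", type: "secret", placeholder: "[REDACTED:aws]" },
    { field: "text", ruleId: "emails", type: "pii", placeholder: "[REDACTED:email]" },
    // Deliberately inconsistent entry: the command contains no placeholder.
    { field: "tool.input.command", ruleId: "jwt", type: "secret", placeholder: "[REDACTED:jwt]" },
  ],
};
const unmatched = unmatchedRedactions(redactedEvent);
// unmatched contains only the "tool.input.command" entry.
```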

Design notes

Adapters own normalization, not policy. An adapter produces a faithful Transcript; the redaction layer then mutates it according to the active profile. This keeps "what the agent did" separate from "what we're willing to share."
Stable ids. `callId` links calls to results across agents that interleave them differently. `seq` guarantees ordering even when timestamps collide.
Additive evolution. Consumers must ignore unknown fields; new event types may be added without a major version bump as long as existing types keep their meaning.
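The additive-evolution note implies that consumers should tolerate event types they do not recognize. A minimal sketch of such a consumer follows, assuming that skipping unknown types (rather than throwing) is the desired behavior; the event type `"checkpoint"` in the usage example is invented purely to stand in for a future addition, and `tallyKnownEvents` is a hypothetical name.

```typescript
// The seven event types documented in the table above.
const KNOWN_EVENT_TYPES = [
  "user_message",
  "assistant_message",
  "reasoning",
  "tool_call",
  "tool_result",
  "system",
  "meta",
] as const;

type KnownEventType = (typeof KNOWN_EVENT_TYPES)[number];

// Type guard: narrows an arbitrary string to a documented event type.
function isKnownEventType(type: string): type is KnownEventType {
  return (KNOWN_EVENT_TYPES as readonly string[]).indexOf(type) !== -1;
}

// Tallies known event types and counts, rather than rejects, unknown ones.
// Extra fields on each event are simply never read, so they are ignored.
function tallyKnownEvents(events: { type: string }[]): {
  counts: Record<KnownEventType, number>;
  skipped: number;
} {
  const counts: Record<KnownEventType, number> = {
    user_message: 0,
    assistant_message: 0,
    reasoning: 0,
    tool_call: 0,
    tool_result: 0,
    system: 0,
    meta: 0,
  };
  let skipped = 0;
  for (const event of events) {
    if (isKnownEventType(event.type)) {
      counts[event.type] += 1;
    } else {
      skipped += 1; // a type added by a newer schema version
    }
  }
  return { counts, skipped };
}

const tally = tallyKnownEvents([
  { type: "user_message" },
  { type: "tool_call" },
  { type: "tool_call" },
  { type: "checkpoint" }, // invented type, stands in for a future addition
]);
// tally.counts.tool_call is 2, tally.counts.user_message is 1, tally.skipped is 1.
```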
