Normalized transcript schema
Every agent has its own native log format. An adapter converts that native format into one canonical Transcript. This is the wire format the CLI uploads and the API stores.
- Machine-readable: schema/transcript.schema.json (JSON Schema, draft 2020-12)
- TypeScript: packages/core/src/transcript.ts
- Example: examples/transcript.example.json
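
To make the adapter's role concrete, here is a minimal TypeScript sketch of an adapter turning native log lines into a partial Transcript. Everything named in it is an assumption for illustration: the NativeLine format, the Adapter interface, exampleAdapter, and the "example-agent" id are hypothetical and are not taken from packages/core/src/transcript.ts. Only a subset of the canonical fields is populated.

```typescript
// Hypothetical native log line; real agent formats differ and are not specified here.
interface NativeLine {
  at: string; // ISO-8601 timestamp
  who: "human" | "agent";
  body: string;
}

// A subset of the canonical event fields, enough to show the mapping.
interface SketchEvent {
  id: string;
  seq: number;
  timestamp: string;
  type: "user_message" | "assistant_message";
  role: "user" | "assistant";
  text: string;
}

interface PartialTranscript {
  schemaVersion: string;
  source: { agent: string; agentVersion: string | null; adapter: string; adapterVersion: string };
  events: SketchEvent[];
}

// Assumed adapter contract: native records in, canonical shape out.
interface Adapter<Native> {
  id: string;
  version: string;
  toTranscript(native: Native[], agentVersion: string | null): PartialTranscript;
}

// Zero-pad to three digits ("ev_001"); longer numbers are left as they are.
function pad3(n: number): string {
  const s = String(n);
  return s.length >= 3 ? s : ("00" + s).slice(-3);
}

const ADAPTER_ID = "example-agent"; // hypothetical adapter id
const ADAPTER_VERSION = "0.1.0";

const exampleAdapter: Adapter<NativeLine> = {
  id: ADAPTER_ID,
  version: ADAPTER_VERSION,
  toTranscript(native, agentVersion) {
    const events = native.map((line, i): SketchEvent => ({
      id: "ev_" + pad3(i + 1),
      seq: i + 1, // assigned by the adapter, so ordering never depends on timestamps
      timestamp: line.at,
      type: line.who === "human" ? "user_message" : "assistant_message",
      role: line.who === "human" ? "user" : "assistant",
      text: line.body,
    }));
    return {
      schemaVersion: "0.1.0",
      source: {
        agent: ADAPTER_ID,
        agentVersion, // best-effort, may be null
        adapter: ADAPTER_ID,
        adapterVersion: ADAPTER_VERSION,
      },
      events,
    };
  },
};
```

The sketch does no redaction. It only normalizes, and policy is left to a later layer.
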
Top-level shape
{
"schemaVersion": "0.1.0",
"transcriptId": "9f1c…", // uuid v4, generated client-side
"source": { … }, // which agent + adapter produced this
"session": { … }, // timing + environment (redacted)
"submitter": { … }, // pseudonymous, key-derived
"privacy": { … }, // what redaction/anonymization ran
"metrics": { … }, // cheap aggregates
"events": [ … ] // the trajectory itself
}

source
Identifies the origin so consumers can normalize across agents.
{
"agent": "claude-code", // "claude-code" | "codex" | string
"agentVersion": "1.2.3", // best-effort, may be null
"adapter": "claude-code", // adapter id that produced this
"adapterVersion": "0.1.0"
}

session
Timing and (already-redacted) environment context.
{
"startedAt": "2026-06-22T10:00:00Z",
"endedAt": "2026-06-22T10:42:00Z",
"capturedAt":"2026-06-22T10:43:11Z",
"environment": {
"os": "darwin",
"shell": "zsh",
"cwd": "<anonymized>", // hashed/stripped per profile
"repo": { // optional; present only if profile allows
"name": "<anonymized>",
"remote": "<anonymized>",
"commit": "a1b2c3d"
}
}
}

submitter
The submitter block never contains raw PII. keyId is the public id of the submission key, not the secret. pseudonym is a stable hash, so multiple submissions from the same person can be grouped without revealing who that person is. The exception is a profile that opts into identity linkage (e.g. the hiring-signal use case).
{
"keyId": "key_2a9f…",
"pseudonym": "anon_7c41…",
"identityLinked": false
}

privacy
A receipt of what the client did, so the server and consumers can audit it.
{
"profile": "research",
"redactionApplied": true,
"anonymizationApplied": true,
"rulesApplied": ["api-keys", "aws", "jwt", "emails", "abs-paths"],
"redactionCount": 14,
"contentPolicy": { "includeFileContents": false, "toolOutput": "truncated" }
}

metrics
Cheap, non-sensitive aggregates for indexing/sorting.
{
"eventCount": 87,
"messageCount": 24,
"toolCallCount": 41,
"durationMs": 2520000,
"tokens": { "input": 38211, "output": 9120 } // optional
}

events: the trajectory
events is an ordered list. Each event is a normalized unit of the session. Heterogeneous agent logs collapse into this small set of types.
{
"id": "ev_001",
"seq": 1, // monotonic ordering within the transcript
"timestamp": "2026-06-22T10:00:03Z",
"type": "tool_call", // see below
"role": "assistant", // "user" | "assistant" | "system" | "tool"
"text": "Running the test suite", // optional human-readable content
"tool": { … }, // present when type is tool_call/tool_result
"redactions": [ … ] // what was scrubbed from this event
}

Event type
| type | meaning |
|---|---|
| user_message | a turn authored by the human |
| assistant_message | natural-language output from the agent |
| reasoning | model thinking/scratchpad (included only if profile allows) |
| tool_call | the agent invoked a tool (shell, edit, search, MCP, …) |
| tool_result | the result returned to the agent |
| system | system/instruction content |
| meta | adapter-injected annotations (e.g. truncation notice) |
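
With the event types fixed, a consumer could narrow raw strings to known types and derive the metrics aggregates in a single pass. The sketch below is one way to do that, not the project's implementation. It assumes that messageCount means user_message plus assistant_message, and that durationMs is endedAt minus startedAt. EVENT_TYPES, isEventType, and computeMetrics are hypothetical names.

```typescript
// The event types from the table above, as a closed list for this sketch.
const EVENT_TYPES = [
  "user_message",
  "assistant_message",
  "reasoning",
  "tool_call",
  "tool_result",
  "system",
  "meta",
] as const;
type EventType = (typeof EVENT_TYPES)[number];

// Type guard: narrows an arbitrary string to a known event type.
// Unknown types return false instead of throwing, so newer transcripts still load.
function isEventType(value: string): value is EventType {
  return (EVENT_TYPES as readonly string[]).indexOf(value) !== -1;
}

interface MinimalEvent {
  seq: number;
  type: string;
}

interface Metrics {
  eventCount: number;
  messageCount: number;
  toolCallCount: number;
  durationMs: number;
}

// Assumptions: "messages" are user_message + assistant_message,
// and durationMs is the wall-clock span between the two session timestamps.
function computeMetrics(events: MinimalEvent[], startedAt: string, endedAt: string): Metrics {
  let messageCount = 0;
  let toolCallCount = 0;
  for (const ev of events) {
    if (ev.type === "user_message" || ev.type === "assistant_message") {
      messageCount++;
    } else if (ev.type === "tool_call") {
      toolCallCount++;
    }
  }
  return {
    eventCount: events.length, // every event counts, including unknown types
    messageCount,
    toolCallCount,
    durationMs: Date.parse(endedAt) - Date.parse(startedAt),
  };
}
```

With the session timestamps from the session example (10:00:00Z to 10:42:00Z), durationMs comes out as 2520000, which matches the metrics example.
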
tool
{
"name": "Bash",
"callId": "call_88", // links tool_call ↔ tool_result
"input": { "command": "npm test" },
"output": "…", // shaped by content policy (full/truncated/metadata-only/none)
"status": "ok", // "ok" | "error" | "denied" | "timeout"
"durationMs": 4200
}

redactions
Each event records what was removed, so the trajectory stays auditable without exposing the secret. Offsets are into the post-redaction text.
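
As a rough illustration of how such a record might be built, the sketch below applies one rule to one field and logs an entry per match. The Rule shape, the redactField helper, and the simplified email pattern are assumptions made for this example, not the project's actual rule set. Offsets are left out.

```typescript
interface Redaction {
  field: string;
  ruleId: string;
  type: string; // e.g. "secret" or "pii"
  placeholder: string;
}

// Hypothetical rule shape; real rule definitions are not part of this document.
interface Rule {
  ruleId: string;
  type: string;
  pattern: RegExp; // must carry the global flag so every match is replaced
  placeholder: string;
}

// Deliberately simple email pattern, for illustration only.
const emailRule: Rule = {
  ruleId: "emails",
  type: "pii",
  pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  placeholder: "[REDACTED:email]",
};

// Replaces every match with the rule's placeholder and records one entry per match.
// The matched text itself is never stored in the record.
function redactField(
  field: string,
  value: string,
  rules: Rule[]
): { value: string; redactions: Redaction[] } {
  const redactions: Redaction[] = [];
  let out = value;
  for (const rule of rules) {
    out = out.replace(rule.pattern, () => {
      redactions.push({
        field,
        ruleId: rule.ruleId,
        type: rule.type,
        placeholder: rule.placeholder,
      });
      return rule.placeholder;
    });
  }
  return { value: out, redactions };
}
```

Calling redactField("text", "Contact bob@example.com", [emailRule]) returns the scrubbed string together with one record. Entries recorded this way take the following form:
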
[
{ "field": "tool.output", "ruleId": "aws", "type": "secret", "placeholder": "[REDACTED:aws]" },
{ "field": "text", "ruleId": "emails", "type": "pii", "placeholder": "[REDACTED:email]" }
]

Design notes
- Adapters own normalization, not policy. An adapter produces a faithful Transcript; the redaction layer then mutates it according to the active profile. This keeps "what the agent did" separate from "what we're willing to share."
- Stable ids. callId links calls to results across agents that interleave them differently. seq guarantees ordering even when timestamps collide.
- Additive evolution. Unknown fields must be ignored by consumers; new event types are added without a major bump as long as old types keep their meaning.
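
The stable-id and additive-evolution notes can be illustrated from the consumer side. The sketch below orders events by seq rather than timestamp, pairs each tool_call with its tool_result through callId, and skips event types and fields it does not recognize. ToolEvent, CallPair, and pairToolCalls are hypothetical names, and leaving unmatched results out is a simplification of this example.

```typescript
interface ToolEvent {
  seq: number;
  timestamp: string;
  type: string; // unknown types are tolerated
  tool?: { callId: string; name?: string; status?: string };
  [extra: string]: unknown; // unknown fields are accepted and ignored
}

interface CallPair {
  callId: string;
  call: ToolEvent;
  result: ToolEvent | null; // null while no matching tool_result has been seen
}

function pairToolCalls(events: ToolEvent[]): CallPair[] {
  // Sort a copy by seq; timestamps are not used because they may collide.
  const ordered = events.slice().sort((a, b) => a.seq - b.seq);
  const byCallId: { [callId: string]: CallPair } = {};
  const pairs: CallPair[] = [];
  for (const ev of ordered) {
    if (!ev.tool) continue; // messages, meta, and unknown types carry no tool block
    if (ev.type === "tool_call") {
      const pair: CallPair = { callId: ev.tool.callId, call: ev, result: null };
      byCallId[ev.tool.callId] = pair;
      pairs.push(pair);
    } else if (ev.type === "tool_result") {
      const pair = byCallId[ev.tool.callId];
      if (pair) pair.result = ev; // results without a matching call are skipped here
    }
  }
  return pairs;
}
```

Because pairing goes through callId, an agent that emits two calls before either result yields the same pairs as one that strictly alternates call and result.
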