Dozor

How it works

A short walkthrough of what actually happens when the SDK is wired into your product. Aimed at the engineer evaluating "does this fit my mental model?" before committing to install or self-host.

For an under-the-hood, code-level deep dive, see Resources → Ingest pipeline.

The two pieces

Two sides talk over one HTTP endpoint:

  • Browser side — your product imports @kharko/dozor. It uses rrweb to capture DOM mutations, mouse positions, scrolls, and input events.
  • Server side — this codebase. It exposes POST /api/ingest, validates the batch, stores it in Postgres, and serves the Replay UI.

Between them: gzipped JSON over HTTPS. No broker, no queue, no third-party processor — you own both ends.
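
The shape of one batch, as implied by the rest of this page: the public-key header, a session id, metadata (including the first URL and, after identify, the user), and the raw rrweb events. A sketch, not the wire contract; any field name beyond the ones this page mentions is an assumption.

```ts
// Illustrative shape only; the real contract lives in the SDK and dashboard source.
interface IngestBatch {
  sessionId: string;            // SDK-generated; becomes Session.externalId
  metadata: {
    url: string;                // seeds the initial Marker(kind="url")
    userIdentity?: {            // present once dozor.identify() has been called
      userId: string;
      traits: Record<string, unknown>;
    };
  };
  events: unknown[];            // raw rrweb events, stored verbatim
}

// Transport: POST /api/ingest, gzipped JSON body,
// X-Dozor-Public-Key: <Project.key> header.
```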

What the SDK does

When you call Dozor.init({ apiKey, endpoint }) in your app (a wiring sketch follows the list):

  1. rrweb starts capturing — DOM mutations (every node added, removed, attribute changed), mouse positions, scroll offsets, input events.
  2. Events accumulate in a buffer — in-memory array, capped at 10,000 events.
  3. Inline markers track navigation + identity — every SPA pathname change emits a dozor:url rrweb custom event into the stream (hash-only and query-only changes ignored). dozor.identify() emits a dozor:identity marker. The dashboard slices the stream client-side at read time using these anchors.
  4. Privacy filters apply at source — fields with data-dozor-mask have their text replaced with asterisks before the event is buffered. Masked content never enters the network.
  5. Eager bootstrap flush — right after start(), the SDK ships the initial Meta + FullSnapshot pair so the replayer-seed events reach the server within ~1 s, independent of the periodic timer.
  6. Periodic flush — every 60 s (or sooner: tab background, page unload, buffer fill), the SDK gzips the buffer and POSTs to your ingest endpoint.
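
A minimal wiring sketch, assuming a default export; the option names are the ones shown above, the values are placeholders.

```ts
import Dozor from "@kharko/dozor"; // default-export shape is an assumption

Dozor.init({
  apiKey: "pk_…",                                        // your Project.key
  endpoint: "https://your-dashboard.example/api/ingest", // your self-hosted ingest URL
});

// Masking is declarative and needs no JS:
// <input type="password" data-dozor-mask />
```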

See SDK reference for the per-package detail.

What the dashboard does on receive

A batch arrives at POST /api/ingest (condensed into a handler sketch after the list):

  1. Auth check — withPublicKey HOF validates the X-Dozor-Public-Key header against the Project.key column. Unknown key → 401.
  2. Decompress + validate — gzip is unwrapped if present, JSON is parsed, the payload is validated against a Zod schema. Malformed batch → 400.
  3. Upsert the session row — keyed on (projectId, externalId). The same sessionId from the SDK upserts to the same row, so batches for one session converge. On the call that creates the row, the pipeline also synthesises an initial Marker(kind="url") from metadata.url — every session gets at least one timeline anchor.
  4. Insert the EventBatch — one row per ingest POST. Events are gzipped as a JSON blob and stored verbatim. No row-per-event, no slice aggregates.
  5. Extract dozor:* markers — scan the batch's events for rrweb custom events (type=5) whose data.tag starts with dozor: and write a typed Marker row per match (kind="url" or kind="identity"). Stats queries hit Marker rather than decompressing event blobs.
  6. 204 No Content — empty response, the SDK proceeds to the next batch.
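
Condensed, the six steps read roughly like the handler below. This is a sketch under assumptions: the Prisma model and field names follow this page's vocabulary (Project.key, Session, EventBatch, Marker), but the actual schema, Zod shape, and imports differ in the real codebase.

```ts
import { gunzipSync, gzipSync } from "node:zlib";
import { z } from "zod";
import { prisma } from "@/lib/prisma"; // hypothetical client export

// Assumed minimal batch shape; the real Zod schema is stricter.
const Batch = z.object({
  sessionId: z.string(),
  metadata: z.object({ url: z.string() }).passthrough(),
  events: z.array(z.any()),
});

export async function POST(req: Request) {
  // 1. Auth: unknown public key → 401.
  const project = await prisma.project.findUnique({
    where: { key: req.headers.get("X-Dozor-Public-Key") ?? "" },
  });
  if (!project) return new Response(null, { status: 401 });

  // 2. Decompress + validate: anything malformed → 400.
  let batch: z.infer<typeof Batch>;
  try {
    const raw = Buffer.from(await req.arrayBuffer());
    const text =
      req.headers.get("Content-Encoding") === "gzip"
        ? gunzipSync(raw).toString("utf8")
        : raw.toString("utf8");
    batch = Batch.parse(JSON.parse(text));
  } catch {
    return new Response(null, { status: 400 });
  }

  // 3. Upsert the session, keyed on (projectId, externalId).
  //    (The real pipeline also synthesises the initial Marker(kind="url")
  //    from metadata.url when this call first creates the row.)
  const session = await prisma.session.upsert({
    where: {
      projectId_externalId: { projectId: project.id, externalId: batch.sessionId },
    },
    create: { projectId: project.id, externalId: batch.sessionId },
    update: {},
  });

  // 4. One EventBatch row per POST; events stored verbatim as a gzipped blob.
  await prisma.eventBatch.create({
    data: { sessionId: session.id, data: gzipSync(JSON.stringify(batch.events)) },
  });

  // 5. Typed Marker rows from dozor:* custom events (rrweb type 5).
  for (const e of batch.events) {
    const tag = String(e?.data?.tag ?? "");
    if (e?.type !== 5 || !tag.startsWith("dozor:")) continue;
    await prisma.marker.create({
      data: {
        sessionId: session.id,
        kind: tag === "dozor:url" ? "url" : "identity",
        timestamp: new Date(e.timestamp),
      },
    });
  }

  // 6. Empty body; the SDK moves on to its next batch.
  return new Response(null, { status: 204 });
}
```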

The whole pipeline is synchronous Postgres writes — no queue, no worker. Simplicity over throughput; if ingest volume grows past single-request capacity, the canonical next step is a Vercel-side queue (Inngest, Trigger.dev) plus a worker that batches rows.

What happens when you watch a replay

Open the dashboard, click a session row (a fetch-and-replay sketch follows the list):

  1. GET /api/sessions/{id} — metadata + the marker list. Event batches are not in this response.
  2. GET /api/sessions/{id}/events — returns { batches: [{ data: "<base64-gzip>", … }] } ordered by firstTimestamp. The browser decompresses each blob via DecompressionStream, concatenates, sorts by timestamp.
  3. History builder runs in the browser — pure function over the events array that produces a chronological feed of timeline annotations (session start, navigations, idle gaps, identifies). Derived state; the feed rebuilds instantly when the source events change.
  4. rrweb's player consumes the full session event stream inside a sandboxed iframe — DOM mutations are re-applied to a fresh document, mouse moves and scrolls are replayed, the visual state of the user's browser at any point in time is reconstructed pixel-for-pixel. The history feed sits beside the player and click-seeks the replayer to any moment.
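
The browser half of the read path fits in a few lines. Another sketch: the helper names are invented, the response shape is the one described in step 2, and the real dashboard wraps the history feed and seek controls around this core.

```ts
import { Replayer } from "rrweb";

// Decode one base64 gzip blob back into its rrweb event array.
async function inflateBatch(b64: string): Promise<any[]> {
  const bytes = Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return JSON.parse(await new Response(stream).text());
}

async function playSession(id: string, mount: HTMLElement) {
  // Batches arrive ordered by firstTimestamp; events still get a final sort.
  const { batches } = await (await fetch(`/api/sessions/${id}/events`)).json();
  const batchArrays = await Promise.all(
    batches.map((b: { data: string }) => inflateBatch(b.data)),
  );
  const events = batchArrays.flat().sort((a, b) => a.timestamp - b.timestamp);

  // rrweb rebuilds the DOM inside its own sandboxed iframe under `mount`.
  const replayer = new Replayer(events, { root: mount });
  replayer.play();
}
```

DecompressionStream keeps the gunzip work native to the browser, so no decompression library ships with the dashboard.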

The sandbox is load-bearing for security: the recorded page's scripts can't execute in the dashboard's chrome. See Resources → Replay player.

Identity layer

Calling dozor.identify(userId, traits) in the SDK:

  1. The next ingest batch carries userIdentity: { userId, traits } in its metadata.
  2. The dashboard upserts a TrackedUser keyed on (projectId, userId) and links the session to it.
  3. The user shows up in the Users tab; all their sessions roll up under one row.

traits is free-form — anything you'd want to filter or display later (email, plan, signup date, A/B variant). Stored as JSON in Prisma, indexed only by (projectId, userId).
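
A typical call, with example traits of the kind listed above:

```ts
dozor.identify("user_8341", {
  email: "ada@example.com",
  plan: "pro",
  signupDate: "2024-11-02",
  variant: "checkout-b",
});
```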

Two things worth knowing

  • The SDK is one of two npm packages — @kharko/dozor (vanilla) and @kharko/dozor-react (React Provider + Hook on top of vanilla; usage sketched below). Both at kolia-zamnius/kharko-dozor-packages.
  • The dashboard is what you self-host — clone kharko-dozor-dashboard, deploy to Vercel + Neon (or any Postgres), point the SDK at your domain. Every byte of session data lives on your database.
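
For the React package, expect a Provider plus a Hook. The export names below (DozorProvider, useDozor) are guesses for illustration; check the package README for the real surface.

```tsx
// Hypothetical exports; the docs only promise "Provider + Hook on top of vanilla".
import { DozorProvider, useDozor } from "@kharko/dozor-react";

function App() {
  return (
    <DozorProvider apiKey="pk_…" endpoint="https://your-dashboard.example/api/ingest">
      <Checkout />
    </DozorProvider>
  );
}

function Checkout() {
  const dozor = useDozor();
  // e.g. dozor.identify(user.id, { plan: user.plan }) after login
  return null;
}
```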
