A user pings support at 3:42pm
A deploy fails, and a few minutes later the chat message lands: "deploy's broken, just says 'Something went wrong', screenshot attached." The screenshot is a red banner that says, correctly, that something went wrong. We have a screenshot, a rough timestamp, and zero information about what.
Under our old setup, answering the question meant opening two tabs and reading a lot.
The first tab was Vercel's log viewer, where we could filter to a window around the user's timestamp and scroll for the route hit. The user ID was on one line, the org on another, the environment ID buried in the JSON of a third. The same ID would be env_1 in one place and environmentId: env_1 in another. None of it ever showed up in a single record.
The route would end with Workflow triggered: wfr_xyz. That meant leaving Vercel for the Hatchet dashboard, finding the workflow run, clicking into the failed task, and reading a mix of structured and unstructured logs to figure out which step had actually blown up.
We'd reconstruct what happened, but the data was fragmented. The complete picture only existed in our heads, after we'd manually reassembled the pieces.
And that's the easy case. In production the log streams are full of interleaved sessions, so any investigation that started with "something broke yesterday" instead of "3:42pm" was much worse: a day's worth of lines to search and scroll, hoping to spot the right one.
Why per-line logging stops working
Per-line logging worked for us for a long time. It's cheap to add, easy to read when one service's logs fit on a screen, and the natural response to a hard investigation is to add more of it. The trouble is each line is one engineer, at one moment, verifying one thing: console.log("created project", projectId). Across many such lines from different authors, with no shared schema, the picture you build stays fragmentary.
A single user action fans out through a route handler, a workflow trigger, a queue, a worker, and a sequence of tasks. Each emits its own log lines. A deploy is one thing in our heads, thirty fragments in the logs. logger.info("processing user " + id) is fast to write and fine to tail, but useless for asking "show me every deploy that failed for this org this week," because that question wants fields, not strings.
What we wanted instead
One unified view per event that happened, with all the context attached:
An event is a boundary action: a user clicking apply on a deploy, a webhook arriving, a cron firing, an automation kicking off. It fans out into sub-events, the units of work that carry it: an HTTP request, a server action, a workflow task. We emit one wide event per sub-event, with fields accumulating throughout its lifetime: nested objects grouped by domain (user, organization, environment, stripe), counters, durations, outcomes, error details.
The query is now a filter-and-group in place of a grep. You can run one query against one table: every failed deploy this week, by organization. Anything we want to filter or group by goes directly on the event, including fields that are unique per request like requestId or user.id. Nothing is rolled up until query time.
The pattern isn't ours; Boris Tane's loggingsucks.com is the strongest recent argument for wide events, and Honeycomb and Charity Majors have been making the case for one-event-per-unit-of-work for years. What's worth talking about is what we learned retrofitting the idea into a TypeScript codebase that grew up on console.log.
Boris's example focuses on the HTTP case, one middleware and one handler. Ours needed three adjustments to that shape:
- More than one unit-of-work type, so a single HTTP middleware isn't enough.
- Utilities running four or five layers deep below any request context, so the event has to be reachable without being threaded through every signature.
- Deploys that cross from the web app into Hatchet's worker pool, so a single trace has to stitch across services.
What we built
A small internal module on top of pino: three wrappers and a context API.
The wrappers cover each unit of work:
- withRequestLogging for Next.js API routes
- withActionLogging for Next.js server actions
- withTaskLogging for Hatchet workflow tasks
Each one creates a WideEvent, runs the handler with the event in scope, and emits the event once when the handler returns or throws. Here's a simplified example lifted from one of our deploy routes:
export const POST = withRequestLogging(async (request, { event }) => {
  const session = await getSession();
  event.set({
    resource: {
      user: { id: session.userId },
      organization: { id: session.orgId },
    },
  });

  const input = await request.json();
  const env = await getEnvironment({
    id: input.environmentId,
    organizationId: session.orgId,
  });

  await assertCanDeploy(session, env);
  event.set({
    resource: {
      environment: { id: env.id, name: env.name, projectId: env.projectId },
    },
  });

  const result = await deployEnvironment(env, input);
  event.set({
    resource: { deployment: { id: result.id, action: "triggered" } },
  });

  return Response.json(result);
});
The wrapper stamps http.requestId, http.method, and http.path before the handler runs. The handler then accumulates domain context as it steps through the code. On a successful return the wrapper adds http.statusCode and duration_ms. On a thrown exception it pulls error.type, error.message, error.code, error.statusCode (plus the cause chain if present) onto the event and escalates the severity.
Services below the handler still log their own expected errors with whatever context is useful at that layer. The wrapper exists for the unhandled case: anything that bubbles all the way up to the boundary lands on the event before it emits, so we never lose a failed request to a missing catch.
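Here's a condensed sketch of that lifecycle. The WideEvent stub is illustrative (the real one deep-merges nested fields and flattens on emit), but the shape of the wrapper is the point:

import pino from "pino";

// Illustrative stand-in: accumulate fields, emit exactly once.
class WideEvent {
  private fields: Record<string, unknown> = {};
  set(fields: Record<string, unknown>) {
    Object.assign(this.fields, fields); // sketch: shallow merge
  }
  emit() {
    pino().info(this.fields);
  }
}

type Handler = (request: Request, ctx: { event: WideEvent }) => Promise<Response>;

export function withRequestLogging(handler: Handler) {
  return async (request: Request): Promise<Response> => {
    const event = new WideEvent();
    const start = Date.now();
    event.set({
      http: {
        requestId: crypto.randomUUID(),
        method: request.method,
        path: new URL(request.url).pathname,
      },
    });
    try {
      const response = await handler(request, { event });
      event.set({ http: { statusCode: response.status } });
      return response;
    } catch (err) {
      // The unhandled case: errors land on the event before it emits.
      // (The real wrapper also escalates severity here.)
      const e = err as Error & { code?: string; statusCode?: number };
      event.set({
        error: {
          type: e.name,
          message: e.message,
          code: e.code,
          statusCode: e.statusCode,
        },
      });
      throw err;
    } finally {
      event.set({ duration_ms: Date.now() - start });
      event.emit(); // exactly once, success or failure
    }
  };
}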
What lands in our log aggregator is a single JSON record:
{
  "http.requestId": "req_8bf7ec2d",
  "http.method": "POST",
  "http.path": "/api/projects/p_1/environments/env_1/deploy",
  "http.statusCode": 200,
  "resource.user.id": "user_456",
  "resource.organization.id": "org_789",
  "resource.environment.id": "env_1",
  "resource.environment.name": "production",
  "resource.deployment.id": "dep_abc",
  "duration_ms": 142
}
There's a deliberate split between how the event is built and how it lands. The handler sets nested objects (event.set({ resource: { environment: { id, name } } })); the wrapper flattens them to dotted keys on the way out. We keep nesting in code because it groups domain context at authorship time, one event.set call per domain, with Suga resources collected under resource, infrastructure under infra, and so on. We emit flat because the tools downstream filter, group, and search dotted keys cleanly, while treating nested JSON as opaque.
Sentry is the specific motivator: resource.environment.name is a first-class filterable field in their UI, whereas an environment object containing { name } is not. Sentry's UI then re-nests dotted keys back into a tree under their common prefix when it renders them, so we keep the visual grouping for free.
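The flattening itself is only a few lines. A sketch of the nested-to-dotted conversion (ours is this in spirit; arrays and a few edge cases get extra handling):

// Collapse nested objects into dotted keys: { a: { b: 1 } } -> { "a.b": 1 }
function flatten(
  obj: Record<string, unknown>,
  prefix = "",
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(out, flatten(value as Record<string, unknown>, path));
    } else {
      out[path] = value;
    }
  }
  return out;
}

// flatten({ resource: { environment: { id: "env_1", name: "production" } } })
// => { "resource.environment.id": "env_1", "resource.environment.name": "production" }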
The event in scope, not in signatures
The piece that made wide events actually work for us isn't the wrapper; it's how the event reaches code the wrapper never touches directly. In a simple codebase, passing the event argument down from the wrapper is manageable. The trouble is the utilities four or five layers deep that talk to deploy targets and external services. They have context worth recording (which provider, how many resources were touched, peak memory) but no access to the wrapper's event argument.
Threading an event parameter through every signature isn't a serious option. Any call site whose shape we don't control (framework hooks, library callbacks, ORM lifecycle methods) can't take the extra argument, and the layers that can would split into the ones that pass it and the ones that quietly drop it.
So instead we built the context API on AsyncLocalStorage. It's built into Node, it propagates automatically across await boundaries, and its scope-per-call model maps cleanly onto one wide event per unit of work.
The wrapper opens a scope when the handler starts, and every async call inside that scope sees the same event. A free function setEventFields(...) finds the active event and writes to it, so a utility deep in the deploy path can stamp fields onto whichever request triggered it without ever taking an event reference:
// utility four layers deep, no event reference threaded in
async function ensureNamespace(k8s: KubeClient, env: Environment) {
  const result = await k8s.upsertNamespace(env.namespaceName);
  setEventFields({
    kube: { namespace: env.namespaceName, created: result.created },
  });
  return result;
}
Whether this runs from an HTTP request or a long-running job, the fields land on the wrapper's event automatically. The handler doesn't have to know it happened. We use event.set(...) when the wrapper has handed us the event and setEventFields from anywhere else. The result is that the wide event behaves like a property of the in-flight request rather than an argument we have to thread through. That's what let us roll the pattern out across an existing codebase.
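The plumbing behind that is small. A sketch of the context API on AsyncLocalStorage; runWithEvent is a stand-in name for what the wrappers call internally:

import { AsyncLocalStorage } from "node:async_hooks";

interface WideEventLike {
  set(fields: Record<string, unknown>): void;
}

const eventStorage = new AsyncLocalStorage<WideEventLike>();

// Called by the wrappers: one scope per unit of work. Every async call
// inside fn, however deep, sees the same event.
export function runWithEvent<T>(event: WideEventLike, fn: () => Promise<T>) {
  return eventStorage.run(event, fn);
}

// Callable from anywhere: writes to whichever event is active,
// and is a safe no-op outside any scope.
export function setEventFields(fields: Record<string, unknown>): void {
  eventStorage.getStore()?.set(fields);
}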
The same ambient access powers a few smaller utilities on the event API.
event.increment("db.query.count"); // running counter, emitted as one field at the end
event.escalate("warn"); // bump severity from anywhere in the call stack
Severity can only ever go up; it can't be de-escalated. A deeply nested utility can mark something unusual without raising an error and without worrying about its severity being overridden by something above it.
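The ratchet is a one-line comparison on the event. A sketch, assuming pino's level names:

const levels = ["debug", "info", "warn", "error", "fatal"] as const;
type Level = (typeof levels)[number];

class WideEvent {
  private level: Level = "info";

  escalate(next: Level): void {
    // Ignore anything below the current level; severity never goes back down.
    if (levels.indexOf(next) > levels.indexOf(this.level)) {
      this.level = next;
    }
  }
}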
Across the boundary
For cross-service work OpenTelemetry handles the stitching. When the web app schedules a workflow we inject a traceparent into additionalMetadata, the worker reads it back at task start, and the pino logger stamps the trace and span IDs onto every wide event. A single trace ID covers the whole chain: inbound request, server action, workflow run, every task inside it.
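A sketch of the handoff using the OpenTelemetry propagation API, assuming the W3C trace-context propagator is registered (the Hatchet call sites are simplified; additionalMetadata is the real carrier field):

import { context, propagation } from "@opentelemetry/api";

// Web app side: serialize the active trace context into the metadata
// that rides along with the workflow run.
function traceMetadata(): Record<string, string> {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier); // writes `traceparent`
  return carrier;
}

// Worker side, at task start: restore the upstream context so every
// span and wide event in the task shares the original trace ID.
function withUpstreamTrace<T>(
  metadata: Record<string, string>,
  fn: () => T,
): T {
  return context.with(propagation.extract(context.active(), metadata), fn);
}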
Why not OTel for logs too? The OTel logs API is built around emit-individual-log-records-correlated-to-a-trace, which is the older view of logs, metrics, and traces as separate signals. Wide events deliberately collapse all three into one rich record per unit of work, and building that on top of pino was less work than fighting an SDK shaped for a different problem.
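Stamping the IDs is a single pino hook. A sketch with pino's mixin option and the active OTel span (the spirit of our setup, not the verbatim config):

import pino from "pino";
import { trace } from "@opentelemetry/api";

// pino merges the mixin's return value into every record it emits,
// so each wide event picks up the trace and span IDs of its unit of work.
const logger = pino({
  mixin() {
    const span = trace.getActiveSpan();
    if (!span) return {};
    const { traceId, spanId } = span.spanContext();
    return { trace_id: traceId, span_id: spanId };
  },
});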
The same incident, after
A user pings support: "deploy's broken, just says 'Something went wrong', screenshot attached." Same scenario as before, except this time the investigation is one query instead of two tabs and a lot of reading.
All we need to do now is open the log aggregator and filter on runtime.service = "worker" AND task.status = "failed" for the last hour.
Same incident, but each event carries its full context inline. Filter by any field or value.
Each row has a trace ID that connects every wide event to the rest of the chain: the inbound request, the server action, the workflow run, and the error fields a task contributed via setEventFields from four layers deep. The fragments that used to live across Vercel and Hatchet are now one record, and we've reconstructed the failure in under a minute.
What it doesn't fix
Wide events are only as useful as the fields on them. An event with one UUID is no better than a console.log, and an event with every internal variable becomes its own kind of unreadable. Grouping fields by domain (organization, environment, stripe) instead of scattering flat IDs is part of the answer. We enforce it through code review; this could be better with tooling, but review catches deviations effectively for us right now.
Wide events also don't replace metrics for everything. Aggregates like p99 latency over a billion requests are cheaper against pre-aggregated data. We aren't at that scale, and at our size every dimension being queryable on every event is worth more than the storage cost. If that flips, we'll revisit.
If your system is one service and your logs fit on a screen, per-line logging is fine. The moment a unit of work crosses files, services, or async boundaries, the seams start showing. We crossed that line a while back, and built wide events when the time we'd save on investigations clearly outweighed the time to ship them.
If you're staring at a similar fragmented investigation and wondering whether to bother, we'd be happy to share more specifics about our implementation on Discord or GitHub.