Replay real production failures—safely.
One timeline per event: capture → inspect → replay → diff. Built to sit next to Datadog & OpenTelemetry, not replace them.
Why ReplayStack vs logs-only tools
No credit card · Free tier
CloudWatch, Azure Monitor, and log analytics excel at volume & search. They rarely give you a replay button for the exact API call or job that broke.
- Request validated: 3ms
- Queue publish: 28ms
- SMTP send…
{
"to": "customer@acme.com",
"template": "receipt_v2",
"provider": "smtp_internal"
}
Minutes to first replay
SDK or POST once — failures land as a timeline, not log soup.
Masking before storage
Passwords, tokens, and cookies are redacted by default—add maskFields only for extra key names.
Alerts with receipts
Every ping opens the same structured event your team trusts.
Your stack, not ours
Node, Python, Nest, Express, Next — or raw HTTP.
Use Datadog or OpenTelemetry to find where it broke. Use ReplayStack to reproduce and debug the exact failed event.
Replay layer, not another APM — keep Datadog/OTel for signals; use ReplayStack when you need the exact prod event replayed safely.
What ReplayStack does best
Five moves teams repeat when prod breaks — automated here.
Capture real failed events
Payloads, stacks, versions — everything needed to replay the failure.
Replay safely
Staging / safe mode — block payments, email, SMS, destructive side effects.
Compare results
Side-by-side original vs replay — see exactly what changed.
Generate bug reports
Export repro-ready reports: payload, trace, replay outcome, diff.
Works with your stack
Node, Python, Nest, Express, Next — or plain HTTP ingest.
Cloud logs
Great at storage & search
APM / traces
Maps latency & dependencies
ReplayStack
Replays the exact failing event
| Capability | CloudWatch / Azure Monitor logs | Datadog, ELK, Grafana Loki… | ReplayStack |
|---|---|---|---|
| What it optimizes for | Durable storage, search, and infra-scale ingestion | Dashboards, metrics, traces, log analytics (with agent work) | Fast understanding & safe reproduction of one backend event |
| When a webhook or job fails | You grep, correlate IDs, and rebuild requests manually | You pivot across tools; replay is usually custom glue code | You open the timeline, inspect payload & steps, replay immediately |
| Change a field and retry | Write a script or Postman collection; hope it matches prod | Often notebooks, curl, or internal tooling—outside the log UI | Edit JSON in-place, replay, compare responses side by side |
| Story of what happened | Log lines + your mental model | Traces (great) when instrumentation spans every hop | Ordered steps per captured event—built for human debugging |
| Best companion for | Fleet health, compliance archives, infra signals | Full-stack observability budgets & on-call runbooks | Application failures where repro is the bottleneck |
Logs/traces show blast radius. ReplayStack adds safe reproduction of one event.
From firefight to fix — one flow
Capture → isolate → replay → verify — without juggling six tabs.
SDK or HTTP — webhooks, APIs, workers land as structured events.
One timeline per failure — not a wall of stack traces.
Edit payload, resend, diff responses — no one-off scripts.
Share the replay link — ship the fix with proof.
Teams where reproduction is the bottleneck
When the bug needs prod-only context — payload, headers, provider quirks, deploy version.
- Backend developers
- QA engineers
- SaaS and API-heavy teams
- Agencies maintaining multiple client backends
- Teams chasing difficult-to-reproduce bugs
- Payment, order, booking, and healthcare workflows
Especially valuable when the failure hinges on
- Exact payload or user state
- Headers and downstream responses
- Environment and deployment version
- Timing-sensitive or flaky integrations
Everything that turns a failure into a fix
Watch what each layer does — then skim the cards for depth.
Ingest → prove with replay → keep it safe — one loop.
01 · Capture & context
Bring production into focus
Ingest webhooks, APIs, workers — each event stays tied to the customer or job that triggered it.
SDK & HTTP ingestion
Instrument services with a thin SDK or POST structured events directly. Tag environment, version, and service so filters stay honest.
- Works alongside your existing log pipeline
- Designed for high-cardinality paths you actually debug
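As a rough illustration of the tagging described above, the sketch below assembles one structured event as plain JSON. The helper `build_event` and every field name in it (`service`, `environment`, `version`, `steps`, and so on) are assumptions made for this example, not ReplayStack's documented schema; the docs remain the reference for the real ingest format.

```python
import json
from datetime import datetime, timezone


def build_event(service, environment, version, route, status, payload, steps):
    """Assemble one structured event (hypothetical shape, not the official schema)."""
    return {
        "service": service,
        "environment": environment,
        "version": version,
        "route": route,
        "status": status,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
        "steps": steps,
    }


event = build_event(
    service="mailer",
    environment="production",
    version="2024.06.1",  # illustrative value only
    route="/send-receipt",
    status=500,
    payload={
        "to": "customer@acme.com",
        "template": "receipt_v2",
        "provider": "smtp_internal",
    },
    steps=[
        {"name": "Request validated", "ms": 3},
        {"name": "Queue publish", "ms": 28},
    ],
)

# This JSON string is what an HTTP client would send as the POST body.
body = json.dumps(event)
```

Keeping environment, version, and service as top-level keys is one way to make them cheap to filter on later.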
Step-by-step timelines
Each event renders as an ordered narrative: validation, IO, downstream calls, and timings—so “where it broke” is obvious on the first screen.
- Less grep, fewer missing correlation IDs
- Shareable view for support and engineering
Live health signal
Watch volume, error rates, and slow steps as events arrive. Spot regressions before they become all-hands incidents.
- Fits on-call workflows without extra dashboards
- Pairs with alerts when thresholds slip
02 · Replay & verify
Prove the fix before you merge
After the stack trace — prove the patch against the real payload, not a guessed curl.
Replay & response diff
Re-run the exact event—or a variation—against staging or production-safe endpoints. Compare status codes, bodies, and timings side by side.
- Stop maintaining one-off curl scripts
- Keep a paper trail of what you tried
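To make the side-by-side comparison concrete, here is a minimal sketch of a response diff over status code, timing, and top-level body keys. The function `diff_responses` and the response shape (`status`, `duration_ms`, `body`) are assumptions for illustration and say nothing about how ReplayStack computes its own diff.

```python
def diff_responses(original, replay):
    """Return {field: (original_value, replay_value)} for fields that differ."""
    changes = {}
    # Compare the scalar fields first.
    for field in ("status", "duration_ms"):
        if original.get(field) != replay.get(field):
            changes[field] = (original.get(field), replay.get(field))
    # Then compare top-level body keys, including keys present on one side only.
    orig_body = original.get("body", {})
    new_body = replay.get("body", {})
    for key in sorted(set(orig_body) | set(new_body)):
        if orig_body.get(key) != new_body.get(key):
            changes[f"body.{key}"] = (orig_body.get(key), new_body.get(key))
    return changes


original = {"status": 500, "duration_ms": 412, "body": {"error": "smtp timeout"}}
replay = {"status": 200, "duration_ms": 95, "body": {"delivered": True}}
changes = diff_responses(original, replay)
```

A key missing on one side shows up as `None` in that position, so added and removed fields are visible alongside changed ones. The sketch only compares one level of the body; nested structures would need a recursive walk.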
Payload workspace
Edit JSON with guardrails: tweak headers, swap providers, or simulate edge cases before you send the next replay.
- Validate shape before the request leaves ReplayStack
- Great for flaky third-party integrations
Shareable timelines
Every event has a stable narrative—deep-link it in Linear, Jira, or Slack so support and engineering debate the same facts, not screenshots.
- Fewer “can you send the curl?” loops
- Onboarding with real production shapes, safely masked
03 · Trust & operations
Safe by default, loud when it counts
Masking, routing, and exports follow the event — auditors see policy, not tribal scripts.
Masking & policy
Built-in redaction for passwords, tokens, cookies, and card fields—plus optional maskFields for your own key names. Values become [MASKED] before storage.
- maskFields is optional; defaults always apply
- No clear-text secrets in timelines or replay exports
Routing & alerts
Notify the right channel when specific services, routes, or customers fail—email, Slack, or downstream webhooks.
- Noise-aware thresholds per project
- Escalate with the timeline link attached
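One way to picture routing with per-rule thresholds is the sketch below. The `Rule` class, the channel strings, and the `recent_failures` counter are all hypothetical names chosen for this example; they are not taken from ReplayStack's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    channel: str                   # e.g. "slack:#payments" (hypothetical format)
    service: Optional[str] = None  # None matches any service
    route: Optional[str] = None    # None matches any route
    min_failures: int = 1          # stay quiet below this many recent failures


def channels_for(event, rules, recent_failures):
    """Return the channels whose rule matches the event and whose threshold is met."""
    matched = []
    for rule in rules:
        if rule.service is not None and rule.service != event["service"]:
            continue
        if rule.route is not None and rule.route != event["route"]:
            continue
        if recent_failures < rule.min_failures:
            continue
        matched.append(rule.channel)
    return matched


rules = [
    Rule("slack:#payments", service="billing", min_failures=3),
    Rule("email:oncall@example.com"),
]
event = {"service": "billing", "route": "/charge"}

quiet = channels_for(event, rules, recent_failures=1)
loud = channels_for(event, rules, recent_failures=3)
```

With one recent failure only the catch-all email rule fires; at three, the Slack rule for the billing service joins it.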
Outbound webhooks
Fan failures out to the tools you already run—HTTP callbacks alongside email, Slack webhooks, or other routing when an alert fires.
- Wire alerts into tickets or chat without a custom poller
- HTTP APIs mirror what you configure in the UI
Want implementation detail? Browse the docs for SDK snippets, HTTP examples, and framework guides.
Pricing that rewards fixing—not hoarding logs
Start free, scale when replay volume grows.
Frequently asked questions
Quick answers about ReplayStack pricing, replay safety, and getting started.
What is ReplayStack?
What problem does ReplayStack solve?
Who should use ReplayStack?
What types of events can ReplayStack capture?
Contact us for anything not listed here.
Built by developers, for developers
Built from late-night prod fires — we wanted backend debugging to feel as obvious as stepping through UI.
What we capture — and what we never store in clear text
ReplayStack is built for backend debugging. Sensitive keys are redacted before storage; you only send what your integration captures.
What ReplayStack stores
- Backend API, webhook, queue, and worker events you explicitly instrument with the SDK or ingest API.
- Request metadata you choose to send: route, method, status, latency, service name, environment, and correlation IDs.
- Request/response bodies and headers after automatic masking (values replaced with [MASKED], not stored in clear text for matched keys).
- Stack traces and structured steps so your team can replay and diff failures—not raw end-user browser sessions.
What we never collect (by design)
- End-user mouse clicks, screen recordings, or front-end DOM snapshots (ReplayStack is backend observability, not session replay for visitors).
- Passwords, bearer tokens, cookies, or API secrets in clear text—matching field names are redacted before storage.
- Full payment card numbers or CVV values when field names match our built-in list (add custom names with maskFields if your schema differs).
- Arbitrary files, databases, or infrastructure metrics unless your integration sends them as part of an event payload.
- Selling or renting your production payloads to advertisers or data brokers.
maskFields (SDK option)
Optional. Extra JSON/header field names to redact. A built-in sensitive-name list always runs first, and leaving maskFields empty never disables it.
Example: maskFields: ['phone_number', 'national_id', 'patientId']. None are required beyond the built-in list below.
Always redacted when the field name matches (built-in list)
- authorization
- password
- passwd
- token
- access_token
- refresh_token
- apiKey
- api_key
- secret
- client_secret
- cookie
- set-cookie
- cardNumber
- card_number
- cvv
- otp
- SDKs and ingest apply masking on the way in—before events land in timelines, exports, or replay sessions.
- Field matching is case-insensitive and ignores hyphens, underscores, and spaces, so api_key, api-key, and apiKey all match the same entry.
- Use maskFields only for extra names (phone_number, national_id, patientId). Defaults always apply even when maskFields is omitted.
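The matching rules above can be sketched in a few lines. This is an illustration under stated assumptions, not ReplayStack's implementation: the function names `normalize` and `mask` are hypothetical, key names are compared as whole names after normalization (no substring matching), and "treated as equivalent" is read as "separators are ignored", which is what the api_key ≈ apiKey example suggests.

```python
import re

# Built-in names as listed above.
BUILT_IN = [
    "authorization", "password", "passwd", "token", "access_token",
    "refresh_token", "apiKey", "api_key", "secret", "client_secret",
    "cookie", "set-cookie", "cardNumber", "card_number", "cvv", "otp",
]


def normalize(name):
    """Lower-case a key name and drop hyphens, underscores, and whitespace."""
    return re.sub(r"[-_\s]", "", name).lower()


def mask(value, mask_fields=()):
    """Return a copy of value with sensitive keys replaced by "[MASKED]"."""
    # The built-in list always applies; mask_fields only adds extra names.
    sensitive = {normalize(n) for n in BUILT_IN}
    sensitive |= {normalize(n) for n in mask_fields}

    def walk(node):
        if isinstance(node, dict):
            return {
                key: "[MASKED]" if normalize(str(key)) in sensitive else walk(val)
                for key, val in node.items()
            }
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(value)


event = {
    "headers": {"Authorization": "Bearer abc", "Set-Cookie": "sid=1"},
    "body": {"API Key": "k-123", "patientId": "p-9", "to": "customer@acme.com"},
}
masked = mask(event, mask_fields=["patientId"])
```

Here `Authorization`, `Set-Cookie`, and `API Key` are caught by the built-in list, `patientId` by the extra `mask_fields` entry, and `to` passes through unchanged. The input event is not modified; `mask` builds a new structure.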
Read more on our Security page, Privacy policy, and masking docs.
Ready to turn every failure into a replayable story?
Proof next to your logs — less guesswork on the same incidents.
No credit card · Free tier forever · Upgrade when replay volume grows