Webhooks are deceptively simple. Event happens, POST JSON to a URL, done. Except the customer's server is down, responds slowly, returns 500, redirects to a login page, or receives the same webhook twice and creates duplicates. All of these have happened to us in production.
Our architecture has four components. The event publisher writes events to a PostgreSQL outbox table in the same transaction as the triggering change -- so the event is recorded if and only if that change commits. Never make the webhook call synchronously in the request path.
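A minimal sketch of the outbox write, assuming a hypothetical `orders` table and `webhook_outbox` table (neither name comes from our schema). It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; with PostgreSQL the shape is the same, only the driver changes.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical tables, for illustration only.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE webhook_outbox ("
        " event_id TEXT PRIMARY KEY,"
        " event_type TEXT NOT NULL,"
        " payload TEXT NOT NULL,"
        " delivered_at TEXT)"
    )


def mark_order_paid(conn, order_id):
    """Update the order and record the webhook event in one transaction."""
    event_id = str(uuid.uuid4())
    payload = json.dumps({"id": event_id, "type": "order.paid", "order_id": order_id})
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE orders SET status = 'paid' WHERE id = ?", (order_id,)
        )
        if cur.rowcount != 1:
            raise LookupError(f"order {order_id} not found")
        conn.execute(
            "INSERT INTO webhook_outbox (event_id, event_type, payload)"
            " VALUES (?, ?, ?)",
            (event_id, "order.paid", payload),
        )
    return event_id
```

No HTTP call happens here. The request path only writes a row, and delivery is left to a separate process.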
The delivery worker runs as a separate process and drains the outbox. It reads undelivered events, POSTs each one to the configured URL, records the response, and marks the event delivered or failed.
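The worker loop can be sketched roughly as follows. The `OutboxEvent` class and the injected `send` callable are hypothetical names for illustration. The events live in a plain list here instead of a database table so the example stays self-contained. Treating only 2xx responses as success is an assumption, though it matches the failure cases listed earlier (500s and redirects to a login page).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OutboxEvent:
    event_id: str
    url: str
    payload: str
    status: str = "pending"  # pending -> delivered | failed
    attempts: List[int] = field(default_factory=list)  # recorded status codes


def deliver_pending(events: List[OutboxEvent], send: Callable[[str, str], int]) -> int:
    """Try each pending event once and record the outcome.

    `send(url, body)` is expected to POST the body and return the HTTP
    status code. Returns the number of events delivered in this pass.
    """
    delivered = 0
    for event in events:
        if event.status != "pending":
            continue
        try:
            code = send(event.url, event.payload)
        except Exception:
            code = 0  # network errors and timeouts count as a failed attempt
        event.attempts.append(code)
        # Assumption: only 2xx is success; redirects and 5xx are failures.
        if 200 <= code < 300:
            event.status = "delivered"
            delivered += 1
        else:
            event.status = "failed"
    return delivered
```

In a real worker, a failed event would be handed to the retry scheduler rather than left as failed after one attempt.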
Retry strategy: exponential backoff with jitter. The delays are thirty seconds, two minutes, ten minutes, one hour, six hours, and twenty-four hours: six attempts over roughly thirty-one hours. If all fail, the event is marked permanently failed and the customer is emailed. Most failures are either transient (server restarts, network blips) or long-term (misconfigured URL), and this schedule handles both: the short early delays recover quickly from transient failures, and the long later delays give customers time to fix a misconfiguration.
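As a rough sketch, the schedule fits in a lookup table plus a jitter factor. The plus-or-minus 10 percent jitter range and the function name `retry_delay` are assumptions for illustration; the article does not state how much jitter is applied.

```python
import random

# Delays from the schedule above, in seconds:
# 30 s, 2 min, 10 min, 1 h, 6 h, 24 h.
RETRY_DELAYS_SECONDS = [30, 120, 600, 3600, 21600, 86400]


def retry_delay(step, jitter=0.1, rng=random):
    """Return the jittered delay for a schedule step (0-based).

    Returns None once the schedule is exhausted, which is the signal to
    mark the event permanently failed. `jitter=0.1` spreads each delay
    uniformly within +/-10% (an assumed value).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= len(RETRY_DELAYS_SECONDS):
        return None
    base = RETRY_DELAYS_SECONDS[step]
    return base * (1 + rng.uniform(-jitter, jitter))


# Total wait without jitter, in hours.
total_hours = sum(RETRY_DELAYS_SECONDS) / 3600
```

The unjittered delays sum to 112,350 seconds, a little over thirty-one hours. The jitter keeps many failed deliveries to the same recovering server from retrying at the same instant.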
Idempotency is non-negotiable. Every payload includes a unique event ID. We document that customers must deduplicate on it. Payload signing prevents spoofing -- HMAC-SHA256 with a customer-specific secret. We include verification examples in four languages.
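A minimal sketch of both halves, signing on our side and verification plus deduplication on the consumer side, using only the Python standard library. The `id` field name, the hex encoding of the signature, and the in-memory `seen` set are assumptions for illustration. A real consumer would keep seen event IDs in durable storage.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def handle_webhook(secret: bytes, body: bytes, signature: str, seen: set) -> str:
    """Consumer-side handling: verify first, then deduplicate on event ID."""
    if not verify_signature(secret, body, signature):
        return "rejected"
    event = json.loads(body)
    event_id = event["id"]  # assumed field name for the unique event ID
    if event_id in seen:
        return "duplicate"  # acknowledge, but do not process again
    seen.add(event_id)
    return "processed"
```

Verification runs over the raw bytes as received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and break the signature.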
Timeout is aggressive: five seconds. A consumer should acknowledge quickly and process asynchronously. Accommodating slow consumers degrades reliability for everyone.
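On the consumer side, "acknowledge quickly and process asynchronously" might look like the sketch below: the handler only enqueues the payload and returns, and a background thread does the slow work. The names `acknowledge`, `worker`, and `processed` are hypothetical, and an in-process queue is an assumption made to keep the example self-contained. A durable queue would be the safer choice in production.

```python
import queue
import threading

work_queue = queue.Queue()
processed = []  # stands in for whatever the slow processing produces


def acknowledge(body: str) -> int:
    """Request handler: enqueue the payload and return 200 immediately."""
    work_queue.put(body)
    return 200


def worker() -> None:
    """Background thread: drain the queue until a None sentinel arrives."""
    while True:
        body = work_queue.get()
        try:
            if body is None:
                return
            processed.append(body)  # placeholder for slow business logic
        finally:
            work_queue.task_done()
```

With this split, the response time the sender sees is the cost of one queue insert, regardless of how long processing takes.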
Monitoring tracks three metrics per customer: success rate, average response time, and current failure streak. When the success rate drops below 90 percent or more than ten consecutive deliveries fail, we notify the customer automatically. We also provide an event log in their dashboard showing every delivery attempt and its response.
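The three metrics and the notification rule can be sketched as a small per-customer counter. The class name `DeliveryStats` is hypothetical, and computing the success rate over all recorded attempts is an assumption. The article does not say which time window the rate covers.

```python
from dataclasses import dataclass


@dataclass
class DeliveryStats:
    attempts: int = 0
    successes: int = 0
    total_response_ms: float = 0.0
    failure_streak: int = 0

    def record(self, ok: bool, response_ms: float) -> None:
        """Record one delivery attempt and its response time."""
        self.attempts += 1
        self.total_response_ms += response_ms
        if ok:
            self.successes += 1
            self.failure_streak = 0  # any success resets the streak
        else:
            self.failure_streak += 1

    @property
    def success_rate(self) -> float:
        # With no attempts yet there is nothing to alert on.
        return self.successes / self.attempts if self.attempts else 1.0

    @property
    def average_response_ms(self) -> float:
        return self.total_response_ms / self.attempts if self.attempts else 0.0

    def should_notify(self) -> bool:
        """Below 90 percent success, or more than ten consecutive failures."""
        return self.success_rate < 0.90 or self.failure_streak > 10
```

The two conditions catch different problems: the rate flags a flaky endpoint, while the streak flags an endpoint that was healthy for a long time and has just gone down.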
For high-volume customers (more than ten thousand events per day), we offer batching: we accumulate events until thirty seconds have passed or one hundred events are queued, whichever comes first, and deliver them as a single JSON array.
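A sketch of the "thirty seconds or one hundred events" rule. The `EventBatcher` class, the injected `deliver` callable, and the periodic `tick` call are assumptions for illustration. The clock is injectable so the time-based flush can be exercised without waiting.

```python
import json
import time


class EventBatcher:
    """Buffer events and flush on size or age, whichever comes first."""

    def __init__(self, deliver, max_events=100, max_wait_seconds=30.0,
                 clock=time.monotonic):
        self._deliver = deliver  # called with one JSON array string per batch
        self._max_events = max_events
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._buffer = []
        self._first_event_at = None

    def add(self, event: dict) -> None:
        if not self._buffer:
            self._first_event_at = self._clock()  # the window starts here
        self._buffer.append(event)
        if len(self._buffer) >= self._max_events:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes once the oldest event has waited long enough."""
        if self._buffer and self._clock() - self._first_event_at >= self._max_wait:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._first_event_at = None
        self._deliver(json.dumps(batch))
```

The thirty-second window starts at the first buffered event, not at a fixed wall-clock interval, so a lone event waits no longer than thirty seconds plus the interval between `tick` calls.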
Total implementation: three to four weeks. It is a real investment, but webhook reliability directly affects customer trust.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.