Email Validation Webhook Patterns for Async Flows

hangrydev

Synchronous Validation Falls Apart at Scale

You’ve got 10,000 leads in a CSV. Your sales team needs them verified before Monday’s campaign. So you loop through the list, call the validation API once per row, and wait.

Three hours later, the process crashes on row 7,412. No checkpoint. No recovery. Start over.

Synchronous email validation works fine for one address at a time. A signup form, a checkout field, a single-record API call. But the moment you need to validate a batch (50, 500, 50,000 addresses), the model breaks. Threads block. Connections time out. SMTP handshakes take 500ms to 3 seconds each, and greylisted servers force retries with 5-15 minute delays. You can’t hold an HTTP connection open for that.

Webhooks invert the flow. You submit the work, your server moves on, and results arrive asynchronously when they’re ready. No polling. No open connections. No blocked user staring at a spinner.

This guide walks through the architecture, from a basic receiver to production-grade patterns with HMAC signatures, idempotency, retry logic, and dead letter queues. Node.js and Rails code throughout.

Why Webhooks for Email Validation

Three concrete reasons to decouple validation from the request/response cycle.

Verification times are wildly unpredictable. Syntax checks take under 5ms. MX lookups run 50-200ms. Full SMTP verification? 500ms to 3 seconds per address on a good day. That’s the happy path. Greylisted servers force 5-15 minute retry delays. The email validation API guide breaks down where those latency costs come from.

Batch work doesn’t belong in request threads. A CRM export with 50,000 rows means 50,000 SMTP handshakes. Even at 100 concurrent validations, that’s 8+ minutes of wall time. Webhooks let you submit the batch, respond to the user immediately with a job ID, and process results as they stream back.

Failures need graceful recovery. A synchronous loop dies silently. A webhook-based system retries, logs failures, and gives you a dead letter queue for the stubborn ones. When row 7,412 fails, the other 49,999 still get processed.

The Architecture

Four steps. Three actors: your app, the validation API, and your webhook endpoint.

  1. Your app submits a batch of emails to the validation API with a webhook_url.
  2. The API queues the work and returns a batch_id immediately.
  3. As results complete, the API POSTs them to your webhook endpoint.
  4. Your endpoint verifies the signature, persists results, and returns 200 OK.

No long-lived connections. No polling intervals to tune. No WebSocket infrastructure to maintain.
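Step 1 can be sketched as a plain HTTP POST. Everything specific here is an assumption for illustration: the endpoint URL, the auth header, and the field names are hypothetical, not a documented API.

```javascript
// Sketch of step 1: build the batch submission request.
// The endpoint, auth scheme, and field names are illustrative assumptions.
function buildBatchRequest(emails, webhookUrl, apiKey) {
  return {
    url: "https://api.example.com/v1/batches", // hypothetical endpoint
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // webhook_url tells the API where to POST results as they complete
      body: JSON.stringify({ emails, webhook_url: webhookUrl }),
    },
  };
}
```

Submit with `fetch(url, options)`, store the returned batch_id, and move on; everything after that arrives through the webhook.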

For single-address async validation (a signup flow where you don’t want to block the response), the pattern is identical. Submit one email instead of a list, get one callback instead of many. The real-time vs async patterns post covers when to use each approach.

Building the Webhook Receiver

Your receiver has three jobs: verify the request is authentic, process the payload, and respond fast. Under 500ms. Most validation APIs treat anything slower than 10 seconds as a failure and retry. If your handler takes 12 seconds because it’s writing 500 rows to Postgres inline, you’ll get the same payload again while the first one is still running. Race conditions follow.

Node.js (Express)

// routes/webhooks/validation.js
const express = require("express");
const crypto = require("crypto");
const router = express.Router();

// Assumed to exist elsewhere in the app: a Redis client (e.g. ioredis)
// and a background job queue (e.g. BullMQ), imported as `redis` and `queue`.

// Raw body required for signature verification.
// If you use express.json() globally, the raw bytes are gone
// and HMAC computation breaks.
router.post(
  "/webhooks/validation",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const signature = req.headers["x-truemail-signature"];
    const timestamp = req.headers["x-truemail-timestamp"];

    if (!verifySignature(req.body, signature, timestamp)) {
      return res.status(401).json({ error: "Invalid signature" });
    }

    const payload = JSON.parse(req.body);

    // Idempotency: skip if already processed
    const seen = await redis.get(`webhook:${payload.event_id}`);
    if (seen) {
      return res.status(200).json({ status: "already_processed" });
    }

    // Claim this event with a 24-hour TTL
    await redis.set(
      `webhook:${payload.event_id}`,
      "processing",
      "EX",
      86400
    );

    // Enqueue for background processing. Respond NOW.
    await queue.add("process-validation-result", payload);

    res.status(200).json({ received: true });
  }
);

function verifySignature(rawBody, signature, timestamp) {
  if (!signature || !timestamp) return false;

  // Reject timestamps older than 5 minutes (replay protection)
  const age = Math.abs(Date.now() / 1000 - parseInt(timestamp, 10));
  if (age > 300) return false;

  const expected = crypto
    .createHmac("sha256", process.env.TRUEMAIL_WEBHOOK_SECRET)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  const sigBuf = Buffer.from(signature);
  const expBuf = Buffer.from(expected);

  // timingSafeEqual throws if lengths differ
  if (sigBuf.length !== expBuf.length) return false;

  return crypto.timingSafeEqual(sigBuf, expBuf);
}

module.exports = router;

Three things to notice. First, express.raw() on the route, not express.json(). You need the raw bytes for HMAC computation. Parse JSON after verification. Second, crypto.timingSafeEqual for the comparison. Regular string equality (===) leaks timing information that lets attackers brute-force signatures byte by byte. Third, the length check before timingSafeEqual. Node throws ERR_CRYPTO_TIMING_SAFE_EQUAL_LENGTH if the buffers differ in byte length. A malformed signature would crash your handler without that guard.

Rails Controller

# app/controllers/webhooks/validations_controller.rb
module Webhooks
  class ValidationsController < ApplicationController
    skip_before_action :verify_authenticity_token

    def create
      unless valid_signature?
        head :unauthorized
        return
      end

      event_id = parsed_payload["event_id"]

      # Idempotency: skip duplicate deliveries
      if WebhookEvent.exists?(event_id: event_id)
        head :ok
        return
      end

      WebhookEvent.create!(event_id: event_id, status: "received")

      # Enqueue and respond. Don't process inline.
      ProcessValidationResultJob.perform_later(parsed_payload)

      head :ok
    end

    private

    def parsed_payload
      @parsed_payload ||= JSON.parse(request.raw_post)
    end

    def valid_signature?
      signature = request.headers["X-Truemail-Signature"]
      timestamp = request.headers["X-Truemail-Timestamp"]
      return false if signature.blank? || timestamp.blank?

      age = (Time.current.to_i - timestamp.to_i).abs
      return false if age > 300

      expected = OpenSSL::HMAC.hexdigest(
        "SHA256",
        ENV.fetch("TRUEMAIL_WEBHOOK_SECRET"),
        "#{timestamp}.#{request.raw_post}"
      )

      ActiveSupport::SecurityUtils.secure_compare(expected, signature)
    end
  end
end

Same pattern. Verify, claim the event, enqueue, respond. The skip_before_action :verify_authenticity_token is required because webhooks don’t carry CSRF tokens. Don’t remove it and wonder why every callback returns 422.

HMAC Signature Verification

Never trust an incoming POST just because it hits your webhook URL. Anyone who discovers the URL can send fake payloads. Signature verification proves the request came from the validation API and wasn’t tampered with.

The standard approach: HMAC-SHA256. The API signs the request body concatenated with a timestamp using a shared secret. Your receiver computes the same HMAC and compares.

Why include the timestamp? Without it, an attacker who intercepts a valid signed request can replay it indefinitely. The timestamp creates a window (5 minutes is standard). Anything older gets rejected.

Three things that break signature verification in production:

  1. Middleware parsing the body before you read it. Express’s express.json() deserializes the body. The raw bytes are gone. Your HMAC won’t match. This is the number one cause of “my webhook verification always fails” support tickets.
  2. Proxy headers changing the body. Load balancers or API gateways that re-encode the request body invalidate the signature.
  3. Character encoding mismatches. UTF-8 everywhere. If your framework re-encodes the body as Latin-1, the HMAC diverges silently.

Idempotency: Why Duplicates Are Inevitable

Webhooks get retried. The API sends a callback, your server returns 200, but a network blip means the API never sees your response. So it sends the same payload again. And again.

How often does this happen? Industry data suggests that webhook delivery failures are common enough to plan for. Stripe retries failed deliveries with exponential backoff over three days. Shopify retries up to eight times in a four-hour window before removing the subscription entirely. A 2025 webhook reliability report found that nearly 20% of deliveries fail silently during peak loads. Your handler will see duplicates. Count on it.

The fix: store processed event IDs before doing anything else. Check on entry, skip if seen. Redis with a TTL works for high-throughput systems. A database table works for everything else.

// workers/process-validation-result.js
async function processValidationResult(payload) {
  const { batch_id, results } = payload;

  for (const result of results) {
    // Upsert: update if exists, create if not.
    // Naturally idempotent at the row level.
    await db.contact.upsert({
      where: { email: result.email },
      update: {
        emailStatus: result.status,
        disposable: result.disposable,
        catchAll: result.catch_all,
        validatedAt: new Date(),
      },
      create: {
        email: result.email,
        emailStatus: result.status,
        disposable: result.disposable,
        catchAll: result.catch_all,
        validatedAt: new Date(),
      },
    });
  }

  await db.validationBatch.update({
    where: { id: batch_id },
    data: { processedAt: new Date() },
  });
}

Use upsert, not create. If the same email result arrives twice, the second write overwrites with identical data. No duplicated rows. No constraint violations. Naturally idempotent.
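To see why the upsert is idempotent, here is a toy in-memory version, with a `Map` keyed by email standing in for the contacts table:

```javascript
// Toy upsert keyed on email; a Map stands in for the contacts table.
const contacts = new Map();

function upsertContact(result) {
  // Same key, same data: writing twice leaves the table unchanged.
  contacts.set(result.email, { email: result.email, emailStatus: result.status });
}

const callback = { email: "[email protected]", status: "deliverable" };
upsertContact(callback); // first delivery
upsertContact(callback); // retried delivery

console.log(contacts.size); // 1 — no duplicate row, no constraint violation
```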

The Webhook Payload

A good callback payload gives you everything you need to act on results without follow-up API calls.

{
  "event_id": "evt_8f3a2b1c",
  "event_type": "batch.completed",
  "batch_id": "batch_4x9k2m",
  "timestamp": "2026-04-08T14:23:00Z",
  "results": [
    {
      "email": "[email protected]",
      "status": "deliverable",
      "mx_found": true,
      "disposable": false,
      "catch_all": false,
      "role_account": false,
      "smtp_provider": "google"
    },
    {
      "email": "[email protected]",
      "status": "undeliverable",
      "mx_found": false,
      "disposable": false,
      "catch_all": false,
      "role_account": false,
      "smtp_provider": null
    }
  ]
}

The event_id is your idempotency key. The batch_id ties results back to the original submission. The event_type tells you whether you’re looking at a partial progress update or the final result. Different event types, different handlers.
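A minimal dispatch sketch for that routing. `batch.completed` matches the payload above; `batch.progress` is a hypothetical name for partial updates.

```javascript
// Route each event_type to its own handler.
// "batch.progress" is an assumed name for partial-update events.
const handlers = {
  "batch.progress": (p) => `progress: ${p.results.length} results so far`,
  "batch.completed": (p) => `completed: ${p.batch_id}`,
};

function dispatch(payload) {
  const handler = handlers[payload.event_type];
  if (!handler) throw new Error(`Unhandled event_type: ${payload.event_type}`);
  return handler(payload);
}
```

Throwing on unknown types beats silently dropping them: when the API adds a new event type, you find out from your error tracker instead of your missing data.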

What about catch-all domains? Those show up as status: "risky" with catch_all: true. You can’t definitively verify individual mailboxes on these domains. The MX vs SMTP validation post explains why and what signals help disambiguate.

Retry Logic and Failure Handling

Webhooks fail. Your server goes down for a deploy. A database migration locks a table. The network hiccups between data centers. You need three layers of defense.

Layer 1: Respond Fast, Process Later

Return 200 OK within 500ms. Every time. Don’t validate business logic before responding. Don’t query your database to check if the batch exists. Verify the signature, enqueue the payload, respond.

Process inline and you recreate the race described in the receiver section: the API retries while the first request is still running, and two workers end up handling the same payload concurrently.

Layer 2: Exponential Backoff on Your Workers

The API retries on its side. Your background workers need their own retry strategy for processing failures.

# app/jobs/process_validation_result_job.rb
class ProcessValidationResultJob < ApplicationJob
  retry_on StandardError, wait: :polynomially_longer, attempts: 5

  discard_on ActiveRecord::RecordNotFound

  def perform(payload)
    batch = ValidationBatch.find(payload["batch_id"])

    payload["results"].each do |result|
      Contact.upsert(
        {
          email: result["email"],
          email_status: result["status"],
          disposable: result["disposable"],
          catch_all: result["catch_all"],
          validated_at: Time.current
        },
        unique_by: :email
      )
    end

    batch.update!(
      processed_count: batch.processed_count + payload["results"].size,
      completed_at: batch.fully_processed? ? Time.current : nil
    )
  end
end

retry_on with polynomially_longer gives you automatic backoff: 3s, 18s, 83s, 258s across attempts. discard_on ActiveRecord::RecordNotFound drops jobs for deleted batches. No infinite retry loops on data that doesn’t exist anymore.
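Those waits fall out of Rails' base formula for :polynomially_longer, roughly (executions ** 4) + 2 seconds before jitter. A quick check of the numbers:

```javascript
// Base wait (seconds) behind :polynomially_longer, before jitter:
// (attempt ** 4) + 2 for attempts 1 through 4.
const waits = [1, 2, 3, 4].map((attempt) => attempt ** 4 + 2);
console.log(waits); // [ 3, 18, 83, 258 ]
```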

Layer 3: Dead Letter Queue

After all retries are exhausted, the payload goes somewhere you can inspect it. Don’t let it vanish into the void.

// workers/dead-letter.js
async function handleDeadLetter(payload, error) {
  await db.deadLetterEvent.create({
    data: {
      eventId: payload.event_id,
      eventType: payload.event_type,
      payload: JSON.stringify(payload),
      errorMessage: error.message,
      errorStack: error.stack,
      failedAt: new Date(),
      reprocessed: false,
    },
  });

  // Alert the team
  await slack.notify(
    `Dead letter: ${payload.event_type} for batch ${payload.batch_id}`
  );
}

Review dead letters weekly. Most are transient failures that resolved on retry. The ones that didn’t point to bugs in your processing logic.

Queue-Based Architecture

The pattern above works for moderate volume. But what happens when you’re processing 500,000 validations a day and callbacks arrive in bursts of thousands per minute?

You need a proper queue between your webhook receiver and your processing logic. The receiver’s only job is to accept, acknowledge, and enqueue. Everything else happens downstream.

// Receiver: accepts webhook, pushes to Redis queue
router.post(
  "/webhooks/validation",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    const signature = req.headers["x-truemail-signature"];
    const timestamp = req.headers["x-truemail-timestamp"];

    if (!verifySignature(req.body, signature, timestamp)) {
      return res.status(401).end();
    }

    // Push raw payload to a Redis stream
    await redis.xadd(
      "validation-results",
      "*",
      "payload",
      req.body.toString()
    );

    res.status(200).json({ received: true });
  }
);
// Consumer: reads from Redis stream via consumer group
// Create the group once: XGROUP CREATE validation-results workers $ MKSTREAM
async function consumeResults(consumerId) {
  while (true) {
    const entries = await redis.xreadgroup(
      "GROUP",
      "workers",
      consumerId,
      "BLOCK",
      5000,
      "COUNT",
      100,
      "STREAMS",
      "validation-results",
      ">"
    );

    if (!entries) continue;

    for (const [, messages] of entries) {
      for (const [id, fields] of messages) {
        const payload = JSON.parse(fields[1]);
        await processValidationResult(payload);
        await redis.xack("validation-results", "workers", id);
      }
    }
  }
}

Redis Streams give you consumer groups, acknowledgment, and redelivery of unacked messages. If a consumer crashes mid-processing, its message stays in the group’s pending entries list until another consumer claims it with XCLAIM or XAUTOCLAIM; unlike SQS, there’s no automatic timeout-based redelivery, so run a periodic claim pass. Sidekiq, BullMQ, and SQS all provide similar guarantees with different trade-offs.

The key metric: your webhook receiver should handle 10,000+ requests per minute without breaking a sweat. All it does is verify and enqueue. The heavy lifting happens in the consumer, which you can scale horizontally by adding more workers.

Securing the Endpoint

Beyond signature verification, four more things to lock down.

IP allowlisting. If the validation API publishes outbound IP ranges, restrict your webhook endpoint to those IPs at the load balancer level. Defense in depth.

HTTPS only. Webhook payloads contain email addresses. That’s PII in transit. TLS is non-negotiable.

Rate limiting. Even authenticated webhook endpoints need rate limits. If something goes wrong on the sender’s side and they blast you with 10,000 retries in a minute, your handler shouldn’t buckle. A reasonable cap: 1,000 requests per minute per source IP.

Dedicated endpoint. Run webhook receivers on a separate service or worker process. A flood of callbacks shouldn’t compete with your user-facing request pool for connections and memory.
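The per-IP cap can be enforced with something as simple as a fixed-window counter. This in-memory sketch is illustrative; a production version would typically live in Redis (INCR plus EXPIRE) or at the load balancer.

```javascript
// Fixed-window rate limiter: at most `limit` requests per IP per window.
// In-memory sketch; production versions belong in Redis or the load balancer.
function makeRateLimiter(limit, windowMs) {
  const windows = new Map(); // ip -> { start, count }
  return function allow(ip, now = Date.now()) {
    const win = windows.get(ip);
    if (!win || now - win.start >= windowMs) {
      windows.set(ip, { start: now, count: 1 }); // new window for this IP
      return true;
    }
    win.count += 1;
    return win.count <= limit;
  };
}

// The cap suggested above: 1,000 requests per minute per source IP.
const allow = makeRateLimiter(1000, 60_000);
```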

Testing Webhooks Locally

You can’t receive webhook callbacks on localhost. Use a tunnel.

ngrok http 3000
# Use the generated URL as your webhook_url:
# https://a1b2c3.ngrok.io/webhooks/validation

For automated tests, record a real callback and replay it against your endpoint. Save the headers and body, then assert your handler processes it correctly. Test with invalid signatures, expired timestamps, and duplicate event IDs. These are the edge cases that break production.

The Checklist

Before you ship webhook-based validation, verify these six things.

  1. Signature verification rejects tampered payloads and expired timestamps.
  2. Idempotency handles duplicate deliveries without side effects.
  3. Your endpoint responds in under 500ms.
  4. Background workers retry failed processing with backoff.
  5. Dead letter storage captures exhausted retries for manual review.
  6. Your webhook URL uses HTTPS and isn’t guessable.

That’s the async validation pattern end to end. Submit batches, receive callbacks, process results in the background, handle failures gracefully. Your users don’t wait. Your threads don’t block. Your data gets validated.