Detecting Disposable Email Addresses: Why Blocklists Alone Aren't Enough

hangrydev

Your Blocklist Is Already Out of Date

You downloaded disposable_domains.txt from GitHub last week. It had about 4,000 entries. Solid coverage, right?

Not even close. Our system tracks over 180,000 active disposable email domains across 15,000+ providers. That 4,000-entry list covers roughly 2% of the ecosystem. The other 98% walked right past your signup form while you were merging pull requests.

Our earlier post on disposable email blocklist providers covered why static lists exist and how to use them. This post covers why they aren’t enough, and which detection layers actually close the gap.

The Numbers Behind the Gap

The disposable-email-domains repo on GitHub is the most popular open-source blocklist. It’s community-maintained, well-vetted, and each domain goes through PR review. False positives stay low. But at ~4,000 entries, it captures a thin slice of what’s out there.

Aggregated lists do better on raw count. One automated GitHub repo pulls from 6+ sources and claims 110,000+ domains with daily updates via GitHub Actions. But automated scrapers sacrifice quality. They pull in domains without human verification, and false positives creep in. Block a customer’s legitimate domain because an automated scraper flagged it? That’s worse than letting a disposable through.

Meanwhile, disposable services keep spinning up new domains. Castle.io’s research found one attacker who registered 330 custom domains in under a month, all designed to look like small businesses. In a separate investigation, they identified 1,814 domains whose MX records pointed to a single disposable email provider, tinyhost.shop. By the time a domain gets reported, reviewed, merged, and deployed, the provider has moved on. IPQualityScore estimates that over 40% of newly registered domains are associated with fraud, phishing, or abuse infrastructure.

Static lists catch what’s already known. They don’t catch what showed up this morning.

How Disposable Services Evade Detection

The evasion playbook has gotten sophisticated. It’s not just “register a new domain.”

Subdomain generation is the first trick. Instead of registering new root domains, providers create infinite subdomains under a single parent: random123.tempservice.com, another456.tempservice.com. Each subdomain acts as its own disposable namespace. Your blocklist has tempservice.com but not every possible subdomain prefix. A naive exact-match lookup misses all of them.
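One way to close that gap is to match on parent domains instead of the full string. The following is a minimal sketch, assuming a small in-memory set; `BLOCKED_DOMAINS` and `is_blocked` are illustrative names, not part of any particular library.

```python
BLOCKED_DOMAINS = {"tempservice.com"}  # illustrative entry from the example above

def is_blocked(domain: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the domain or any of its parent domains is on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c"; stop before the bare TLD ("c")
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in blocked:
            return True
    return False

print(is_blocked("random123.tempservice.com"))
print(is_blocked("nottempservice.com"))
```

Comparing whole labels avoids the opposite mistake, where a plain string suffix test would also flag an unrelated domain that merely ends in the same characters.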

Custom domain registration is the second. Services like Mail.tm expose full REST APIs with auth tokens. Abuse scripts can spin up and tear down disposable inboxes programmatically. Some operators register cheap .xyz or .click domains in bulk, point them at shared infrastructure, and burn through them in days.

Catch-all configuration is the third. A single domain configured as catch-all accepts mail for any local part. One made-up address works. A completely different one works too. The domain doesn’t need per-user mailbox provisioning. One DNS record, infinite addresses.

These three techniques together mean a disposable service can generate millions of unique, working email addresses without any of them appearing on a public blocklist. Sound familiar?

DNS Fingerprinting: Reading the Infrastructure

Here’s where detection gets interesting. Disposable providers can rotate domain names daily, but they rarely rotate their infrastructure. That infrastructure leaves fingerprints in DNS.

MX Record Clustering

A provider running 200 domains typically points them all at the same 2 to 3 mail servers. Those MX records are the fingerprint. When throwaway-alpha.com, burner-beta.net, and disposable-gamma.xyz all resolve to mx1.temp-backend-infra.com, you don’t need all three on your list. You need the MX pattern.

require "resolv"
require "set"

KNOWN_DISPOSABLE_MX = Set.new([
  "mx1.temp-backend-infra.com",
  "mail.throwaway-hosting.net",
  # Hundreds more MX hostnames from known providers
])

def disposable_by_mx?(domain)
  resolver = Resolv::DNS.new
  mx_records = resolver.getresources(domain, Resolv::DNS::Resource::IN::MX)
  mx_hosts = mx_records.map { |r| r.exchange.to_s.downcase }
  # Match the exact hostname or a subdomain of it, never a bare string suffix
  mx_hosts.any? do |host|
    KNOWN_DISPOSABLE_MX.any? { |known| host == known || host.end_with?(".#{known}") }
  end
rescue Resolv::ResolvError
  false
end

# A brand-new domain, registered 2 hours ago, not on any list
disposable_by_mx?("fresh-throwaway-2026.xyz") # => true when its MX matches a known provider

The same check in Python, using the dnspython package:

import dns.exception
import dns.resolver

KNOWN_DISPOSABLE_MX = {
    "mx1.temp-backend-infra.com",
    "mail.throwaway-hosting.net",
    # Hundreds more
}

def is_disposable_by_mx(domain: str) -> bool:
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        # No usable MX answer: treat as "not known disposable"
        return False
    mx_hosts = [str(r.exchange).rstrip(".").lower() for r in answers]
    # Match the exact hostname or a subdomain of it, never a bare string suffix
    return any(
        mx == known or mx.endswith("." + known)
        for mx in mx_hosts
        for known in KNOWN_DISPOSABLE_MX
    )

This catches domains that aren’t on any public blocklist yet. The domain name is brand new. The infrastructure behind it isn’t.

IP-Level Clustering

Even if a provider uses different MX hostnames across domain batches, the underlying IPs often stay the same. Resolve the MX records to their A records, and you can cluster domains by shared IP ranges. Castle.io’s research highlights this: when providers register many domains but use the same mail server IP, you can identify them even if the domain names and DNS entries look completely different.

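const dns = require("dns").promises;

// Resolve a domain's MX hosts to their IPv4 addresses
async function getMxIps(domain) {
  try {
    const mxRecords = await dns.resolveMx(domain);
    const ips = [];
    for (const mx of mxRecords) {
      // One unresolvable MX host should not discard the others
      const addresses = await dns.resolve4(mx.exchange).catch(() => []);
      ips.push(...addresses);
    }
    return ips;
  } catch {
    return [];
  }
}

// Top-level await is not available in CommonJS, so wrap the calls
async function main() {
  // Two domains that look unrelated
  const ipsA = await getMxIps("totally-legit-looking.com");
  const ipsB = await getMxIps("another-innocent-domain.xyz");
  // If both sets fall in the same /24 subnet, they likely share a provider
  console.log(ipsA, ipsB);
}

main();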
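The subnet comparison itself can be sketched with Python's standard `ipaddress` module. This assumes you have already collected MX IPs per domain (for example with a function like `getMxIps`); `cluster_by_subnet` and the sample data are hypothetical, and the addresses come from documentation-reserved ranges.

```python
import ipaddress
from collections import defaultdict

def cluster_by_subnet(domain_ips: dict[str, list[str]], prefix: int = 24) -> dict[str, set[str]]:
    """Group domains by the IPv4 subnets their mail servers fall into."""
    clusters: dict[str, set[str]] = defaultdict(set)
    for domain, ips in domain_ips.items():
        for ip in ips:
            # strict=False masks the host bits, e.g. 203.0.113.57 -> 203.0.113.0/24
            subnet = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
            clusters[str(subnet)].add(domain)
    # Keep only subnets shared by two or more domains
    return {net: doms for net, doms in clusters.items() if len(doms) > 1}

# Hypothetical observations
observed = {
    "totally-legit-looking.com": ["203.0.113.10"],
    "another-innocent-domain.xyz": ["203.0.113.57"],
    "unrelated-shop.example": ["198.51.100.4"],
}
shared = cluster_by_subnet(observed)
print(shared)
```

A shared /24 is only a hint. Large hosting providers put many unrelated customers in the same range, so this works best as one signal among several.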

Missing Authentication Records

Legitimate businesses configure SPF, DKIM, and DMARC. Disposable services usually don’t bother. A domain with no SPF record, no DKIM, and no DMARC, combined with an MX pointing to a known VPS provider, is a strong negative signal. It won’t catch every disposable (some do set up basic SPF), but stacking it with other signals adds confidence.
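A sketch of the SPF and DMARC half of that check, operating on TXT record strings you have already fetched (from the domain itself for SPF, and from `_dmarc.<domain>` for DMARC). DKIM is left out on purpose: DKIM keys are published under a per-sender selector (`<selector>._domainkey.<domain>`), so their absence can't be confirmed from the domain name alone. The function names are illustrative.

```python
def has_spf(txt_records: list[str]) -> bool:
    """SPF lives in a TXT record on the domain itself and starts with v=spf1."""
    for record in txt_records:
        value = record.strip().strip('"').lower()
        if value == "v=spf1" or value.startswith("v=spf1 "):
            return True
    return False

def has_dmarc(dmarc_txt_records: list[str]) -> bool:
    """DMARC lives in a TXT record at _dmarc.<domain> and starts with v=DMARC1."""
    return any(
        record.strip().strip('"').lower().startswith("v=dmarc1")
        for record in dmarc_txt_records
    )

def missing_auth_signals(txt_records: list[str], dmarc_txt_records: list[str]) -> list[str]:
    """Return the names of the authentication records that are absent."""
    missing = []
    if not has_spf(txt_records):
        missing.append("spf")
    if not has_dmarc(dmarc_txt_records):
        missing.append("dmarc")
    return missing

print(missing_auth_signals(["google-site-verification=abc"], []))
```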

Domain Age: The 30-Day Rule

Most disposable domains are young. Very young.

A domain registered 48 hours ago with no website, no SPF record, and MX records pointing to known disposable infrastructure? That’s not a real company. WHOIS (or RDAP, its modern replacement) gives you the creation date.

Domains under 30 days old that lack email authentication records carry significantly higher risk. This isn’t speculation. Research from IPQualityScore shows that over 40% of newly registered domains are associated with fraudulent activity. When you’re looking at a domain that’s 3 days old and accepting email for a random-looking address, the probability that it’s disposable is high.

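import whois  # python-whois package
from datetime import datetime, timezone

def domain_age_days(domain: str) -> int:
    """Return the domain's age in days, or -1 if it cannot be determined."""
    try:
        w = whois.whois(domain)
        created = w.creation_date
        # Some registries return several creation dates; use the first
        if isinstance(created, list):
            created = created[0]
        if isinstance(created, datetime):
            # Naive datetimes are treated as UTC so the subtraction never mixes types
            if created.tzinfo is None:
                created = created.replace(tzinfo=timezone.utc)
            return (datetime.now(timezone.utc) - created).days
    except Exception:
        pass
    return -1  # Unknown

def is_suspicious_age(domain: str, threshold_days: int = 30) -> bool:
    age = domain_age_days(domain)
    # Unknown age (-1) is not treated as suspicious
    return 0 <= age < threshold_days

is_suspicious_age("registered-yesterday.xyz")  # True if the domain is under 30 days old
is_suspicious_age("google.com")                 # False (registered in 1997)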

Domain age alone isn’t conclusive. Plenty of legitimate startups have young domains. But stacked with MX fingerprinting and missing authentication records, it becomes a high-confidence signal.

Pattern Analysis: What the Local Part Tells You

The text before the @ carries signal too. Disposable services generate addresses in predictable patterns.

Random alphanumeric strings: a dozen mixed letters and digits before the @. No human picks that as their email. Sequential patterns: the same prefix followed by an incrementing counter. Timestamp-based: a prefix followed by a Unix timestamp. These patterns differ from how real people create email addresses (firstname.lastname@, john42@, jsmith@).

Pattern matching won’t catch a determined abuser who manually types a realistic-looking address. But it catches the 90%+ of disposable usage that’s automated. Bots generating thousands of signups per hour don’t craft realistic local parts. They don’t need to. Most sites don’t check.
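A rough sketch of such local-part heuristics follows. The cutoffs (length 10, 30% digits, 3.0 bits of entropy, counters of four or more digits) and the sample local parts are assumptions chosen for illustration, not tuned values.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def local_part_flags(local: str) -> list[str]:
    """Return the names of the automated-looking patterns the local part matches."""
    local = local.lower()
    flags = []
    digits = sum(c.isdigit() for c in local)
    # Long, digit-heavy, high-entropy strings are rarely chosen by people
    if len(local) >= 10 and digits / len(local) >= 0.3 and shannon_entropy(local) >= 3.0:
        flags.append("random_looking")
    # A word followed by a long counter, e.g. user00123
    if re.fullmatch(r"[a-z]+[._-]?\d{4,}", local):
        flags.append("sequential")
    # A 10-digit Unix timestamp (or 13-digit millisecond timestamp) starting with 1
    if re.search(r"(?<!\d)1\d{9}(?:\d{3})?(?!\d)", local):
        flags.append("timestamp")
    return flags

for sample in ["x7k9q2m4p8w1", "user00123", "signup1718000000", "john42"]:
    print(sample, local_part_flags(sample))
```

Short human-style local parts such as `john42` or `firstname.lastname` produce no flags here, which is the property that keeps this signal from punishing ordinary users.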

Why Not Just Block Aggressively?

You could crank up detection sensitivity and block anything that looks slightly suspicious. Young domain? Blocked. Unusual MX? Blocked. Random local part? Blocked.

Here’s the problem: false positives kill conversions.

A developer using a personal domain they registered last month gets blocked. A privacy-conscious customer using SimpleLogin or Firefox Relay gets blocked. A small business with a cheap hosting provider whose MX records happen to share infrastructure with a disposable service gets blocked.

Every false positive is a real person who wanted to sign up and couldn’t. They don’t file a bug report. They leave.

The right approach isn’t maximum detection. It’s layered scoring with thresholds you can tune. A domain that fails one signal gets a yellow flag. A domain that fails three signals gets blocked. You decide where the line sits based on your use case.
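As a minimal sketch of that idea, the verdict can be a function of how many signals failed, with the cutoffs passed in rather than hard-coded. The `Thresholds` class, the signal names, and the three verdict strings are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    flag_at: int = 1   # failed signals needed for a yellow flag
    block_at: int = 3  # failed signals needed to block

def verdict(signals: dict[str, bool], t: Thresholds = Thresholds()) -> str:
    """Map failed-signal count to 'allow', 'flag', or 'block'."""
    failed = sum(signals.values())
    if failed >= t.block_at:
        return "block"
    if failed >= t.flag_at:
        return "flag"
    return "allow"

signals = {"young_domain": True, "known_mx": True, "no_auth_records": False}
print(verdict(signals))                        # default thresholds
print(verdict(signals, Thresholds(block_at=2)))  # stricter deployment
```

The same two failed signals produce a flag under the defaults and a block under the stricter setting, so the detection code stays the same while each product picks its own line.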

A Shopify checkout blocking disposable emails needs different sensitivity than a free-tier SaaS signup. The checkout is protecting a transaction. The signup is protecting server resources. Different risk, different threshold.

Stacking Signals: The Multi-Layer Approach

No single signal is reliable enough on its own. Every technique has blind spots. The answer is combining them.

Layer 1: Known domain lookup. Check against a maintained database. This catches the obvious providers in under 5ms. Fast, cheap, high precision.

Layer 2: MX fingerprinting. If the domain isn’t known, check its mail server infrastructure against known disposable MX patterns and IP clusters. Catches new domains from existing providers.

Layer 3: Domain characteristics. Registration age, authentication records, hosting patterns. Catches brand-new providers with brand-new infrastructure.

Layer 4: ML-based classification. Feed all signals into a model trained on millions of labeled examples. The model learns patterns that hand-written rules miss: subtle MX naming conventions, specific registrar preferences, TLD distributions that correlate with disposable usage.

Each layer catches what the previous one missed. The first layer handles 60-70% of disposable traffic. The second catches another 20%. The third and fourth close most of the remaining gap.
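One simple way to arrange layers so that each only sees what the previous one missed is to run them cheapest-first and stop at the first hit. This is a sketch with stub checks and made-up data, not a description of any specific product's pipeline; a real deployment might instead run every layer and feed all results into a score.

```python
from typing import Callable, Optional

Layer = Callable[[str], bool]  # returns True when the layer considers the domain disposable

def first_catching_layer(domain: str, layers: list[tuple[str, Layer]]) -> Optional[str]:
    """Run layers in order and return the name of the first one that fires."""
    for name, check in layers:
        if check(domain):
            return name
    return None

# Stub data standing in for a blocklist and pre-resolved MX hosts (hypothetical)
KNOWN_DOMAINS = {"tempservice.com"}
MX_BY_DOMAIN = {"fresh-throwaway-2026.xyz": "mx1.temp-backend-infra.com"}
KNOWN_MX = {"mx1.temp-backend-infra.com"}

layers: list[tuple[str, Layer]] = [
    ("known_domain", lambda d: d in KNOWN_DOMAINS),
    ("mx_fingerprint", lambda d: MX_BY_DOMAIN.get(d) in KNOWN_MX),
]

for d in ["tempservice.com", "fresh-throwaway-2026.xyz", "example.org"]:
    print(d, first_catching_layer(d, layers))
```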

MailCop runs all four layers on every validation API call. The disposable field in the response reflects the combined output, not just a list lookup. When a provider registers 50 new domains tomorrow morning, the infrastructure fingerprint catches them before any blocklist does.

The Trade-Off You Can’t Avoid

Every detection system sits on a spectrum between two failure modes: letting disposables through (false negatives) or blocking real users (false positives).

Pure blocklist: low false positives, high false negatives. You’ll rarely block a real user by accident, but 30%+ of disposable traffic sails through.

Aggressive multi-signal: low false negatives, higher false positive risk. You’ll catch nearly every disposable, but some legitimate edge cases get flagged.

The sweet spot depends on what a false positive costs you versus what a false negative costs you. For a SaaS with a free tier that’s getting hammered by trial abuse, aggressive detection makes sense. For an e-commerce store, blocking a paying customer because their domain looked “slightly suspicious” is a bad trade.

Build the detection stack. Expose the score. Let the business logic decide the threshold. That’s how you avoid painting yourself into a corner.

What’s Actually Working in 2026

The cat-and-mouse game between disposable services and detection systems won’t end. Providers will keep registering domains. Detection systems will keep fingerprinting infrastructure. Both sides automate faster every year.

What’s changed: static lists are now the floor, not the ceiling. MX fingerprinting, domain age analysis, authentication record checks, and ML classification have moved from “nice to have” to table stakes. If your disposable detection is a text file you downloaded from GitHub, you’re catching the easy ones and missing the rest.

The teams that treat disposable detection as a living system (layered, scored, continuously updated) stay ahead. The rest play catch-up against opponents who register domains faster than humans review pull requests.

Which side of that gap is your signup form on?