OpenTrustToken: Structural Overview

The independent trust attestation layer for AI agent commerce. How it works, how scoring is built, how the fetch escalation ladder reaches sites that block our datacenter IPs, and where the compositional brand anchor, KYC infrastructure, and longitudinal trust graph fit into the architecture.

The Problem

AI agents can pay. They have no way to know who to trust.

Stripe's Machine Payments Protocol, Coinbase x402, Google's AP2 (Agent Payments Protocol) framework, and Skyfire KYAPay are building payment rails for AI agents. These handle "how agents pay." Nobody has standardized the pre-transaction question: "should this agent pay this specific site?"

OpenTrustToken answers that question. One API call returns a cryptographically signed evidence bundle with a trust score, a brand tier classification, a crawlability flag, and a PROCEED / CAUTION / DENY recommendation. The evidence bundle is the product. Payment protocols can cite our verdicts without putting us on their critical path, because the Ed25519 signature lets anyone verify a token without calling our API.

For agents

Call our API before any payment. Get signed evidence: domain age, SSL, DNS security, reputation, content quality, identity signals. Make an informed decision programmatically.

For site owners

Check your trust score and get a checklist of specific improvements. Add a privacy policy, enable HSTS, configure DMARC. Each fix raises your score. No code integration needed.

The real asset

The database of longitudinal trust profiles. Raw signal data stored separately from scores, enabling algorithm updates without re-crawling. Every check adds to the dataset.

Architecture

How a trust check works

Agent calls the API

GET /v1/check/merchant.com

Six signal collectors run in parallel

Domain age (WHOIS registration date; registrant-change detection awaits historical WHOIS integration), SSL/TLS (direct probe, OV/EV cert org extraction), DNS security (SPF, DMARC, DNSSEC, CAA), content analysis (multilingual legal pages in 12+ languages, security headers, structured data, payment processors, social media links, response time, site category detection), reputation (logarithmic Tranco ranking, Google Safe Browsing, Spamhaus DBL, SURBL, URLhaus), identity (GDPR-aware WHOIS analysis, cert org, Tranco identity, ccTLD verification bonuses, institutional TLD bonuses, schema.org).
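The parallel fan-out can be pictured with a minimal asyncio sketch. The collector functions, the collect_all name, and the 10-second per-collector timeout are assumptions for illustration, not the production code. The property it shows is that a failing or slow source is recorded as an error instead of aborting the whole check.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical collector shape: an async function from domain to raw facts.
Collector = Callable[[str], Awaitable[Dict[str, Any]]]


async def collect_all(domain: str, collectors: Dict[str, Collector],
                      timeout_s: float = 10.0) -> Dict[str, Any]:
    """Run every collector concurrently. A collector that raises or exceeds
    timeout_s is recorded as {"error": <exception name>}, so one slow or
    broken source never aborts the whole check."""
    names = list(collectors)
    tasks = [asyncio.wait_for(collectors[name](domain), timeout_s)
             for name in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    raw: Dict[str, Any] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            raw[name] = {"error": type(result).__name__}
        else:
            raw[name] = result
    return raw


if __name__ == "__main__":
    async def ssl_stub(domain: str) -> Dict[str, Any]:
        return {"valid": True, "tls_version": "1.3"}

    async def dns_stub(domain: str) -> Dict[str, Any]:
        raise RuntimeError("resolver unavailable")

    print(asyncio.run(collect_all("merchant.com",
                                  {"ssl": ssl_stub, "dns": dns_stub})))
```

Keeping raw per-collector results (including errors) fits the storage design described below, where raw facts are kept separately from scores.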

Content fetch escalates through a four-tier ladder

Tier 1: direct httpx with realistic browser headers from the API box. Tier 2: real headless Chromium on a dedicated crawler droplet (playwright-stealth, pool of warm contexts). Tier 3: Playwright through Decodo rotating residential proxy (fresh residential IPs across Comcast, Verizon, T-Mobile, Charter). Tier 4 (rolling out): real Chrome on a physical Mac on residential ISP via Tailscale. Each tier has independent circuit breakers so a failure in one does not poison the others. Counters exposed at /stats.fetch per tier.
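Here is a minimal sketch of the escalation ladder with an independent breaker per tier. It assumes synchronous fetcher callables that return HTML or None. The thresholds (5 consecutive failures, 60-second cooldown), the counter names, and the Tier and CircuitBreaker classes are hypothetical, and the real /stats.fetch schema is not described here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CircuitBreaker:
    """Opens after max_failures consecutive failures; while open, the tier is
    skipped. After cooldown_s the breaker closes and the tier is retried."""
    max_failures: int = 5        # illustrative threshold
    cooldown_s: float = 60.0     # illustrative cooldown
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now


# A fetcher returns page HTML, or None when blocked (403, challenge page, timeout).
Fetcher = Callable[[str], Optional[str]]


@dataclass
class Tier:
    name: str
    fetch: Fetcher
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)
    attempts: int = 0            # per-tier counters, in the spirit of /stats.fetch
    successes: int = 0
    skipped: int = 0


def fetch_with_ladder(url: str, tiers: List[Tier],
                      clock: Callable[[], float] = time.monotonic
                      ) -> Tuple[Optional[str], Optional[str]]:
    """Try each tier in order. Returns (html, tier_name), or (None, None) when
    every tier fails; the caller then marks crawlability as "blocked"."""
    for tier in tiers:
        now = clock()
        if not tier.breaker.allow(now):
            tier.skipped += 1
            continue
        tier.attempts += 1
        try:
            html = tier.fetch(url)
        except Exception:        # a crashing tier counts as a failure for that tier only
            html = None
        tier.breaker.record(html is not None, now)
        if html is not None:
            tier.successes += 1
            return html, tier.name
    return None, None
```

Each tier has its own breaker, so a broken tier is skipped quickly without affecting whether a healthier tier downstream gets tried.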

Scoring engine computes the result with two anchors

Weighted formula applied to all signals. If content is unreachable after the full tier ladder, content's 17% is dropped and the remaining five signals renormalize. If the domain meets all four well-known brand anchor conditions (top-50K Tranco + 5+ years + clean reputation + valid SSL), an identity floor of 50 and a trust_score floor of 75 are applied. Critical flags are checked (malware/phishing override to DENY regardless of score). A recommendation is generated, and an actionable checklist is built from failing items.

Result is signed and returned

Ed25519 signature applied to the canonical evidence bundle. Response includes raw signals, computed score, brandTier (well_known or scored), crawlability (ok or blocked), site category (consumer / infrastructure / api_service), jurisdiction context (country, legal framework, cross-border risk, dispute resolution), recommendation, reasoning text, actionable checklist, flags array, and cryptographic proof. Cached for 7 days.

Data storage architecture

Raw signal facts are stored separately from scored results. When the scoring algorithm changes, every domain can be re-scored from stored data without re-crawling. This is the key architectural decision that makes the dataset useful long-term.

-- Three-table architecture
domains -- domain, first_checked, last_checked, registered, check_count
raw_signals -- domain, checked_at, signal_data (JSON blob of all raw facts)
scored_results -- domain, score, recommendation, model_version, full_response
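The re-scoring property can be demonstrated with Python's built-in sqlite3 module. The column types are assumptions, since the overview names only the columns. score_fn is a hypothetical stand-in for whichever scoring model version is current. Nothing here touches the network.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, Tuple

# Column types are assumptions; the overview only lists column names.
SCHEMA = """
CREATE TABLE domains (domain TEXT PRIMARY KEY, first_checked TEXT,
                      last_checked TEXT, registered INTEGER, check_count INTEGER);
CREATE TABLE raw_signals (domain TEXT, checked_at TEXT, signal_data TEXT);
CREATE TABLE scored_results (domain TEXT, score REAL, recommendation TEXT,
                             model_version TEXT, full_response TEXT);
"""

ScoreFn = Callable[[Dict[str, Any]], Tuple[float, str]]


def rescore_all(conn: sqlite3.Connection, model_version: str,
                score_fn: ScoreFn) -> int:
    """Re-score every domain from its most recent raw_signals row.
    Only stored facts are read, so no site is re-crawled."""
    rows = conn.execute(
        """SELECT r.domain, r.signal_data FROM raw_signals AS r
           WHERE r.checked_at = (SELECT MAX(checked_at) FROM raw_signals
                                 WHERE domain = r.domain)
           ORDER BY r.domain"""
    ).fetchall()
    for domain, blob in rows:
        score, recommendation = score_fn(json.loads(blob))
        conn.execute(
            "INSERT INTO scored_results VALUES (?, ?, ?, ?, ?)",
            (domain, score, recommendation, model_version,
             json.dumps({"score": score, "recommendation": recommendation})),
        )
    conn.commit()
    return len(rows)
```

Because each scored_results row carries model_version, older and newer scores for the same domain can coexist and be compared after a model change.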

Scoring Model

How trust scores are computed

The score is a weighted average of six signal categories, with two anchors layered on top: content re-weighting when we cannot reach the homepage (content's 17% is dropped and the remaining five signals renormalize to sum to 100%), and the well-known brand anchor for unambiguously established sites. The model is versioned (current: ott-v1.3-weights) and can be updated without re-crawling because raw signals are stored separately from scored results.

Signal	Weight	Sources	What it answers
Reputation	30%	Tranco ranking (log curve), Google Safe Browsing, Spamhaus, SURBL, URLhaus	Is this site established and clean?
Identity	25%	GDPR-aware WHOIS analysis, OV/EV cert org, Tranco identity buckets (top 500K, tiered), institutional TLD (.gov/.edu), ccTLD verification bonuses, schema.org	Do we know who runs this?
Content	17%	Privacy/terms/contact in 12+ languages, security.txt, robots.txt, security headers, payment processors, social media, structured data, response time. Category-aware: infrastructure sites scored on API docs and security posture instead of legal pages	Does this site operate professionally?
Domain Age	10%	WHOIS registration date, logarithmic age curve	How long has this existed?
SSL/TLS	10%	Certificate validity, TLS version, HSTS, issuer	Is the connection secure?
DNS	8%	SPF, DMARC, DNSSEC, CAA records	Is the domain properly secured?
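Here is a sketch of the base weighted average using the v1.3 weights from the table. It assumes each signal has already been normalized to a 0-100 scale. The signal key names are hypothetical.

```python
from typing import Dict

# ott-v1.3-weights, in percent (from the table above).
WEIGHTS: Dict[str, int] = {
    "reputation": 30,
    "identity": 25,
    "content": 17,
    "domain_age": 10,
    "ssl_tls": 10,
    "dns": 8,
}
assert sum(WEIGHTS.values()) == 100


def weighted_score(signals: Dict[str, float]) -> float:
    """Weighted average of per-signal scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS) / 100
```

For example, hypothetical signal scores of 91, 55, 80, 100, 100 and 75 (in table order) give (2730 + 1375 + 1360 + 1000 + 1000 + 600) / 100 = 80.65.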

Tranco logarithmic curve and identity buckets (v1.3)

The Tranco list ranks the top 1 million domains by traffic, aggregated from Cloudflare Radar, Cisco Umbrella, Majestic, and Quantcast. Instead of flat brackets, we use a logarithmic curve for reputation so rank #1 is meaningfully different from rank #1,000. Formula: score = 100 - 3 * log10(rank). Identity buckets are broader in v1.3 so sites like petco.com (rank 12,647) and crateandbarrel.com (rank 12,931) receive the identity credit their membership warrants.

Rank	Reputation Score	Identity Bonus	Example
1	100	+25	google.com
100	94	+25	apple.com, walmart.com
1,000	91	+20	kohls.com, macys.com
5,000	89	+15	chewy.com, nordstrom.com
10,000	88	+12	petsmart.com
50,000	86	+8	petco.com, crateandbarrel.com
100,000	85	+5	small but real
500,000	83	+3	edge of list
Not listed	70-80	none	depends on Safe Browsing
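The curve and buckets can be expressed directly in code. Two parts of this sketch are our interpretation rather than stated rules. First, each table row is read as an inclusive upper bound on rank, so petco.com at 12,647 falls in the 50,000 bucket. Second, listed ranks beyond 500,000 get no identity bonus. The function names are illustrative.

```python
import math
from typing import Optional

# (max_rank, identity_bonus), reading each table row as an inclusive bound.
IDENTITY_BUCKETS = [(100, 25), (1_000, 20), (5_000, 15), (10_000, 12),
                    (50_000, 8), (100_000, 5), (500_000, 3)]


def tranco_reputation(rank: int) -> float:
    """Reputation component for a listed domain: 100 - 3 * log10(rank)."""
    if rank < 1:
        raise ValueError("Tranco ranks start at 1")
    return 100 - 3 * math.log10(rank)


def tranco_identity_bonus(rank: Optional[int]) -> int:
    """Identity bonus by bucket; None means the domain is not listed."""
    if rank is None:
        return 0
    for max_rank, bonus in IDENTITY_BUCKETS:
        if rank <= max_rank:
            return bonus
    return 0
```

Rounding tranco_reputation to the nearest integer reproduces every score in the table, for example 100 - 3 * log10(5000) ≈ 88.9, shown as 89.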

Well-known brand anchor (new in v1.3)

When a domain meets all four conditions simultaneously, we apply a compositional floor to scoring. The rationale is that long-term Tranco top-50K membership is unfakeable (the list comes from billions of real-user requests), and when combined with 5+ years of domain age, a clean reputation file, and a valid SSL certificate, the probability of the site being a bad actor is effectively zero. This mirrors how credit bureaus weight account longevity over transient activity.

Anchor conditions (all four required)

Tranco rank within the top 50,000
Domain registered for at least 5 years (1,825 days)
SSL/TLS certificate valid and chain verifies
Reputation clean: no malware, phishing, or spam blocklist hits

What the anchor does

Raises the identity signal floor to 50 before the weighted sum
Floors the final trust_score at 75 (PROCEED threshold)
Sets brandTier: "well_known" in the API response
Adds the WELL_KNOWN_BRAND flag so consumers can see the anchor was applied

What revokes the anchor (any one of these)

MALWARE_DETECTED (Google Safe Browsing, URLhaus)
PHISHING_DETECTED (Safe Browsing, Spamhaus DBL, SURBL)
SPAM_LISTED (Spamhaus, SURBL)
RECENTLY_COMPROMISED (monitoring alerts, historical baseline drift)
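The anchor logic reduces to four conditions joined by AND, plus a revocation check. In this sketch, AnchorFacts and score_with_anchor are hypothetical names, and a clean reputation is modeled as the absence of any revoking flag. The weighted_sum function is passed in, so the effect of the floors is visible independently of the exact weights.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

MIN_AGE_DAYS = 1_825        # 5 years
MAX_TRANCO_RANK = 50_000
IDENTITY_FLOOR = 50
SCORE_FLOOR = 75            # PROCEED threshold
REVOKING_FLAGS = {"MALWARE_DETECTED", "PHISHING_DETECTED",
                  "SPAM_LISTED", "RECENTLY_COMPROMISED"}


@dataclass
class AnchorFacts:
    """Hypothetical container for the raw facts the anchor looks at."""
    tranco_rank: Optional[int]   # None if not in the Tranco list
    age_days: int
    ssl_valid: bool              # certificate valid and chain verifies
    flags: Set[str] = field(default_factory=set)


def anchor_applies(f: AnchorFacts) -> bool:
    # All four conditions must hold; any revoking flag cancels the anchor.
    return (f.tranco_rank is not None
            and f.tranco_rank <= MAX_TRANCO_RANK
            and f.age_days >= MIN_AGE_DAYS
            and f.ssl_valid
            and not (f.flags & REVOKING_FLAGS))


def score_with_anchor(facts: AnchorFacts, signals: Dict[str, float],
                      weighted_sum: Callable[[Dict[str, float]], float]
                      ) -> Tuple[float, str, List[str]]:
    """Returns (trust_score, brandTier, extra_flags)."""
    if not anchor_applies(facts):
        return weighted_sum(signals), "scored", []
    # The identity floor is applied before the weighted sum...
    floored = dict(signals, identity=max(signals["identity"], IDENTITY_FLOOR))
    # ...and the final score is floored at the PROCEED threshold.
    score = max(weighted_sum(floored), SCORE_FLOOR)
    return score, "well_known", ["WELL_KNOWN_BRAND"]
```

Because the anchor only sets floors, a site whose natural score is already above 75 keeps that score.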

The anchor addresses a credibility problem the automated layer cannot solve on its own. When content fetch is blocked by a merchant's bot protection (Cloudflare Bot Management, Akamai, PerimeterX), identity signals that transitively depend on content (contact on site, schema.org markup, presence of an ott.json file) are all unavailable. Without the anchor, crateandbarrel.com or petco.com would score CAUTION despite being unambiguously established brands. The anchor encodes the missing evidence explicitly rather than hand-curating a whitelist of "known good" domains.

Content re-weighting when the homepage is unreachable

Some major retailers block automated crawlers at the network level. When our fetch ladder exhausts all tiers and cannot retrieve the homepage, we do NOT score content as 0 (that would conflate "we couldn't look" with "the site has no content"). Instead, we drop content's 17% weight from the sum and renormalize the remaining five signals to 100%. The response carries crawlability: "blocked" and the CONTENT_UNSCORABLE flag so consumers know why the score was computed from fewer signals.
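Re-weighting amounts to dividing by the sum of the weights that were actually measured. With content dropped, the remaining weights sum to 83, so reputation's effective weight becomes 30/83, about 36.1%. This sketch assumes None marks an unmeasured signal. It also shows numerically why "couldn't look" differs from "scored 0".

```python
from typing import Dict, Optional

WEIGHTS = {"reputation": 30, "identity": 25, "content": 17,
           "domain_age": 10, "ssl_tls": 10, "dns": 8}


def weighted_score(signals: Dict[str, Optional[float]]) -> float:
    """Weighted average over measured signals only. A signal that is None
    (or absent) contributes neither score nor weight, so the remaining
    weights renormalize to 100%."""
    measured = {k: w for k, w in WEIGHTS.items() if signals.get(k) is not None}
    total_weight = sum(measured.values())
    if total_weight == 0:
        raise ValueError("no measured signals")
    return sum(w * signals[k] for k, w in measured.items()) / total_weight


def effective_weights(missing: str) -> Dict[str, float]:
    """Effective percentage weights once one signal is dropped."""
    rest = {k: w for k, w in WEIGHTS.items() if k != missing}
    total = sum(rest.values())
    return {k: round(100 * w / total, 1) for k, w in rest.items()}
```

effective_weights("content") yields 36.1 / 30.1 / 12.0 / 12.0 / 9.6 for reputation, identity, domain age, SSL/TLS and DNS.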

Recommendations

Score	Recommendation	Agent behavior
75-100	PROCEED	Transaction authorized
40-74	CAUTION	Apply limits or ask user for confirmation
0-39	DENY	Refuse the transaction

Critical flags (MALWARE_DETECTED, PHISHING_DETECTED) override to DENY regardless of score.
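The thresholds and the flag override map onto a short function. It assumes the score is compared as a plain number, so a fractional score such as 74.5 would fall to CAUTION. It also assumes the flag names match those in the response's flags array.

```python
from typing import Iterable

CRITICAL_FLAGS = {"MALWARE_DETECTED", "PHISHING_DETECTED"}


def recommend(score: float, flags: Iterable[str] = ()) -> str:
    """Map a 0-100 trust score to a recommendation; critical flags win."""
    if CRITICAL_FLAGS.intersection(flags):
        return "DENY"
    if score >= 75:
        return "PROCEED"
    if score >= 40:
        return "CAUTION"
    return "DENY"
```

An agent would typically branch on the returned string: complete the payment on PROCEED, cap the amount or ask the user on CAUTION, and refuse on DENY.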

Example: how real sites score under v1.3

stripe.com brandTier: well_known 83 PROCEED

[Signal breakdown chart: Domain Age 100, SSL/TLS 100]

Tranco top-1K, OV certificate, TLS 1.3 + HSTS, full content signals. Identity is capped at 55 (the automated maximum); a KYC tier would unlock higher. stripe.com meets all four anchor conditions, hence brandTier: well_known, but the floors are not binding: the natural weighted sum already reaches PROCEED.

crateandbarrel.com brandTier: well_known crawlability: blocked 76 PROCEED

[Signal breakdown chart: Content n/a, Domain Age 100]

Tranco rank 12,931, domain registered 1995, clean reputation, valid SSL. Cloudflare Enterprise Bot Management blocks our crawler at the fingerprint layer even through residential proxy rotation, so content is unreachable. Content weight drops to 0% and the remaining five signals renormalize. Identity floor (50) and score floor (75) applied because all four anchor conditions hold. Result: PROCEED 76 with full transparency about why content is marked n/a.

pets.com brandTier: scored crawlability: blocked 48 CAUTION

pets.com is a famous dot-com-era brand, now owned by PetSmart's parent company. But the pets.com domain itself is a shell: the apex refuses TCP connections, www times out, and its Tranco rank has decayed outside the top 50K. Our system correctly identifies that the domain, as a technical entity, is unreachable and unanchored. We score petsmart.com at PROCEED 79 and pets.com at CAUTION 48 in the same API, because scoring a brand and scoring a domain are different questions. This is the difference between heuristic brand recognition and honest domain verification.

The six trust layers

Scores stack through layers. Each layer adds trust signals that the previous layer cannot provide.

Layer	What provides it	Typical score range
1. Automated signals	Public data (SSL, DNS, content, domain age)	40-65 for average sites
2. Well-known brand anchor	Tranco top-50K + 5+ years age + clean reputation + valid SSL. Also: OV/EV cert, institutional TLD (.gov/.edu), full content signals	75-85 for top-tier sites, automatic PROCEED when anchored
3. Registration (free)	Structured data collection: business name, EIN/VAT, address, phone, social. Each verified field earns points.	+6 to +30 points
4. Enhanced ($29/mo)	Business registry, address, phone verification	Ceiling: 80
5. KYC-Verified ($99/mo)	Government ID, business docs, bank, video call	Ceiling: 95
6. Enterprise ($499/mo)	Audit, continuous monitoring, transaction insurance	Ceiling: 100

Registration Model

Free registration as a data event

Registration is not just domain ownership proof. It is a structured data collection that feeds cross-referencing and verification. The more data a site provides, the more we can verify, the higher the score. Each verified field earns points independently.

Registration fields and scoring

Field	Required	How it scores	Points
Domain ownership (DNS/HTTP)	Yes	Proves control of the domain	+3
Business legal name	Yes	Cross-referenced against WHOIS org, SSL cert org, state registry	+5 if match
Contact email	Yes	Verified that email domain matches the site domain	+3 if match
Country / State	Yes	Used to look up business in correct jurisdiction	(enables registry check)
Business type	Yes	Contextualizes scoring (nonprofits, government scored differently)	(context)
Website category	Yes	Adjusts which checklist items are relevant	(context)
EIN / Tax ID	No	Verified against IRS or national tax registry	+5 if verified
Phone number	No	Checked against what appears on the site	+3 if match
Physical address	No	Verified as real commercial location via USPS/Google Places	+3 if verified
Social media profiles	No	Verified bidirectional link (profile links to site, site links to profile)	+3 if verified
Business registry match	Auto	Business name found in state/national corporate registry	+5 if found

Maximum registration boost: +30 points to identity signal. A registration with just domain proof and email gives +6. A complete registration with all fields verified gives +30. This creates a natural gradient: more disclosure = more trust.
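The points table reduces to a lookup and a sum. The field keys are hypothetical names for the rows of the registration table. The +30 cap is redundant while every field scores at most once, but it protects the maximum if fields are added later.

```python
from typing import Iterable

# Points per verified field, from the registration table above.
FIELD_POINTS = {
    "domain_ownership": 3,
    "legal_name_match": 5,
    "email_domain_match": 3,
    "ein_verified": 5,
    "phone_match": 3,
    "address_verified": 3,
    "social_verified": 3,
    "registry_match": 5,
}
MAX_BOOST = 30


def registration_boost(verified_fields: Iterable[str]) -> int:
    """Identity boost from independently verified fields (unknown keys score 0)."""
    points = sum(FIELD_POINTS.get(f, 0) for f in set(verified_fields))
    return min(MAX_BOOST, points)
```

Domain proof plus a matching email gives 3 + 3 = 6, and all eight fields give the full 30.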

Privacy framework

Registration data is split into public and private categories. The API exposes verification status, never the underlying data.

Public (shown in API responses)

Business name, country, state, business type, website category, year established. These are already public record in business registries. Verification statuses are public: "einVerified: true" tells agents the EIN was confirmed without revealing the number.

Private (never exposed)

Contact name, email, phone, physical address, EIN/tax ID. Used only for internal verification, dispute resolution (with legal authorization), and KYC upgrade pre-population. Encrypted at rest. Covered by privacy policy and data processing agreement.

The strategic value: Registration data is proprietary. Public signals (WHOIS, SSL, DNS) are available to anyone. But "this site's owner told us their business name is X, their EIN is Y, and we verified both against state records" is data only we hold. When a registered site upgrades to the $29 Enhanced tier, we already have half the KYC data collected. Registration is the top of the KYC funnel.

KYC Infrastructure

Paid verification tiers

Beyond free registration, paid tiers add human verification and continuous monitoring. These tiers are the primary monetization mechanism, driven by natural demand from sites that need higher scores to attract agent transactions.

Tier	Price	What it adds	Score effect
Enterprise	$499/mo	Annual audit, continuous monitoring, transaction insurance (up to $50K/incident), dedicated compliance officer	Ceiling: 100
KYC-Verified	$99/mo	Government ID, business docs, bank verification, video call, beneficial ownership	Ceiling: 95
Enhanced	$29/mo	Business registry cross-reference, address/phone verification, social media presence	Ceiling: 80
Registered	Free	Site owner proves domain ownership (DNS or HTTP verification). Score boost for active opt-in to the trust framework.	Up to +30 points
Automated	Free	Machine-only checks from public data. No site owner involvement needed. Default for all domains.	Ceiling: ~88

Why sites will pay

A new merchant registers a domain and builds a site. Automated checks give them a score of 40 (CAUTION). AI agents will ask users for confirmation before paying, or apply transaction limits. Their competitors who verify identity score 80+ (PROCEED) and get frictionless agent transactions. The upgrade path is clear and the motivation is economic.

KYC operations stack

KYC is operationally intensive. The infrastructure uses established verification providers rather than building from scratch.

Function	Provider Options	Cost per verification
ID verification	Jumio, Onfido, Persona	$2-5
Business registry	OpenCorporates, Dun & Bradstreet	$0.10-1.00
Bank verification	Plaid, Stripe Identity	$1.50-3.00
Phone verification	Twilio Verify	$0.05
Address verification	SmartyStreets, Google Address	$0.01-0.05
Video call	Zoom API, Daily.co	$0.03/min

Total platform cost per full KYC: under $40. At $99/mo subscription, payback in first month.

Global Coverage

International scoring and jurisdiction

One global score, contextual risk

Every domain gets a single trust score regardless of region. The API also returns jurisdiction context: country, legal framework (US/EU-EEA/APAC/other), cross-border risk level, and dispute resolution availability. Agents apply their own regional policies on top of the score.

GDPR-aware WHOIS

For European registrants, GDPR requires registrars to redact personal data from public WHOIS. Our scoring treats GDPR redaction as neutral, not a penalty. EU businesses are not disadvantaged for complying with their own privacy laws.

ccTLD verification bonuses

Country-code TLDs like .jp, .cn, .au, .br, .de, .uk have registries that impose local-presence or registrant-verification requirements of varying strictness. These domains receive an identity bonus because the registry has already performed some verification.

Multilingual content detection

Privacy policies, terms of service, and contact pages are detected in 12+ languages including English, German, French, Spanish, Portuguese, Italian, Dutch, Swedish, Polish, Czech, Hungarian, and Turkish. International common paths (/datenschutz, /conditions-generales, /kontakt) are probed.

Category-aware scoring

Infrastructure and API domains are auto-detected and scored differently from consumer sites. Security headers, API documentation, and status pages matter more than privacy policies for infrastructure. This prevents CDN and cloud service domains from being unfairly penalized.

Infrastructure micropayments

Agent micropayments to infrastructure (via x402, Stripe MPP) are a major emerging market. CDN endpoints, API services, and compute providers need trust verification too. Our category-aware scoring ensures these domains are assessed on signals relevant to their function.

Security

Anti-gaming and ongoing monitoring

Unfakeable compositional signals

Four signals resist gaming individually: domain age (requires real time to pass), Tranco ranking (requires real global traffic measured across four independent networks), OV/EV certificates (require CA verification against business records), and blocklist cleanliness (monitored continuously by Google Safe Browsing, Spamhaus, SURBL, URLhaus). The well-known brand anchor applies the same compositional principle. Its four conditions are Tranco top-50K, 5+ years of domain age, clean reputation, and a valid SSL certificate; OV/EV is not required. Together they are treated as stronger than any one in isolation: a domain that satisfies all four simultaneously has effectively zero probability of being a bad actor, which is how the anchor unlocks automatic PROCEED for established sites we cannot crawl directly.

Historical WHOIS monitoring (planned)

Scammers buy aged domains from auctions to inherit credibility. True detection requires historical WHOIS snapshots so we can diff the registrant name and email across time. Integration with a historical WHOIS provider (DomainTools, SecurityTrails, WhoisXML) is planned. The simpler heuristic of "recent WHOIS update = ownership change" produces too many false positives on routine admin churn (annual renewals, DNSSEC changes, nameserver tweaks) and is currently disabled.

KYC continuous monitoring

KYC verification is not a one-time stamp. Paid tiers receive ongoing monitoring: content change detection, SSL cert changes, WHOIS registrant changes, and blocklist checks. Any anomaly triggers a flag or score freeze pending investigation.

Tier	Re-check frequency	Annual re-verification
Automated	Weekly signal refresh	None
Registered	Weekly	None
Enhanced	Every 72 hours	Business registry re-check
KYC-Verified	Every 48 hours	Document re-upload + brief review
Enterprise	Daily	Full re-audit
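The re-check cadence is essentially configuration. Here is a minimal scheduling sketch. It assumes hypothetical tier keys such as "kyc_verified" and timestamps that all share one timezone.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Re-check cadence per tier, from the table above. Tier keys are hypothetical.
RECHECK_INTERVAL = {
    "automated": timedelta(weeks=1),
    "registered": timedelta(weeks=1),
    "enhanced": timedelta(hours=72),
    "kyc_verified": timedelta(hours=48),
    "enterprise": timedelta(days=1),
}


def next_check(tier: str, last_checked: datetime) -> datetime:
    return last_checked + RECHECK_INTERVAL[tier]


def due_for_recheck(rows: Iterable[Tuple[str, str, datetime]],
                    now: datetime) -> List[str]:
    """rows are (domain, tier, last_checked); returns the domains whose
    signals should be refreshed now."""
    return [domain for domain, tier, last in rows
            if now >= next_check(tier, last)]
```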

Trust Infrastructure

Cryptographic signing and verification

Every trust token is signed with Ed25519 so results cannot be forged. This is the same signature scheme used by SSH, Signal, and major blockchain protocols.

Signing process

Evidence bundle is canonicalized (JSON Canonicalization Scheme, RFC 8785), SHA-256 hashed, then signed with our Ed25519 private key. The key lives in a secure, restricted-access environment. In production, this moves to an HSM (Hardware Security Module).
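The canonicalize-then-hash step can be approximated with the standard library. This is not a full RFC 8785 implementation: JCS also prescribes number serialization and orders keys by UTF-16 code units, so a production signer should use a dedicated JCS library. The sketch only shows why key order and whitespace cannot change the digest that gets signed.

```python
import hashlib
import json
from typing import Any, Dict


def canonicalize(bundle: Dict[str, Any]) -> bytes:
    """Approximate RFC 8785 (JCS): sorted keys, no insignificant whitespace,
    UTF-8 output, and no NaN/Infinity. This does NOT reproduce JCS number
    formatting or UTF-16 key ordering."""
    text = json.dumps(bundle, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False, allow_nan=False)
    return text.encode("utf-8")


def bundle_digest(bundle: Dict[str, Any]) -> bytes:
    """SHA-256 of the canonical form: the bytes that get signed."""
    return hashlib.sha256(canonicalize(bundle)).digest()
```

Two semantically identical bundles serialized in different key orders produce the same 32-byte digest, so the signature does not depend on how the JSON happened to be written.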

Verification

Anyone can verify a token by fetching our public key from the DID document at /.well-known/did.json and checking the Ed25519 signature. No API call to us needed for verification. W3C Verifiable Credentials compatible.
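An offline verifier needs two pieces. The first resolves did:web to its document URL; per the did:web method, did:web:opentrusttoken.com maps to https://opentrusttoken.com/.well-known/did.json. The second checks an Ed25519 signature. This sketch uses the third-party cryptography package. It assumes the DID document publishes the key as an Ed25519 JWK in its first verificationMethod entry, although the actual document may use another encoding such as publicKeyMultibase. Fetching the document over HTTPS is left out so the example stays offline.

```python
import base64
from typing import Any, Dict
from urllib.parse import unquote

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def did_web_to_url(did: str) -> str:
    """Resolve a did:web identifier to its DID document URL."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError("not a did:web identifier")
    host, *path = did[len(prefix):].split(":")
    host = unquote(host)                      # a port is encoded as %3A
    suffix = "/".join(path) if path else ".well-known"
    return f"https://{host}/{suffix}/did.json"


def public_key_from_jwk(jwk: Dict[str, Any]) -> Ed25519PublicKey:
    """RFC 8037 OKP key: x is the raw 32-byte key, base64url without padding."""
    if jwk.get("kty") != "OKP" or jwk.get("crv") != "Ed25519":
        raise ValueError("expected an Ed25519 OKP JWK")
    x = jwk["x"]
    raw = base64.urlsafe_b64decode(x + "=" * (-len(x) % 4))
    return Ed25519PublicKey.from_public_bytes(raw)


def verify_token(did_document: Dict[str, Any], signed_bytes: bytes,
                 signature: bytes) -> bool:
    """signed_bytes is whatever was signed (per the signing process, the
    SHA-256 digest of the canonical bundle). Assumes the first
    verificationMethod carries a publicKeyJwk."""
    jwk = did_document["verificationMethod"][0]["publicKeyJwk"]
    try:
        public_key_from_jwk(jwk).verify(signature, signed_bytes)
    except InvalidSignature:
        return False
    return True
```

Changing even one byte of signed_bytes makes verify_token return False, which is the tamper-evidence property the token relies on.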

Why this matters

When disputes arise ("this site scored 80 but defrauded me"), the signed token is timestamped, cryptographic proof of what the assessment was at the time of the transaction. This is critical for insurance underwriting and regulatory compliance.

Data Strategy

The database is the asset

The API and scoring are the interface. The accumulated dataset of trust profiles is what compounds in value and becomes impossible to replicate.

What we store per domain per check

Registration date, SSL cert chain, DNS configuration, content signals, reputation flags, payment processors detected, technology stack, Tranco ranking, social media presence, structured data, response time, redirect chains, security header configuration, cookie consent status.

Longitudinal value

A single check is a snapshot. Repeated checks over time create a trust trajectory. A site whose score dropped 20 points in a month is a different risk than one stable for a year. This historical data is what makes our scores more valuable than any point-in-time check an agent could run itself.
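The "dropped 20 points in a month" idea can be made concrete as a windowed drop measure. The 30-day window and 20-point threshold come from the example above. Measuring the drop from the window's peak to the latest score is one reasonable choice among several, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def recent_drop(history: List[Tuple[datetime, float]], now: datetime,
                window: timedelta = timedelta(days=30)) -> float:
    """Points the latest score sits below the highest score seen within the
    window. history must be sorted by check time, oldest first."""
    recent = [score for checked_at, score in history
              if now - checked_at <= window]
    if not recent:
        return 0.0
    return max(0.0, max(recent) - recent[-1])


def is_deteriorating(history: List[Tuple[datetime, float]], now: datetime,
                     threshold: float = 20.0) -> bool:
    return recent_drop(history, now) >= threshold
```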

Future data customers

Insurance underwriters pricing agent transaction coverage. Payment rails assessing merchant risk. Agent frameworks making routing decisions. Security researchers tracking threat patterns. Compliance teams at financial institutions.

Market Position

Where OTT sits in the agent commerce stack

Player	What they do	Relationship to OTT
Stripe MPP	Agent payment protocol (stablecoin + fiat)	Complementary: OTT is the pre-transaction check
Coinbase x402	HTTP-native crypto payments	Complementary: OTT verifies before x402 pays
Skyfire KYAPay	Agent identity + payments (proprietary)	Competitor on identity; OTT is open and rail-agnostic
Visa Trusted Agent	Agent credential framework	Different focus: they verify agents, we verify merchants
Google Safe Browsing	Binary safe/unsafe flag	Data source: we consume their API as one of many signals

The neutrality argument: Payment rails cannot credibly be the trust authority for the merchants they process payments for. OTT is independent: we don't process payments, we don't take a transaction fee, we verify trust. Same structural reason SSL certificates come from independent CAs, not from browser vendors.

Roadmap

What's built, what's next

Phase	Status	What
Protocol spec	Complete	RFC-style spec v0.2, evidence-first token format, W3C VC compatible
API server	Live	FastAPI, 6 signal categories, Ed25519 signing, SQLite, cached results
Scoring v1.3 with brand anchor	Live	Well-known brand anchor, content re-weighting, expanded Tranco identity buckets, GDPR-aware, category-aware, compositional gating
Reputation pipeline	Live	Tranco + Safe Browsing + Spamhaus + SURBL + URLhaus
International scoring	Live	12+ languages, GDPR WHOIS awareness, ccTLD bonuses, jurisdiction detection
Category detection	Live	Auto-detect infrastructure vs consumer, category-adjusted scoring
Landing page + free checker	Live	Public domain checker at opentrusttoken.com
Domain registration flow	Live	Structured data collection, per-field verification scoring (max +30), DNS/HTTP domain proof, public/private separation, cross-referencing against WHOIS and SSL cert org
Site owner dashboard	Live	Score hero, signal breakdown bars, jurisdiction profile, grouped checklist, registration status with per-field verification, score history chart with PROCEED threshold line
Python + TypeScript SDKs	Built	sync/async clients, LangChain and CrewAI tool wrappers. Not yet published to PyPI / npm.
Fetch tier 1 (direct httpx)	Live	Realistic browser headers, Brotli decompression, Chrome and Safari user-agent rotation, www-fallback for sites with dead apex
Fetch tier 2 (headless Chromium)	Live	Dedicated crawler droplet (ott-crawler-1), FastAPI fetch service on private VPC, playwright-stealth, pool of warm contexts, shared-secret auth, circuit breaker
Fetch tier 3 (residential proxy)	Live	Decodo rotating residential IPs (gate.decodo.com:7000), Playwright through proxy, ephemeral contexts per request, separate circuit breaker, counters at /stats.fetch
Fetch tier 4 (real Chrome, residential)	In progress	Physical Mac at a residential ISP via Tailscale. Intended to crack Cloudflare Enterprise Bot Management tier where datacenter + stealth alone cannot reach.
Registry seeding	Growing	1,100+ domains across Tranco spectrum + international commerce + 5 major US retailer verticals
Calibration dataset	Next	Every signed bundle has a check_id. Outcome feedback from registered merchants and API consumers retroactively backtests scores against real fraud outcomes. Over 6-12 months this becomes the real moat.
KYC infrastructure	Planned	Enhanced, KYC-Verified, Enterprise tiers with partner integrations (Jumio / Onfido / Persona / OpenCorporates / Plaid)
Infrastructure scoring v2	Planned	Parent company linkage (cloudfront.net to Amazon), x402/MPP endpoint detection, API-specific checklists
Historical WHOIS integration	Planned	Enables true registrant-change detection without the false positives of the updated_date heuristic
Agent SDK reference integration	Priority next	One widely-used agent framework (LangChain, CrewAI, OpenAI Swarm, Anthropic tool-use) calls OTT before payments in its reference implementation. Single most impactful GTM move.
Regional KYC partners	Planned	Companies House (UK), VIES (EU VAT), regional ID verification providers
PostgreSQL migration	At scale	Move from SQLite when registry exceeds 10K domains
Multi-region	Future	Geographic distribution for sub-200ms global response times

Technical Concepts

Key technologies explained

For team members and stakeholders who want to understand the technical foundations.

Ed25519 Cryptographic Signing

Ed25519 is a digital signature algorithm designed by Daniel J. Bernstein and colleagues. We use it to sign every trust token so that the result can be verified without calling our API. Think of it as a tamper-proof seal: if anyone changes even one character of the token after we sign it, the verification fails. The private key never leaves our server. The public key is published openly for anyone to verify against. This is the same technology that secures SSH connections and Signal messages.

W3C Verifiable Credentials

Our trust tokens follow the W3C Verifiable Credentials standard, the same format governments and banks are adopting for digital identity. This means any system that already understands VCs can read our tokens without custom integration. The standard defines how credentials are structured, signed, and verified. We are building on an existing standard rather than inventing a proprietary format.

DID (Decentralized Identifier)

Our digital identity is did:web:opentrusttoken.com. This is a W3C standard for creating digital identities that don't depend on any single authority. Our DID document (at /.well-known/did.json) contains our public signing key. Any agent worldwide can look up our DID, find our public key, and verify our tokens without any prior relationship with us.

Tranco List

A research-grade ranking of the top 1 million websites by traffic, combining data from multiple sources (Cloudflare Radar, Cisco Umbrella, Majestic, Quantcast). More stable and resistant to manipulation than any single traffic ranking. We use it as a reputation signal (high traffic = established), as an identity signal (top-50K sites are very likely to be who they claim to be), and as one of the four conditions gating the well-known brand anchor. The composition of "ranked in the top 50K, registered for 5+ years, with clean reputation and valid SSL" is what anchors automatic PROCEED for established public brands even when our crawler cannot reach their homepage.

DNS Blocklists

Spamhaus DBL, SURBL, and URLhaus maintain databases of domains involved in spam, phishing, and malware distribution. We query these via DNS lookup (instant, no API key needed). A blocklist hit raises a flag (MALWARE_DETECTED, PHISHING_DETECTED, or SPAM_LISTED); malware and phishing flags are critical and override the score to DENY regardless of other signals.
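A DNS blocklist is queried by looking up the domain prefixed to the list's zone, for example example.com.dbl.spamhaus.org. Here is a minimal standard-library sketch. It treats any answer in 127.0.0.0/8 as a listing, which is a simplification: each provider documents its own return codes, including codes that signal a refused query rather than a listing, and production code should map those explicitly. The sketch also fails open, because resolver errors are read as "not listed".

```python
import ipaddress
import socket
from typing import Optional

LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def dnsbl_query_name(domain: str, zone: str) -> str:
    """Name to resolve, e.g. example.com.dbl.spamhaus.org."""
    return f"{domain.strip('.').lower()}.{zone}"


def interpret_answer(answer: Optional[str]) -> bool:
    """None (no A record) means not listed; a 127.0.0.0/8 answer means listed.
    Real lists encode the listing category in the exact return code."""
    if answer is None:
        return False
    return ipaddress.ip_address(answer) in LOOPBACK_NET


def is_listed(domain: str, zone: str) -> bool:
    try:
        answer: Optional[str] = socket.gethostbyname(dnsbl_query_name(domain, zone))
    except socket.gaierror:
        answer = None  # NXDOMAIN, but also resolver failures (fails open)
    return interpret_answer(answer)
```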