Common DNS Failure Modes and What They Look Like

DNS failures are often described with short error messages that hide a lot of complexity. Messages like SERVFAIL or a simple timeout do not point to a single cause. They are symptoms of problems somewhere along a distributed system that includes clients, recursive resolvers, authoritative servers, and the networks between them.

This article walks through common DNS failure modes and explains what they usually look like in practice, why they occur, and what parts of the DNS system are typically involved.

DNS rarely fails in one place

A DNS lookup is a chain of dependent steps. A failure can occur at any point in that chain, and the error that reaches the client is often a simplified summary.

A client typically sees only one of a few outcomes: a successful answer, an explicit error code like SERVFAIL or NXDOMAIN, or no answer at all, which appears as a timeout. Understanding DNS failures starts with recognizing that the visible error is often downstream from the real cause.
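The three client-visible outcomes can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a complete stub resolver: the function names (`build_query`, `classify`, `lookup`) are hypothetical, the resolver address is whatever the caller supplies, and only the fixed 12-byte header of the reply is inspected.

```python
import socket
import struct
from typing import Optional

def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal query: one question, class IN, recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = [label for label in name.split(".") if label]
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def classify(response: Optional[bytes]) -> str:
    """Map a raw reply (None means nothing arrived) to a visible outcome."""
    if response is None:
        return "timeout"
    if len(response) < 12:
        return "malformed"
    rcode = response[3] & 0x0F  # low 4 bits of the 16-bit flags field
    # RCODE 0 also covers "name exists but has no such record"; this sketch
    # does not look at the answer section.
    return "answer" if rcode == 0 else f"error (RCODE {rcode})"

def lookup(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query and report what the client would see."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(4096)
        except socket.timeout:
            response = None  # silence, not an error code
    return classify(response)
```

Note that `lookup` returns the same `"timeout"` string whether the packet was dropped on the way out, on the way back, or never answered, which is exactly the loss of detail the paragraph above describes.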

[Figure: DNS resolution chain showing multiple potential failure points. A DNS query passes through multiple systems, any of which can introduce failure or delay.]

SERVFAIL

SERVFAIL indicates that a DNS server failed to complete the query successfully. It does not mean the domain does not exist.

From the protocol perspective, SERVFAIL is a generic error. It tells the client that the server could not provide an answer, but not why.

Common underlying causes include:

DNSSEC validation failures
An authoritative server returning malformed responses
A resolver failing to reach an authoritative server
Internal resolver errors or resource exhaustion

RFC reference

DNS response codes, including SERVFAIL, are defined in RFC 1035, Section 4.1.1.
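To make the header layout concrete, the sketch below decodes the 12-byte header and names the response code. The helper name `decode_header` and the dictionary keys are illustrative choices; the bit positions and the six base RCODE values follow the RFC 1035 header format.

```python
import struct

# Base response codes from the 4-bit RCODE field.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}

def decode_header(message: bytes) -> dict:
    """Decode the fixed DNS header at the start of a message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    msg_id, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", message[:12]
    )
    rcode = flags & 0x000F
    return {
        "id": msg_id,
        "is_response": bool(flags & 0x8000),  # QR bit
        "truncated": bool(flags & 0x0200),    # TC bit
        "rcode": rcode,
        "rcode_name": RCODE_NAMES.get(rcode, f"RCODE{rcode}"),
        "answer_count": ancount,
    }
```

SERVFAIL (2) and NXDOMAIN (3) differ by a single bit in this field, yet they mean very different things: the first says the server could not finish the lookup, the second says the name does not exist.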

In real environments, DNSSEC is a frequent contributor to SERVFAIL. If a validating resolver cannot verify a signed response because signatures are missing, expired, or incorrect, it must fail the query rather than return potentially incorrect data.

From the client’s perspective, SERVFAIL often appears intermittent. One resolver may succeed while another fails, depending on cache state, validation settings, or reachability.
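One common way to test whether validation is behind a SERVFAIL is to repeat the query with the Checking Disabled (CD) bit set, which asks the resolver to skip DNSSEC validation (this is what `dig +cd` does). The sketch below shows the flag value and one way to interpret the two results. The function names and the wording of the verdicts are assumptions for illustration, and the conclusions are heuristics rather than proof.

```python
RD = 0x0100  # Recursion Desired
CD = 0x0010  # Checking Disabled: ask the resolver not to validate

NOERROR = 0
SERVFAIL = 2

def query_flags(checking_disabled: bool) -> int:
    """Flags word for a recursive query, with or without the CD bit."""
    return RD | (CD if checking_disabled else 0)

def interpret(rcode_validated: int, rcode_cd: int) -> str:
    """Compare the RCODE of a normal query with the RCODE of a CD query."""
    if rcode_validated == SERVFAIL and rcode_cd == NOERROR:
        # The data is reachable; only validation rejects it.
        return "likely DNSSEC validation failure"
    if rcode_validated == SERVFAIL and rcode_cd == SERVFAIL:
        # Skipping validation did not help, so look at reachability,
        # malformed responses, or resolver health instead.
        return "SERVFAIL is probably not caused by validation"
    if rcode_validated == NOERROR:
        return "no SERVFAIL to explain"
    return "inconclusive"
```

Because cache state differs between resolvers, the same comparison can give different results on different resolvers, which matches the intermittent behavior described above.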

Timeouts

A timeout occurs when no response is received within the client or resolver’s configured wait period. Unlike SERVFAIL, a timeout does not come with an explicit DNS response code.

Timeouts usually indicate:

Packet loss or network filtering
An authoritative server that is slow or unreachable
A resolver under heavy load
MTU or fragmentation issues affecting DNS responses

Timeouts are especially common with large responses. A response that does not fit in a UDP message is marked as truncated, and the client is expected to retry over TCP. If TCP is blocked or delayed, the query may never complete.

Case example

An authoritative server responds with a large DNSSEC-signed answer. The resolver retries over TCP, but a firewall silently drops TCP port 53 traffic. The client experiences a timeout rather than an explicit error.
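The fallback in this example hinges on two small protocol details: the truncation (TC) bit in the header of the UDP reply, and the two-byte length prefix that DNS uses on TCP connections. The sketch below shows both. The helper names are hypothetical, and `read_framed` assumes a blocking file-like object such as the one returned by `socket.makefile("rb")`.

```python
import io
import struct

TC = 0x0200  # truncation bit in the flags word

def is_truncated(response: bytes) -> bool:
    """True if a UDP reply asks the client to retry over TCP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC)

def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its length, as required on TCP."""
    return struct.pack("!H", len(message)) + message

def read_framed(stream) -> bytes:
    """Read one length-prefixed DNS message from a file-like stream."""
    prefix = stream.read(2)
    if len(prefix) < 2:
        raise EOFError("connection closed before the length prefix")
    (length,) = struct.unpack("!H", prefix)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("connection closed in the middle of a message")
    return body
```

When a firewall silently drops TCP port 53, the connection attempt simply hangs until the client's own timer fires, so nothing like `read_framed` ever gets a chance to run. That is why the client in the case example sees a timeout instead of an error code.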

From an operational standpoint, timeouts are harder to diagnose than explicit errors because there is no response to inspect. Packet captures or resolver logs are often required to determine where the query stalled.

Stale data

Stale data occurs when resolvers or clients keep returning DNS answers for longer than the zone operator intended. This is usually related to caching behavior rather than outright failure.

Stale data can appear when TTL values are set too high, when negative caching persists after records are fixed, or when resolvers serve expired data under special conditions. Some recursive resolvers intentionally serve expired records for a short period if authoritative servers are unreachable. This behavior is not mandated by the DNS protocol but is implemented as a resilience feature.

Standards reference

RFC 8767 describes serving stale DNS answers to improve resilience during authoritative outages.
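The serve-stale idea can be illustrated with a toy cache. This is a simplified sketch, not how any particular resolver implements it: the class name `StaleCache`, the `upstream_ok` flag, and the one-day default for `max_stale` are assumptions chosen for the example, and a real resolver would also attempt a refresh and cap the TTL it hands out with stale answers.

```python
import time
from dataclasses import dataclass

@dataclass
class Entry:
    value: str
    expires_at: float

class StaleCache:
    """TTL cache that may serve expired entries while upstream is down."""

    def __init__(self, max_stale: float = 86400.0, clock=time.monotonic):
        self.max_stale = max_stale  # how long past expiry an entry is usable
        self.clock = clock          # injectable so behavior can be tested
        self._entries = {}

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = Entry(value, self.clock() + ttl)

    def get(self, name: str, upstream_ok: bool = True):
        """Return (value, state); state is 'fresh', 'stale', or 'miss'."""
        entry = self._entries.get(name)
        if entry is None:
            return None, "miss"
        past_expiry = self.clock() - entry.expires_at
        if past_expiry < 0:
            return entry.value, "fresh"
        # Expired: only usable if the authoritative side is unreachable
        # and the entry is not too old.
        if not upstream_ok and past_expiry <= self.max_stale:
            return entry.value, "stale"
        return None, "miss"
```

Two resolvers running this logic with different cache contents would hand out different answers during the same outage, which is the uneven recovery that users tend to notice.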

To users, stale data often looks like partial recovery. Some clients resolve to old IP addresses while others receive updated ones, depending on cache state and resolver behavior.

Partial outages

Partial DNS outages are situations where resolution works for some users, locations, or record types but fails for others.

These failures commonly involve anycast routing issues affecting specific regions, inconsistent zone data across authoritative servers, IPv6-only or IPv4-only failures, and split-horizon or conditional forwarding misconfigurations. Because DNS relies heavily on caching, partial outages can persist long after the underlying issue is fixed. Different resolvers may continue to serve different answers until their caches expire.
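Inconsistent zone data across authoritative servers is often spotted by comparing the SOA serial that each server reports. The sketch below assumes the serials have already been collected (for example, by querying each name server directly); the function names and the `ns*.example.net` host names are hypothetical. It also assumes serials only increase, and ignores the wraparound rules of serial number arithmetic.

```python
from collections import defaultdict

def group_by_serial(serials: dict) -> dict:
    """Group name servers by the SOA serial they returned."""
    groups = defaultdict(list)
    for server, serial in serials.items():
        groups[serial].append(server)
    return dict(groups)

def lagging_servers(serials: dict) -> list:
    """Servers whose serial differs from the highest one observed."""
    if not serials:
        return []
    newest = max(serials.values())
    return sorted(s for s, serial in serials.items() if serial != newest)

observed = {
    "ns1.example.net": 2024010102,
    "ns2.example.net": 2024010102,
    "ns3.example.net": 2024010101,  # still serving the older zone
}
print(lagging_servers(observed))
```

A resolver that happens to query the lagging server caches the old data for a full TTL, so a single out-of-date server can produce user-visible differences well after it has caught up.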

Partial outages are often misinterpreted as application bugs because they do not fail uniformly. From the outside, the system appears unreliable rather than completely down.

Why DNS failures are confusing by design

DNS prioritizes availability and simplicity of responses over detailed error reporting. The protocol was not designed to expose internal resolver state or detailed failure reasons to clients.

As a result, many distinct problems collapse into the same visible error, errors may be delayed or masked by caches, and recovery can be uneven across clients and networks.

[Figure: How DNS errors are simplified as they propagate from source to client. Detailed failure information is often lost as errors propagate through the DNS resolution chain.]

RFC reference

RFC 8914 defines Extended DNS Errors (EDE), which allow recursive resolvers to include more specific failure reasons, such as DNSSEC validation failure or unreachable authoritative servers, alongside traditional DNS response codes.

Support for Extended DNS Errors depends on both the resolver and the client. Many applications still surface only the high-level error, which means the underlying cause can remain opaque even when more detailed information exists.
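For readers who want to see what that extra detail looks like on the wire, the sketch below extracts Extended DNS Error options from the RDATA of an OPT record. It assumes the caller has already located the OPT record in the additional section; the function name `parse_ede` is hypothetical, and the table lists only a handful of the info codes registered by RFC 8914.

```python
import struct

EDE_OPTION_CODE = 15  # EDNS option code assigned to Extended DNS Errors

# A small subset of the registered info codes.
EDE_INFO_CODES = {
    3: "Stale Answer",
    6: "DNSSEC Bogus",
    7: "Signature Expired",
    9: "DNSKEY Missing",
    22: "No Reachable Authority",
    23: "Network Error",
}

def parse_ede(opt_rdata: bytes) -> list:
    """Return (info_code, name, extra_text) for each EDE option found."""
    results = []
    offset = 0
    while offset + 4 <= len(opt_rdata):
        code, length = struct.unpack("!HH", opt_rdata[offset:offset + 4])
        data = opt_rdata[offset + 4:offset + 4 + length]
        if len(data) < length:
            break  # option claims more bytes than are present
        offset += 4 + length
        if code != EDE_OPTION_CODE or length < 2:
            continue  # some other EDNS option, or too short to hold a code
        (info_code,) = struct.unpack("!H", data[:2])
        extra_text = data[2:].decode("utf-8", errors="replace")
        name = EDE_INFO_CODES.get(info_code, "unknown")
        results.append((info_code, name, extra_text))
    return results
```

An application that only checks the RCODE would report SERVFAIL in all of these cases; one that reads the option can distinguish an expired signature from an unreachable authoritative server.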

This is not a flaw so much as a consequence of DNS being a distributed, cache-heavy system that must operate at global scale.

Summary

Common DNS failure modes share a few important traits: the visible error is often a symptom rather than the root cause, caching can both reduce impact and extend confusion, and partial and intermittent failures are normal in large DNS systems. Understanding what SERVFAIL, timeouts, stale data, and partial outages actually represent makes DNS issues easier to reason about. The key is to think in terms of systems and dependencies rather than single points of failure.
