Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [jschauma@netmeister.org]  [@jschauma]  [RSS]

DNS Response Size

July 17th, 2022

Pop quiz: What is the maximum number of A records in a DNS round robin? Or the largest number of bytes in a TXT record? Maybe it's all the same, and we should ask what is the maximum size of a DNS response? Is it...

  • 512 bytes
  • 1232 bytes
  • 65536 bytes
  • "It depends."

Let's find out. The answer is, as with all things involving the DNS, entertaining.

512 bytes?

Now just about every website on this here internet will tell you that the DNS uses UDP port 53, and that any response must fit into a single 512 byte UDP packet, and of course that answer is right. Except when it isn't. But let's start with that assumption and see how much data we can then fit into a single 512 byte response:

Most people care about A records, so let's create a DNS round-robin response with as many IPv4 addresses as will fit into a single 512-byte UDP response.

$ dig +noall +answer +comments @166.84.7.99 512.size.dns.netmeister.org a
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34695
;; flags: qr aa rd; QUERY: 1, ANSWER: 28, AUTHORITY: 0, ADDITIONAL: 1

;; ANSWER SECTION:
512.size.dns.netmeister.org. 209 IN     A 127.0.0.2
512.size.dns.netmeister.org. 209 IN     A 127.0.0.10
512.size.dns.netmeister.org. 209 IN     A 127.0.0.19
512.size.dns.netmeister.org. 209 IN     A 127.0.0.18
512.size.dns.netmeister.org. 209 IN     A 127.0.0.23
512.size.dns.netmeister.org. 209 IN     A 127.0.0.1
[...]

This returned 28 A records. So far, so good. But 28 IPv4 addresses is only 28 * 4 bytes = 112 bytes. Shouldn't we have been able to add a whole bunch more IPv4 addresses?

Let's take a look at what the packets actually look like, using tcpdump(1):

$ tcpdump -n -t -r /tmp/out.pcap
reading from PCAP-NG file /tmp/out.pcap
IP 172.16.1.22.62634 > 166.84.7.99.53: 31626+ [1au] A? 512.size.dns.netmeister.org. (56)
IP 166.84.7.99.53 > 172.16.1.22.62634: 31626 28/0/1 A 127.0.0.10, A 127.0.0.8,
        A 127.0.0.1, A 127.0.0.19, A 127.0.0.22, A 127.0.0.6, A 127.0.0.16,
        A 127.0.0.12, A 127.0.0.13, A 127.0.0.18, A 127.0.0.7, A 127.0.0.3,
        A 127.0.0.14, A 127.0.0.15, A 127.0.0.21, A 127.0.0.27, A 127.0.0.24,
        A 127.0.0.5, A 127.0.0.20, A 127.0.0.9, A 127.0.0.0, A 127.0.0.17,
        A 127.0.0.23, A 127.0.0.4, A 127.0.0.26, A 127.0.0.25, A 127.0.0.11,
        A 127.0.0.2 (504)

We easily see we are able to squeeze it all into a single UDP packet, which gives us the above mentioned 28 A records stuffed into 504 bytes.

Wait, 504 bytes? What happened to the 28 * 4 = 112 bytes calculation? Let's dig deeper, this time using Wireshark:

(Figure: Wireshark capture for 512.size.dns.netmeister.org)

Ok, so there's a bit more overhead in the DNS packet, where for every answer record we have at least 12 additional bytes. Compare to the wire format from RFC1035: the DNS message (identical for query and response) first, followed by the RR format:

                     1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| transaction id (2 bytes)      | flags (2 bytes)               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| # of questions (2 bytes)      | # of answers (2 bytes)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| # of authority RRs (2 bytes)  | # of additional RRs (2 bytes) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                           questions                           /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                           answer RRs                          /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                         authority RRs                         /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
/                         additional RRs                        /
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     1 1 1 1 1 1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |
|                               |
/             name              /
|                               |
|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             type              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             class             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              ttl              |
|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            rdlength           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               |
|                               |
/             rdata             /
/                               /
|                               |
|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

That is, every complete DNS response has:

  • 12 bytes DNS header
  • a few bytes for the query
  • for every A record:
    • 2 bytes name (a compression pointer back to the name in the query)
    • 2 bytes type
    • 2 bytes class
    • 4 bytes ttl
    • 2 bytes rdlength
    • 4 bytes IPv4 address
  • possibly additional bytes for additional records
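
To make the per-record numbers above concrete, here is a minimal Python sketch that packs a single A record by hand. It assumes the record name is a two-byte compression pointer to offset 12 (the query name directly after the header); the helper name pack_a_record is made up for illustration.

```python
import socket
import struct

def pack_a_record(address, ttl):
    """Pack one A record as it would appear in the answer section."""
    # Assumption: the name is a compression pointer to offset 12,
    # i.e. the query name that directly follows the 12-byte header.
    name = struct.pack("!H", 0xC000 | 12)
    # type A (1), class IN (1), ttl, rdlength (4)
    fixed = struct.pack("!HHIH", 1, 1, ttl, 4)
    rdata = socket.inet_aton(address)  # the 4-byte IPv4 address
    return name + fixed + rdata

record = pack_a_record("127.0.0.2", 209)
print(len(record), record.hex())
```

The 4 bytes of address end up being a quarter of what each record actually costs.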

The query for 512.size.dns.netmeister.org comes out to strlen("512.size.dns.netmeister.org") (27 bytes; on the wire, each dot becomes the length byte of the label that follows it) + 1 byte length of the first label + 1 byte terminating NULL label + 2 bytes type + 2 bytes class = 33 bytes; our DNS server also returns 11 bytes of additional records (more on that below), and each A record answer will be 16 bytes, so we get to 12 + 33 + (28 * 16) + 11 = 504 bytes.
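
The same arithmetic as a small Python sketch, in case you want to play with other names or record counts. The function names are mine, and the 16 and 11 byte constants are simply the values observed above, not anything the protocol guarantees.

```python
import struct

HEADER = 12       # fixed DNS header
A_RECORD = 16     # per A record, as broken down above
ADDITIONAL = 11   # additional records observed from this server

def encode_name(name):
    """Encode a name as length-prefixed labels plus the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def question(name, qtype=1, qclass=1):
    """The query section: encoded name, then 2 bytes type, 2 bytes class."""
    return encode_name(name) + struct.pack("!HH", qtype, qclass)

def response_size(name, n_records):
    return HEADER + len(question(name)) + n_records * A_RECORD + ADDITIONAL

print(len(question("512.size.dns.netmeister.org")))      # 33
print(response_size("512.size.dns.netmeister.org", 28))  # 504
```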

So that's why we had 28 A records: if we had one more A record, we'd need to add 16 more bytes and then yield 520 bytes.

More than 512 bytes?

But what if we add more A records? Let's try to use a record that'd be just under 1024 bytes:

$ sudo tcpdump -w /tmp/out.pcap port 53 2>/dev/null &
$ dig +short @166.84.7.99 1024.size.dns.netmeister.org a | wc -l
      60
$ fg
^C
$ tcpdump -t -r /tmp/out.pcap
IP 172.16.1.22.62299 > 166.84.7.99.53: 19287+ [1au] A? 1024.size.dns.netmeister.org. (57)
IP 166.84.7.99.53 > 172.16.1.22.62299: 19287*- 60/0/1 A 127.0.0.12, A 127.0.0.47, A 127.0.0.56,
        A 127.0.0.33, A 127.0.0.28, A 127.0.0.1, A 127.0.0.36, A 127.0.0.48,
        A 127.0.0.52, [...] 127.0.0.49, A 127.0.0.59 (1017)

Okay, so with 60 A records (1017 bytes) returned here, clearly we can have a response > 512 bytes and still get it back in a single UDP packet. But doesn't that contradict just about every lesson in your CS curriculum? How did we do that?

Enter EDNS(0), currently specified in RFC6891. Since a 512 byte limit (which also has to cover the DNS header and the per-record overhead we just saw) only allows for really small amounts of data, and since additional records, especially those needed for DNSSEC, may increase the size of a DNS response, it is useful for a client to be able to tell a DNS server that it can actually accept more octets.

This is done via a pseudo-RR of type OPT. In our query, we can set this explicitly via the +bufsize=4096 option to dig(1); the result looks like so:

(Figure: Wireshark capture of the query for 1024.size.dns.netmeister.org, showing the EDNS(0) OPT pseudo-RR)
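
For the curious, here is roughly what such a query looks like when built by hand in Python. This is a sketch under assumptions: only the RD flag is set, the OPT pseudo-RR carries no options (no cookie), and the transaction id is arbitrary; build_query is a name I made up. The OPT record reuses the class field to carry the advertised UDP payload size.

```python
import struct

def encode_name(name):
    """Encode a name as length-prefixed labels plus the root label."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def build_query(name, bufsize, txid=0x1234):
    # Header: id, flags (RD only), 1 question, 0 answers,
    # 0 authority RRs, 1 additional RR (the OPT pseudo-RR).
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # type A, class IN
    # OPT: root name, type 41, class = UDP payload size,
    # ttl = extended rcode / version / flags (all zero), rdlength 0.
    opt = b"\x00" + struct.pack("!HHIH", 41, bufsize, 0, 0)
    return header + question + opt

query = build_query("1024.size.dns.netmeister.org", 4096)
print(len(query))  # 57
```

Those 57 bytes match the query size tcpdump(1) reported above, and the trailing 11 bytes are the OPT pseudo-RR.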

Neat! So we should be able to get much larger results, right? Let's give that a try, querying for a response that fits into 2048 bytes, clearly well below the 4096 UDP payload size we advertised. Since we expect this to fit into a single UDP packet, we can pass +ignore to dig(1). But:

$ dig +ignore +bufsize=4096 +noall +comments @166.84.7.99 2048.size.dns.netmeister.org 
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28512
;; flags: qr aa tc rd; QUERY: 1, ANSWER: 123, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232

We are not getting any results, and instead notice that our response has the tc flag set, indicating it was truncated. If we then repeat the same command without +ignore, we then see dig(1) dutifully repeat the query using TCP when it encounters the truncated bit:

(Figure: Wireshark capture of the query for 2048.size.dns.netmeister.org, showing the retry via TCP)
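
The flags dig(1) prints are just bits in the second 16-bit word of the DNS header. A minimal Python sketch of decoding them (the function names are mine; the bit positions are the ones from RFC1035 plus the later ad and cd bits):

```python
import struct

FLAG_BITS = [
    ("qr", 0x8000),  # this is a response
    ("aa", 0x0400),  # authoritative answer
    ("tc", 0x0200),  # truncated
    ("rd", 0x0100),  # recursion desired
    ("ra", 0x0080),  # recursion available
    ("ad", 0x0020),  # authentic data
    ("cd", 0x0010),  # checking disabled
]

def decode_flags(flags):
    """Return the names of all flag bits set in the 16-bit flags word."""
    return [name for name, bit in FLAG_BITS if flags & bit]

def is_truncated(message):
    """Check the tc bit of a raw DNS message (bytes 2-3 hold the flags)."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)

print(decode_flags(0x8700))  # ['qr', 'aa', 'tc', 'rd']
```

A client that sees tc set is expected to discard the UDP answer and retry over TCP, which is what +ignore suppresses.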

But why do we get a truncated response when we had asked for a 4096 byte payload size via EDNS(0)? Looking at the OPT pseudo-RR in the response's additional section, it seems the server had its UDP payload buffer size set to 1232 bytes. But what kind of wacky number is that?

Well, turns out it's not quite as arbitrary as it may seem. Given that we want to avoid packet fragmentation along the way from the resolver to the client, we are looking to establish a reasonable guess for the most common Maximum Transmission Unit (MTU) for the networks our packets might cross. That number is rather commonly 1500 bytes (in part because Ethernet II frames carry up to 1500 bytes of payload, and measurements done by researchers Axel Koolhaas and Tjeerd Slokker confirmed this size).

For IPv6 we have a mandated minimum MTU of 1280 bytes, so we are going to be conservative and use that. Subtracting the 40 bytes length of the IPv6 header and 8 bytes UDP header, we end up with 1232 bytes as the recommended default UDP payload size. And that is indeed the size widely implemented since DNS Flag Day 2020.
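
Spelled out as a tiny Python sketch (the function name is mine; the IPv4 line is only there for comparison and assumes a 20 byte IPv4 header without options on a 1500 byte MTU):

```python
UDP_HEADER = 8

def max_udp_payload(mtu, ip_header):
    """Largest UDP payload that fits into one unfragmented packet."""
    return mtu - ip_header - UDP_HEADER

print(max_udp_payload(1280, 40))  # 1232: IPv6 minimum MTU, IPv6 header
print(max_udp_payload(1500, 20))  # 1472: Ethernet MTU, IPv4 header, no options
```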

But what if we bump our UDP payload buffer size on the DNS server? With bind, that looks like so:

options {
        edns-udp-size 4096;
        max-udp-size 4096;
};

...and then we can indeed stuff the entire response into a single UDP packet:

(Figure: Wireshark capture for 2048.size.dns.netmeister.org with EDNS(0) UDP size 4096)

This packet capture also shows us why it might not be a good idea to set the EDNS(0) UDP size so large: we can see that our packet did get fragmented in between the server and the client. If you look at the first answer packet, you should see the "more fragments" flag set - no good, since fragmented UDP DNS responses can be spoofed! So let's stick with 1232 as recommended.
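
A rough Python sketch of why that response got fragmented. It assumes IPv4 with a 20 byte header, a 1500 byte MTU on the whole path, and a response size estimated with the same per-record arithmetic as before (12 + 34 + 123 * 16 + 11 = 2025 bytes); none of these numbers are read from the capture, and ipv4_fragments is a made-up helper.

```python
import math

def ipv4_fragments(dns_bytes, mtu=1500, ip_header=20, udp_header=8):
    """Estimate how many IPv4 packets a UDP DNS message is split into."""
    datagram = dns_bytes + udp_header  # what IP has to carry
    if datagram <= mtu - ip_header:
        return 1                       # fits as is, no fragmentation
    # All fragments but the last carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(datagram / per_fragment)

estimated = 12 + 34 + 123 * 16 + 11    # assumed size of the 2048 response
print(estimated, ipv4_fragments(estimated))  # 2025 2
print(ipv4_fragments(1232))                  # 1
```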

65536 bytes?

Ok, so we can play around a bit with EDNS0 to avoid truncation and retry over TCP, but once we do retry using TCP, how many records can we return?

We saw that 2048.size.dns.netmeister.org returned 123 A records. Let's try incrementing the number of A records we return and see how many bytes we can transmit:

$ dig +short @166.84.7.99 256-a.size.dns.netmeister.org a | wc -l
     256
# 4156 bytes in total
$ dig +short @166.84.7.99 512-a.size.dns.netmeister.org a | wc -l
     512
# 8252 bytes in total
$ dig +short @166.84.7.99 1024-a.size.dns.netmeister.org a | wc -l
    1024
# 16445 bytes in total
$ dig +short @166.84.7.99 2048-a.size.dns.netmeister.org a | wc -l
    2048
# 32829 bytes in total
$ dig +short @166.84.7.99 4096-a.size.dns.netmeister.org a | wc -l
    0
$ 

Ok, so with all the overhead per record as well as for the response and additional records, we can at least get 2048 A records returned in just over 32K bytes. But why can't we do 4096 A records? Let's see what happens in the packet capture:

(Figure: Wireshark capture for 4096.size.dns.netmeister.org, showing the truncated response over TCP)

The first response via UDP is, no surprise, truncated, so we retry via TCP. But now the DNS result delivered via TCP is also truncated! That is, the DNS server has determined that the result will not fit into the maximum response size. Why is that?

Our 4096 A records take up 4096 * 16 = 65536 bytes all by themselves. (The two byte RDLENGTH field of each record only has to cover that record's own four bytes of RDATA, so that part is not the problem.) But we also need to again account for the overhead noted above: 12 bytes DNS header, 36 bytes for the query, and 11 bytes additional records, yielding (4096 * 16) + 12 + 36 + 11 = 65595 bytes in total. And that does not fit: over TCP, every DNS message is prefixed with a two byte length field (RFC1035, Section 4.2.2), so a single DNS message can never exceed 65535 bytes. That 16 bit length is now our limiting factor, and it has to cover DNS overhead + payload.

("What about jumbo frames?" That doesn't really buy us anything, either: TCP is a byte stream that already spreads a large response across as many segments as it needs, so the size of an individual frame or IP packet was never the constraint. The 16 bit DNS message length is.)
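
One 16-bit limit that is easy to demonstrate is the two byte length field that RFC1035 (Section 4.2.2) puts in front of every DNS message sent over TCP. A minimal Python sketch of that framing, with a made-up helper name:

```python
import struct

MAX_DNS_MESSAGE = 0xFFFF  # largest value a two-byte length can hold

def frame_for_tcp(message):
    """Prefix a DNS message with its two-byte length for use over TCP."""
    if len(message) > MAX_DNS_MESSAGE:
        raise ValueError("%d bytes do not fit a 16-bit length" % len(message))
    return struct.pack("!H", len(message)) + message

print(len(frame_for_tcp(b"\x00" * 65535)))  # 65537
try:
    frame_for_tcp(b"\x00" * 65595)           # the 4096 A record response
except ValueError as err:
    print(err)
```

A server in that situation has little choice but to set the tc bit, even on TCP.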

So the A records we stuff into a response need to add up to less than 65536 bytes to leave room for the overhead. Here, let's give this one a try:

$ dig +short @166.84.7.99 max.size.dns.netmeister.org a | wc -l
        4092
$ 

With 4092 A records and all overhead accounted for, this gives us a DNS response in 65530 bytes, with a cushy 6 bytes left to spare. So the answer to the question "How many A records can you put into a DNS round-robin?", depending on the length of the query, hovers somewhere below 4K, it seems. (Not that I'd advocate using such a large round-robin.)
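
You can turn the size calculation around and ask how many A records fit under a given limit. A small Python sketch, using the observed constants from this server (12 byte header, 16 bytes per A record, 11 bytes of additional records) and assuming 65535 bytes as the largest DNS message over TCP; the helper names are mine:

```python
HEADER, A_RECORD, ADDITIONAL = 12, 16, 11

def question_size(name):
    """Name length + 2 bytes for wire encoding + 2 bytes type + 2 bytes class."""
    return len(name.rstrip(".")) + 2 + 4

def max_a_records(name, limit):
    """How many 16-byte A records fit into a response of at most limit bytes."""
    return (limit - HEADER - question_size(name) - ADDITIONAL) // A_RECORD

for limit in (512, 1232, 65535):
    print(limit, max_a_records("max.size.dns.netmeister.org", limit))
# 512 28
# 1232 73
# 65535 4092
```

A longer query name eats into that budget, which is why the exact number depends on the name you ask for.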

TXT limits

But can we max out the full 64K theoretical size? Maybe using a TXT record? Let's try.

RFC1035 limits the size of a single string (a <character-string>) in a TXT record to 255 octets, since its length is stored in a single byte, so that doesn't seem to get us anywhere. But then again, we actually can have multiple strings in a single RR, can't we? That is different from having multiple RRs of the same type. Compare:

$ dig +short @166.84.7.99 one.size.dns.netmeister.org txt
"one" "one"
$ dig +short @166.84.7.99 two.size.dns.netmeister.org txt
"two"
"one"
$ 

Note that in the first case we are getting back one DNS answer record with two strings; in the second, we get two answer records. That is, we can create larger TXT records by using multiple strings. How large? Let's see:
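
On the wire, the difference is only in how the RDATA is laid out: each string is stored as one length byte followed by up to 255 bytes of text. A minimal Python sketch (txt_rdata is a made-up helper, not part of any library):

```python
def txt_rdata(strings):
    """Build TXT RDATA: every string is a length byte plus its bytes."""
    out = b""
    for s in strings:
        data = s.encode("ascii")
        if len(data) > 255:
            raise ValueError("a single string holds at most 255 bytes")
        out += bytes([len(data)]) + data
    return out

# One answer record carrying two strings:
print(txt_rdata(["one", "one"]))                  # b'\x03one\x03one'
# Two answer records carrying one string each:
print([txt_rdata(["two"]), txt_rdata(["one"])])   # [b'\x03two', b'\x03one']
```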

$ dig +short @166.84.7.99 txt510.size.dns.netmeister.org txt
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$ dig +short @166.84.7.99 txt1020.size.dns.netmeister.org txt
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[...]
$ dig +short @166.84.7.99 txt2040.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
        2040
$ dig +short @166.84.7.99 txt4080.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
        4080
$ dig +short @166.84.7.99 txt8160.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
        8160
$ dig +short @166.84.7.99 txt16320.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
       16320
$ dig +short @166.84.7.99 txt32640.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
       32640
$ dig +short @166.84.7.99 max.size.dns.netmeister.org txt | tr -d '[" \n]' | wc -c
       65211
$ 

Doubling the payload at every step, things make sense up until 32640. Doubling again would yield 65280 bytes of payload, i.e., 256 strings of 255 bytes each; add one length byte per string plus the overhead noted earlier, and we'd again go over the 64K absolute max we discussed above. So we need to trim the last string, and we can create the TXT record for max.size.dns.netmeister.org delivering 65211 bytes of payload.
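
Here is a back-of-the-envelope Python sketch of where 65211 comes from. It assumes a 65535 byte cap on the whole DNS message, the same 12 byte header and 11 bytes of additional records as before, and 12 bytes of fixed fields for the single TXT record (2 name + 2 type + 2 class + 4 TTL + 2 data length); the function name is made up.

```python
MAX_MESSAGE = 65535   # assumed cap for one DNS message over TCP
HEADER = 12
RR_FIXED = 12         # name pointer, type, class, ttl, rdlength
ADDITIONAL = 11

def max_txt_payload(name):
    """Most text bytes a single TXT record for this name could carry."""
    question = len(name) + 2 + 4
    rdata = MAX_MESSAGE - HEADER - question - RR_FIXED - ADDITIONAL
    # Every full string costs 255 bytes of text plus 1 length byte.
    full, rest = divmod(rdata, 256)
    # Whatever is left over holds one shorter string and its length byte.
    return full * 255 + max(rest - 1, 0)

print(max_txt_payload("max.size.dns.netmeister.org"))  # 65211
```

Under those assumptions the estimate lands exactly on the observed value: 255 full strings plus one trimmed string of 186 bytes.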

More TXT limits

So that's the observed limit for a single TXT record for this name, but what is the size limitation on multiple TXT records? If each TXT record is 255 bytes in length, and we account for the 13 bytes overhead per TXT record (2 bytes name + 2 bytes type + 2 bytes class + 4 bytes TTL + 2 bytes data length + 1 byte text length), then we can have 244 TXT records with 255 characters each (244 * (255 + 13)) plus 1 TXT record with 72 characters (72 + 13) plus the previously observed overhead:

$ dig +short @166.84.7.99 txts.size.dns.netmeister.org txt | wc -l
     245
$ 

But how many TXT records could we have if we put only a few bytes into each? Each TXT record must be unique, so for simplicity's sake I picked four bytes per TXT record, which then leads me to:

$ dig +short @166.84.7.99 smalltxts.size.dns.netmeister.org txt | wc -l
    3851
$ 
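
Both record counts can be cross-checked with the same kind of arithmetic. A Python sketch, again assuming 13 bytes of overhead per TXT record, a 12 byte header, 11 bytes of additional records, and a 65535 byte cap on the whole message; the helper names are mine:

```python
def question_size(name):
    return len(name) + 2 + 4

def txt_response_size(name, text_lengths):
    """Total response size for one single-string TXT record per length."""
    records = sum(13 + n for n in text_lengths)
    return 12 + question_size(name) + records + 11

def max_txt_records(name, text_len, limit=65535):
    """How many TXT records of text_len bytes each fit under limit."""
    return (limit - 12 - question_size(name) - 11) // (13 + text_len)

# 244 records of 255 characters plus one of 72 characters:
print(txt_response_size("txts.size.dns.netmeister.org", [255] * 244 + [72]))  # 65534
# Four bytes of text per record:
print(max_txt_records("smalltxts.size.dns.netmeister.org", 4))                # 3851
```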

Other resolvers

One caveat to all of the above: chances are that you'll observe different results if you query your local resolver. This is the beauty of troubleshooting the DNS: different people will observe different results based on a number of aspects of the network or the resolver in question, none of which are immediately obvious. This is why in all of the above dig(1) commands, I've always directly specified the authoritative nameserver responsible for the domain.

What kind of differences might we see, and why are there differences? Let's take a look:

To compare results from different public resolvers, I've previously put together a command-line tool and online service. If you try out e.g., looking up the A records for 1024.size.dns.netmeister.org, most resolvers give you the same results, but for other queries you may see SERVFAIL and connection timeouts.

To understand why that is, remember that our DNS server may return additional records under certain circumstances. Normally, additional records are dropped by the server if they do not fit into the response, but in some cases, the extra records are required: if you request DNSSEC records in your query (by setting the DO bit), then the DNS server needs to also serve the RRSIG record covering the result, which adds more bytes to the response.

So if the result is already near the max (65536 bytes), then adding a 119 byte additional RRSIG record will lead the server to truncate the reply even over TCP:

$ dig +short @166.84.7.99 max.size.dns.netmeister.org a | wc -l
    4092
$ dig +dnssec +noall +comment @166.84.7.99 max.size.dns.netmeister.org 
;; Truncated, retrying in TCP mode.
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 749
;; flags: qr aa tc rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
$ 
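
The "flags: do; udp: 1232" line that dig(1) prints comes from the OPT pseudo-RR, which squeezes its fields into the regular RR layout: the class holds the UDP payload size and the TTL holds the extended rcode, the version, and the flags. A minimal Python sketch of decoding an OPT record without options (parse_opt is a made-up name, and the sample bytes are constructed here, not taken from the capture):

```python
import struct

def parse_opt(rr):
    """Decode an option-less, 11-byte OPT pseudo-RR."""
    name, rtype, udp_size, ttl, rdlength = struct.unpack("!BHHIH", rr[:11])
    if name != 0 or rtype != 41:
        raise ValueError("not an OPT pseudo-RR")
    return {
        "udp": udp_size,               # carried in the class field
        "ext_rcode": ttl >> 24,        # top 8 bits of the ttl field
        "version": (ttl >> 16) & 0xFF,
        "do": bool(ttl & 0x8000),      # DNSSEC OK bit
    }

sample = struct.pack("!BHHIH", 0, 41, 1232, 0x8000, 0)
print(parse_opt(sample))
```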

This is why you'll see e.g., results from asking Google's public DNS resolvers return SERVFAIL or time out: they enable DNSSEC by default and thus receive a larger response than if you query directly without +dnssec. Likewise, some public resolvers may include an EDNS Client Subnet (ECS) option, which again adds additional bytes to the result.

You may also observe some enterprise software attempting to protect your device from malicious DNS lookups make flawed assumptions about DNS traffic over TCP or monkey around with EDNS(0) or otherwise interfere in uneducated ways. Middle-boxes and firewalls may make all sorts of mistakes here and drop packets. Good times, good times.


Summary

Alright, that was a fun little rabbit hole to fall down. I think the main takeaway is that the often repeated mantra of "your DNS response needs to fit into the 512 byte UDP packet" is simply not -- or at least no longer -- a universal truth. Most engineers understand, at least in theory, that the DNS falls back to TCP, but the actual maximum sizes, EDNS(0) and fragmentation, and how DNSSEC, label length, and possibly other factors such as ECS extensions influence the practical limit are far from obvious.

As always, the DNS is more fun than a barrel of monkeys. Although that depends on your definition of "fun". And your monkeys. But I dig(1) it, and remember, when in doubt: pcap or it didn't happen!
