Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  []  [@jschauma]  [RSS]

The gTLDs' New Clothes - A Look at Centralization in Naked Domains

June 22nd, 2023

This is the fourth blog post on the topic of the centralization of the internet. The previous posts cover diversity of authoritative name servers, diversity of MX records, and an analysis of the use of CAA records across gTLDs.

This research was first presented at the APAC DNS Forum 2023 Pre-Event on June 20th, 2023. [slides]

'The naked emperor', stencil graffiti by Edward von Lõngus. Kitsas street in Tartu, Estonia. Source: Ivo Kruusamägi; CC BY-SA 4.0

The DNS, aside from being inevitably the cause of whatever infrastructure problems you're encountering, is a treasure trove of data. As the foundation for so much of what happens on the internet, it serves as a good means to measure and assess the degree of centralization of the internet at large.

After having looked at NS, MX, and CAA records, I have now analyzed the A and AAAA records of the large set of so-called "naked domains". (The term "naked" or "bare" domain is generally used to refer to a second-level domain, in contrast to a more qualified subdomain.)

Any second-level domain must have some resource records: at a minimum, there's the Start of Authority (SOA) record, as well as one or more NS records. For any domain that's DNSSEC signed, we also get a few additional records (RRSIG, DNSKEY, etc.), and analyzing those might be another project to undertake some time.

Having A and AAAA records on the apex can be helpful as a fallback for SMTP servers delivering mail to domains without an MX record. But the other, much more frequent use is in an HTTP context.

"The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for."
- Douglas Adams

Even though we've trained users well to type out the "www" prefix, doing so is very annoying. And for most users, there really is no distinction between the World Wide Web and the Internet, so they will often leave off the "www", simply enter the bare domain name into their browser, and still expect to be taken to that company's website. This is why many domain owners want to add IP addresses to their second-level domain name and thereby create such a "naked" domain.

Putting aside possible complications relating to, e.g., X.509 certificates and wildcards or CDN load balancing on the apex, I once again dug into the gTLD zone files as well as miscellaneous ccTLD data and threw about 9 GB of compressed zone data at my little virtual private server running bind(9) to find out just how many domains use which IP addresses at the apex.
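For readers who want to reproduce the first step, here is a minimal Python sketch of pulling the second-level names out of a gTLD zone file before querying them. It assumes the common one-record-per-line layout of gTLD zone dumps (owner, TTL, class, type, rdata, with fully qualified owner names); the `delegated_domains` helper and the sample zone data are made up for illustration, and this is not necessarily how the data for this post was processed.

```python
def delegated_domains(zone_text: str, tld: str) -> set:
    """Return the second-level names delegated (via NS records) in a TLD zone."""
    apex = tld.strip(".").lower() + "."
    domains = set()
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments; blank lines become empty
        fields = line.split()
        if len(fields) < 5:
            continue
        owner, rtype = fields[0].lower(), fields[3].upper()
        # NS records owned by direct children of the TLD are the delegations;
        # glue A/AAAA records and the TLD's own NS set are skipped.
        if (rtype == "NS" and owner.endswith("." + apex)
                and owner.count(".") == apex.count(".") + 1):
            domains.add(owner.rstrip("."))
    return domains


# Hypothetical sample in the usual tab-separated zone dump format
SAMPLE_ZONE = """\
com.\t900\tin\tsoa\ta.gtld-servers.net. nstld.verisign-grs.com. 1 1800 900 604800 86400
com.\t172800\tin\tns\ta.gtld-servers.net.
example.com.\t172800\tin\tns\tns1.example.net.
example.com.\t172800\tin\tns\tns2.example.net.
foo.com.\t172800\tin\tns\tns.foo.com.
ns.foo.com.\t172800\tin\ta\t192.0.2.1
"""

print(sorted(delegated_domains(SAMPLE_ZONE, "com")))
```

Each resulting name would then be queried for A and AAAA records against the local resolver.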

A man facing a water hose that is spraying water in his face. The hose is labeled 'Zone Data', the water stream 'A few hundred million DNS lookups', and the man 'my little VPS running bind'.

Dotless Domains

As I had noted when talking about email addresses, some TLDs are completely naked, as it were. By having A / AAAA or MX records, they are dotless domains. ICANN prohibits dotless domains, but nevertheless I found 12 TLDs with A or AAAA records:

ai, arab, cm, music, pn, tk, uz, va, ws, xn--l1acc, xn--mxtq1m, and xn--ngbrx. Of those, only va has an IPv6 address (and is in fact IPv6 only).

But back to the naked second-level domains...

IPv4 vs IPv6 usage

After parsing the zone data, I ended up with roughly 240 million domain names (166 million, or nearly 70%, of which are in .com alone) and looked at the distribution of IPv4 vs IPv6 addresses.

Now DNS data is necessarily always a snapshot in time, and during the lookup there are many things that can go wrong. As a result, I saw a number of NXDOMAIN and SERVFAIL results. Those aside, I found:

  • about 9% of all domains (24M) do not have either an A or AAAA record
  • roughly two thirds of all domains are IPv4 only
  • 8.5% of all domains (22.5M) are dualstack
  • roughly 65K domains are IPv6 only
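The buckets above can be computed with a few lines of Python. This is a sketch under the assumption that NXDOMAIN and SERVFAIL results have already been filtered out and that each remaining domain comes with its (possibly empty) lists of A and AAAA records; the function names and sample data are hypothetical.

```python
from collections import Counter


def classify(a_records: list, aaaa_records: list) -> str:
    """Bucket one domain by which address families its apex publishes."""
    if a_records and aaaa_records:
        return "dualstack"
    if a_records:
        return "IPv4 only"
    if aaaa_records:
        return "IPv6 only"
    return "no A/AAAA"


def distribution(results: dict) -> dict:
    """Map each bucket to its share (in percent) of all classified domains."""
    counts = Counter(classify(a, aaaa) for a, aaaa in results.values())
    total = sum(counts.values())
    if not total:
        return {}
    return {bucket: 100 * n / total for bucket, n in counts.items()}


# Hypothetical lookup results: domain -> (A records, AAAA records)
results = {
    "a.example": (["192.0.2.1"], []),
    "b.example": (["192.0.2.2"], ["2001:db8::2"]),
    "c.example": ([], ["2001:db8::3"]),
    "d.example": ([], []),
}
print(distribution(results))
```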

A / AAAA records across 240 million domains
Pie chart showing the distribution of IPv4 / IPv6 use as outlined above.

But not all domains are of equal importance, so in addition to running this analysis for all domains, I also ran it for the Top 1 million domains only, as identified by the Tranco list. There, things are slightly better with respect to IPv6 support: among the most popular one million domains, we have almost a quarter of them supporting IPv6:

A / AAAA records across the Top 1M Domains
Pie chart showing the distribution of IPv4 / IPv6 use as outlined above.

CNAMEs at the apex

One thing that's not shown in the two charts above is that aside from A and AAAA records, there are also a lot of naked domains that are CNAMEs. That is, the actual second-level domain has a CNAME record that may then eventually resolve to an IPv4 and/or IPv6 address.

Now having a CNAME record at the apex is of course against RFC1034, because as a second-level domain, there must already be at least SOA and NS records, and RFC1034 says that thou shalt not have a CNAME in the presence of any other resource records. (Except, of course, there's also RFC4034 and DNSSEC ruins everything by saying, no, wait a second, you can have other RRs next to a CNAME, because otherwise DNSSEC won't work, but those are the only exceptions, pinky promise, so anyway, you still can't have a CNAME at the apex.)

Only... many organizations really want that, precisely because users like to leave off the www when entering a domain in their browser. In total, I found that about 3.8% (9.2M) of all domains and 9K of the Top 1M domains have a CNAME record at the apex, and while that violates RFC1034, by and large the world doesn't end, because most systems are still tolerant in what they accept. Looking at what these CNAMEs point to, we find that they cluster into around 200K different domains:

Apex CNAMEs by canonical domain
Pie chart showing the distribution of CNAME domains: 56.4%, 5.6%, 2.5%, 2.4

Notably, 54% of all CNAMEs (4.9M) point to a single name, with domain name registrars, domain name monetization companies, etc. also showing sizable representation.
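The RFC 1034 rule that makes an apex CNAME illegal can be expressed as a simple check over the record types present at one owner name. The following Python sketch assumes the caller already knows the set of RR types at a name (the `cname_conflict` helper is hypothetical); it treats RRSIG and NSEC as the DNSSEC exceptions permitted next to a CNAME.

```python
# Record types DNSSEC permits at the same owner name as a CNAME.
DNSSEC_EXCEPTIONS = {"RRSIG", "NSEC"}


def cname_conflict(rrtypes) -> bool:
    """True if a CNAME coexists with any record type other than the DNSSEC exceptions."""
    types = {t.upper() for t in rrtypes}
    if "CNAME" not in types:
        return False
    return bool(types - {"CNAME"} - DNSSEC_EXCEPTIONS)


# A zone apex always carries SOA and NS, so a CNAME there always conflicts.
print(cname_conflict({"SOA", "NS", "CNAME"}))      # True
print(cname_conflict({"CNAME", "RRSIG", "NSEC"}))  # False (e.g. a signed www CNAME)
```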

IP count

But let's go back to IP addresses. NXDOMAIN, SERVFAIL, and NODATA aside, I found that 84% of all domains (around 201M) did have at least one IP address. And of course it's not uncommon for a domain to have more than one IP address. For the small number of domains that are dualstack, it's quite common to have an equal number of IPv4 and IPv6 addresses:

  • 59% (119M) have exactly 1 IP address
  • 22% (44M) have 2 addresses
  • 7% (14M) have 4 addresses (e.g., 2 IPv4 + 2 IPv6)
  • 236 domains have >100 addresses
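A histogram like the one above boils down to counting distinct addresses per domain. A minimal Python sketch, assuming a hypothetical mapping from each domain to its combined A and AAAA addresses; domains without any address are left out, mirroring the exclusions above.

```python
from collections import Counter


def ip_count_histogram(domain_ips: dict) -> Counter:
    """Count how many domains have exactly N distinct addresses (A + AAAA)."""
    return Counter(len(set(ips)) for ips in domain_ips.values() if ips)


def share(hist: Counter, n: int) -> float:
    """Percentage of domains (with at least one address) that have exactly n addresses."""
    return 100 * hist[n] / sum(hist.values())


# Hypothetical per-domain address lists
domain_ips = {
    "a.example": ["192.0.2.1"],
    "b.example": ["192.0.2.1", "192.0.2.2"],
    "c.example": ["192.0.2.3", "192.0.2.4", "2001:db8::3", "2001:db8::4"],
    "d.example": ["198.51.100.7"],
    "e.example": [],  # no address: excluded from the histogram
}
hist = ip_count_histogram(domain_ips)
print(hist[1], hist[2], hist[4])  # 2 1 1
print(share(hist, 1))             # 50.0
```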

Having a domain map to over 100 IP addresses may seem excessive, but when you make a few hundred million DNS lookups, you find all sorts of funny things:

Terminal window showing the large number of results from running 'host' scroll by, 1848 in total.

When I collected data and ran the DNS lookups, the domain had 2024 IP addresses in total (912 IPv4, 1112 IPv6), had 1433 total (896 IPv4, 537 IPv6), and had 777 total (228 IPv4, 532 IPv6). That's an interesting attempt to implement DNS-based round-robin load balancing, I suppose, albeit one that doesn't seem to consider the various failure modes that come with such gigantic DNS responses.
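To put "gigantic" into perspective, here is a back-of-the-envelope Python calculation of the minimum wire size of such answers. Since A and AAAA records are returned by separate queries, each type is sized on its own. The sketch assumes an uncompressed question name, answer names compressed to a 2-byte pointer, no EDNS OPT record, and empty authority/additional sections; `example.com` is only a placeholder query name, so real responses would be somewhat larger.

```python
HEADER_BYTES = 12
# Per answer RR: 2-byte compression pointer + TYPE + CLASS + TTL + RDLENGTH
RR_FIXED_BYTES = 2 + 2 + 2 + 4 + 2
RDATA_BYTES = {"A": 4, "AAAA": 16}


def qname_wire_length(name: str) -> int:
    """Uncompressed wire length of a domain name: length octet per label plus the root label."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


def min_response_size(qname: str, rtype: str, count: int) -> int:
    """Lower bound on a response holding `count` records of one type (no EDNS, no extra sections)."""
    question = qname_wire_length(qname) + 4  # QTYPE + QCLASS
    return HEADER_BYTES + question + count * (RR_FIXED_BYTES + RDATA_BYTES[rtype])


for rtype, count in (("A", 912), ("AAAA", 1112)):
    size = min_response_size("example.com", rtype, count)
    print(f"{rtype}: {count} records -> at least {size} bytes")
```

Under these assumptions, the A response comes to at least 14,621 bytes and the AAAA response to at least 31,165 bytes. Both are far beyond the classic 512-byte UDP limit and the 1232-byte EDNS buffer size widely recommended since DNS Flag Day 2020, so a resolver would receive a truncated UDP answer and have to retry over TCP.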

Reserved Addresses

Oh, and talking about finding funny things in the data, guess what? There are over 668,000 domains that use reserved IP addresses:

XKCD stick figures around a campfire, narrator saying 'But when she traced the killer's IP address... it was in the 192.168/16

  • 562K use (most, but over 300 others appear; 2.1K use ::1)
  • 35K use RFC1918 / RFC4193 addresses, most commonly,,; some fd00::/8 and even some fc00::/8
  • 31K use, most commonly
  • 1.5K use RFC6598 shared address space, most commonly
  • 10K use a link-local address, fe80::/64, most commonly
  • 1.8K use documentation IPs (e.g.,,, 2001:db8::/32)
  • 2.1K use
  • 570 use ::
  • 341 use
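Bucketing addresses like this is straightforward with Python's standard `ipaddress` module. The sketch below uses its own hypothetical category table built from well-known special-purpose ranges; its buckets are not guaranteed to match the exact categories counted above.

```python
import ipaddress
from typing import Optional

# Well-known special-purpose ranges (a subset of the IANA registries).
# Order matters: the first matching category wins.
SPECIAL_RANGES = {
    "unspecified": ["0.0.0.0/32", "::/128"],
    "this network": ["0.0.0.0/8"],
    "loopback": ["127.0.0.0/8", "::1/128"],
    "private (RFC 1918 / RFC 4193)": ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7"],
    "shared address space (RFC 6598)": ["100.64.0.0/10"],
    "link-local": ["169.254.0.0/16", "fe80::/10"],
    "documentation": ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32"],
    "limited broadcast": ["255.255.255.255/32"],
}
_PARSED = [(label, [ipaddress.ip_network(p) for p in prefixes])
           for label, prefixes in SPECIAL_RANGES.items()]


def classify_reserved(addr: str) -> Optional[str]:
    """Return the special-purpose category of an address, or None for ordinary addresses."""
    ip = ipaddress.ip_address(addr)
    for label, networks in _PARSED:
        # `in` simply returns False when the IP versions differ
        if any(ip in net for net in networks):
            return label
    return None


print(classify_reserved("127.0.0.53"))  # loopback
print(classify_reserved("8.8.8.8"))     # None
```

The module also offers properties such as `is_loopback`, `is_private`, and `is_link_local`, but `is_private` lumps several registries together, so an explicit table makes the buckets easier to line up with a list like the one above.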

Top IPs

Ok, all the funny business aside, let's talk about which IP addresses are encountered most frequently:

All in all, I found 328 million A and AAAA records across 13.7 million unique IP addresses. This of course implies that many domains share an IP address: they resolve to the same A or AAAA addresses. And some IP addresses are found more often than others:

  • 201 IPs are used by more than 100K domains each
  • 64 IPs are used by more than 500K domains each
  • 29 IPs are used by more than 1M domains each
  • Top 5 most used IPs:
    1. (25M domains) —
    2. (9.8M domains) —
    3. (9.8M domains) —
    4. (6M domains) -
    5. (6M domains) -
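Counting how many domains share each address, and how many addresses cross a given threshold, is a matter of inverting the domain-to-IP mapping. A minimal Python sketch over hypothetical data, where each domain counts at most once per address:

```python
from collections import Counter


def domains_per_ip(domain_ips: dict) -> Counter:
    """Count, for every address, how many distinct domains resolve to it."""
    counts = Counter()
    for ips in domain_ips.values():
        counts.update(set(ips))  # set(): a domain counts once per address
    return counts


def ips_above(counts: Counter, threshold: int) -> int:
    """Number of addresses shared by more than `threshold` domains."""
    return sum(1 for n in counts.values() if n > threshold)


# Hypothetical data: three domains share one hosting address
domain_ips = {
    "a.example": ["192.0.2.10"],
    "b.example": ["192.0.2.10", "2001:db8::10"],
    "c.example": ["192.0.2.10"],
    "d.example": ["198.51.100.5"],
}
counts = domains_per_ip(domain_ips)
print(counts.most_common(1))  # [('192.0.2.10', 3)]
print(ips_above(counts, 1))   # 1
```

`Counter.most_common(5)` then yields a top-5 list like the one above.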

In addition, it's worth noting that all of the top 20 most frequently used IP addresses are IPv4 addresses and that there are only 8 IPv6 addresses among the top 100 most frequently seen IPs.

For the Top 1M domains, I found about 1.8 million A and AAAA records mapping to around 700,000 unique IPs, with 50 IPs being used by more than 1,000 domains each, 6 IPs being used by more than 4,000 domains each, and the top 5 most used IPs being:

  1. (7487 domains) —
  2. (5804 domains) —
  3. (4726 domains) —
  4. (4688 domains) —
  5. (4654 domains) —

Now obviously some of the top IP addresses here belong to the same organizations, but it seemed like it might be interesting to see what the IP addresses reverse to in order to identify the controlling domains, and so I went ahead and tortured my DNS resolver a little bit more...

Reversed IPs

Now a little over half of the IP addresses found don't have a PTR record at all or fail resolution. The remaining 6 million addresses do have PTR records; among those, I found 30,000 records that are CNAMEs, most likely following RFC2317 to allow for delegation on a non-octet boundary.
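The names queried for these lookups are the standard reverse names under in-addr.arpa and ip6.arpa, which Python's `ipaddress` module builds via `reverse_pointer`. The second helper sketches the RFC 2317 convention of pointing a PTR name at a CNAME inside a delegated sub-zone; the `<network>/<prefixlen>` label follows the RFC's own examples, but operators may choose other label formats, and both helper names are hypothetical.

```python
import ipaddress


def ptr_name(addr: str) -> str:
    """Reverse-lookup owner name for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(addr).reverse_pointer


def rfc2317_target(addr: str, prefixlen: int) -> str:
    """CNAME target for an IPv4 address in a classless reverse delegation."""
    if not 25 <= prefixlen <= 32:
        raise ValueError("RFC 2317 delegation applies to prefixes longer than /24")
    ip = ipaddress.IPv4Address(addr)
    net = ipaddress.IPv4Network(f"{ip}/{prefixlen}", strict=False)
    a, b, c, d = str(ip).split(".")
    first = str(net.network_address).split(".")[3]  # last octet of the sub-block
    return f"{d}.{first}/{prefixlen}.{c}.{b}.{a}.in-addr.arpa"


print(ptr_name("192.0.2.130"))            # 130.2.0.192.in-addr.arpa
print(rfc2317_target("192.0.2.130", 26))  # 130.128/26.2.0.192.in-addr.arpa
```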

Another thing that's perhaps not entirely obvious is that a single IP address can reverse to multiple domains: I found over 500 IPs that reversed to over 100 domains each. The IP addresses reversing to the largest number of domains were:

  1. (reverses to 3.2K domains), used by 23 domains
  2. (2.8K), used by 14 domains
  3. (2.4K), used by 10 domains
  4. (2.3K), used by 25 domains
  5. (2.1K), used by 674 domains

What I think is interesting here is that some IPs have a much, much larger number of PTR records than domains currently using them, hinting at domain name parking, or at reuse without proper cleanup when a domain is resold, expires, or moves.

Looking at the names to which most IP addresses reverse, we again see concentration across common providers, notably Amazon AWS, although I thought that having PTR records pointing to the root (.) was also interesting:

  1. (16K IPs)
  2. (15.5K IPs)
  3. (7.5K IPs)
  4. (5K IPs)
  5. . (5.7K IPs)

Noticing the shared domain names here, I then went ahead and normalized all the reversed FQDNs to their second-level domains to add up the distribution:

Reversed domains of unique IP PTR records
Pie chart showing the distribution of unique IP PTR records: 32%, 6.8%, 5.9%, 4.3%

We see about 32% of all IPs that do have PTR records point to, almost 7% pointing to, 6% to (owned by GoDaddy / Wild West Domains) and so on.
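A minimal Python sketch of such a normalization, assuming the simplest possible rule of keeping the last two labels of each PTR target. This naive rule is an illustrative assumption, not necessarily the method used here: it miscounts names under multi-label public suffixes such as co.uk, which a more careful version would resolve via the Public Suffix List. The names below are hypothetical.

```python
from collections import Counter


def naive_second_level(fqdn: str) -> str:
    """Keep the last two labels of a name (no Public Suffix List awareness)."""
    labels = [label for label in fqdn.lower().rstrip(".").split(".") if label]
    if not labels:
        return "."  # a PTR record pointing at the root
    return ".".join(labels[-2:])


def by_second_level(ptr_names) -> Counter:
    """Tally PTR targets (one per unique IP) by their naive second-level domain."""
    return Counter(naive_second_level(name) for name in ptr_names)


# Hypothetical PTR targets, one per unique IP
names = ["host-1.cdn.example.net.", "host-2.cdn.example.net.",
         "mail.example.org.", ".", "www.example.co.uk."]
print(by_second_level(names))
```

Note how the last name collapses to `co.uk` rather than `example.co.uk`, which illustrates why the Public Suffix List matters.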

But these are unique IPs; we should weight the IPs that are used by more domains differently from those that are used by only one domain. When taking this into consideration, we quickly find the most widely used providers: almost 100 million IP addresses, added up by frequency of use, reverse into only 20 hostnames, with the top 5 being:

  1. (25.4M)
  2. (19.7M)
  3. (18M)
  4. (6.5M)
  5. (3.4M)

(It looks like had a typo in its name but probably couldn't go and fix it once it was so widely in use. Remember kids, domain names are forever!)
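The frequency weighting described above can be sketched in Python by giving each PTR name the number of domains that use the address reversing to it. The inputs are hypothetical; one design choice to note is that an address reversing to several names adds its full weight to each of them rather than splitting it.

```python
from collections import Counter


def weighted_ptr_counts(domains_per_ip: dict, ptr_names: dict) -> Counter:
    """Sum, per PTR name, the number of domains using each address that reverses to it."""
    weighted = Counter()
    for ip, n_domains in domains_per_ip.items():
        for name in ptr_names.get(ip, []):  # IPs without PTR records contribute nothing
            weighted[name] += n_domains
    return weighted


# Hypothetical inputs: usage count per IP, and the PTR names each IP reverses to
domains_per_ip = {"192.0.2.10": 500, "192.0.2.11": 300, "198.51.100.5": 1, "203.0.113.7": 50}
ptr_names = {
    "192.0.2.10": ["parking.example.net."],
    "192.0.2.11": ["parking.example.net."],
    "198.51.100.5": ["host.example.org."],
}
print(weighted_ptr_counts(domains_per_ip, ptr_names).most_common())
```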

If we again sum up the stats of the reversed names by domain name instead of FQDN, we see an even starker concentration: 37% of cumulatively encountered IPs (121M) reverse into only 5 different domains:

  1. (33.6M)
  2. (32.5M)
  3. (21M)
  4. (18M)
  5. (16M)

Oh, and of course there are also 554K IPs that reverse to localhost, but let's stay focused on the domain names: looking at the above top 5, you should quickly notice that of course and are the same entity, as are and

This got me thinking as to how I can better identify IP address space concentration in use here. Visual inspection by domain name is not going to work well, and WHOIS data these days is often unusable, so instead I went ahead and mapped the IP addresses to their autonomous system numbers.
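Mapping an address to its autonomous system amounts to a longest-prefix match against a table of announced prefixes and their origin AS. A minimal Python sketch, assuming a tiny hypothetical routing table built from documentation prefixes and documentation AS numbers (64496-64511); it is not the data source used for this post.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: announced prefix -> origin AS (documentation values)
PREFIX_TO_ASN = {
    "192.0.2.0/24": 64496,
    "192.0.2.128/25": 64497,  # a more specific announcement inside the /24
    "2001:db8::/32": 64498,
}

# Longest prefixes first, so the first match is the most specific one.
_TABLE = sorted(
    ((ipaddress.ip_network(prefix), asn) for prefix, asn in PREFIX_TO_ASN.items()),
    key=lambda entry: entry[0].prefixlen,
    reverse=True,
)


def ip_to_asn(addr: str) -> Optional[int]:
    """Origin AS of the most specific prefix covering addr, or None if unrouted."""
    ip = ipaddress.ip_address(addr)
    for net, asn in _TABLE:
        if ip in net:  # False (not an error) when the IP versions differ
            return asn
    return None


print(ip_to_asn("192.0.2.200"))  # 64497
print(ip_to_asn("192.0.2.5"))    # 64496
```

A full routing table holds on the order of a million prefixes, so at the scale of this analysis a radix (Patricia) trie would replace the linear scan.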

IPs by AS

The top 1 million most frequently encountered IPs map into around 12,000 different AS, but of course here, too, we can see a high degree of concentration, as 48.5% map into just 20 AS, with the top 5 being:

  1. AS13335 (CLOUDFLARENET, US) (120K IPs)
  2. AS16509 (AMAZON-02, US) (45K IPs)
  3. AS47583 (AS-HOSTINGER, CY) (34K IPs)
  4. AS26347 (DREAMHOST-AS, US) (31K IPs)
  5. AS16276 (OVH, FR) (28K IPs)

(OVH likely shows prominently here because .fr is one of the few ccTLDs for which the list of domain names is available.)

If we then look at the 328 million cumulatively encountered A / AAAA records, we find nearly 50% mapping into just 5 AS:

  1. AS16509 (AMAZON-02, US) (52.4M, 16%)
  2. AS13335 (CLOUDFLARENET, US) (40M, 12%)
  3. AS396982 (GOOGLE-CLOUD-PLATFORM, US) (31M, 9.5%)
  4. AS15169 (GOOGLE, US) (19M, 5.8%)
  5. AS58182 (WIX_COM, IL) (18M, 5.5%)

As before, here we can also again identify distinct AS owned by the same company (e.g., AS396982 and AS15169) and add them up, which then gives us this picture:

Top AS by cumulative A / AAAA records' PTR records
Pie chart showing the distribution of top AS: 18.5% Amazon, 15.3% Google, 12.2% Cloudflare, 5.5% Wix
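Merging AS by owner is a simple re-keying step. The Python sketch below uses the five per-AS figures listed above (in millions) and a hypothetical organization mapping that covers only those five AS. Because no other Amazon AS are mapped, its output will not match the chart's totals exactly; a real analysis would need a fuller AS-to-organization table.

```python
from collections import Counter

# Cumulative A/AAAA records per AS in millions, from the list above
RECORDS_BY_AS = {16509: 52.4, 13335: 40.0, 396982: 31.0, 15169: 19.0, 58182: 18.0}
# Hypothetical organization mapping for just these five AS
ORG_BY_AS = {16509: "Amazon", 13335: "Cloudflare", 396982: "Google", 15169: "Google", 58182: "Wix"}
TOTAL_RECORDS = 328.0  # millions


def records_by_org(records_by_as: dict, org_by_as: dict) -> Counter:
    """Sum per-AS record counts by owning organization."""
    totals = Counter()
    for asn, n in records_by_as.items():
        totals[org_by_as.get(asn, f"AS{asn}")] += n  # unmapped AS stay separate
    return totals


for org, n in records_by_org(RECORDS_BY_AS, ORG_BY_AS).most_common():
    print(f"{org}: {n:.1f}M ({100 * n / TOTAL_RECORDS:.1f}%)")
```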

Key Findings

Alright, that's a whole lot of numbers and pie charts. Let's try to summarize and see what this data is telling us.

For starters (and this may not be surprising, but it is still disappointing), we find that we're still very much in an IPv4-only world: almost three quarters of all domains are IPv4 only.

We also saw that there's a clear demand for CNAMEs at the apex. This is something that especially Content Delivery Networks (CDNs) have to deal with, as they often rely on CNAMEs to effectively load balance traffic across their edge networks.

Akamai, for example, offers a feature called Zone Apex Mapping, Cloudflare uses "CNAME flattening", and Amazon Route 53 offers so-called alias records, to give some examples. In my opinion, there's an opportunity for standardization of a workable solution here. (SVCB / HTTPS DNS records, as specified in this draft, are a good candidate.)
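Conceptually, these vendor features let an operator point the apex at another name while the authoritative server answers apex queries with ordinary address records. Below is a rough Python sketch of that resolution step over a hypothetical in-memory record table. It is not how any of the providers named above actually implement it. A real server would resolve live, handle AAAA as well, and also need to deal with TTLs along the chain.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical DNS data: name -> ("CNAME", target) or ("A", [addresses])
Record = Tuple[str, Union[str, List[str]]]


def flatten(target: str, records: Dict[str, Record], max_chain: int = 8) -> List[str]:
    """Follow a CNAME chain to its addresses, which the apex then serves as plain A records."""
    seen = set()
    name = target
    for _ in range(max_chain):
        if name in seen:
            raise ValueError(f"CNAME loop at {name}")
        seen.add(name)
        rtype, data = records[name]
        if rtype != "CNAME":
            return list(data)
        name = data
    raise ValueError("CNAME chain too long")


records = {
    "cdn.example.net": ("CNAME", "edge.example.net"),
    "edge.example.net": ("A", ["192.0.2.10", "192.0.2.11"]),
}
# The apex of, say, example.com would answer with these A records instead of a CNAME.
print(flatten("cdn.example.net", records))
```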

But with respect to the question of whether the internet is (effectively) centralized, we did note that:

  • a single IP address is used by over 10% of all domains, which I think is impressive
  • 37% of cumulatively encountered IPs reverse into 5 domains owned by 3 companies (AWS, Google, Wix)
  • just four companies (Amazon, Google, Cloudflare, and Wix) control over 50% of the most frequently encountered IP addresses

However, it is important to note that the data used for this research is incomplete: the inability to get access to most ccTLD zones means that our data is necessarily skewed towards the US market. For example, I would expect a significant shift in weight if I had been able to analyze the .CN, .JP, and .DE zones, to name just a few.

Secondly, what I have presented here is only a snapshot in time. It might be interesting to track these findings over a longer timeframe to see whether or not we are moving towards an increasingly centralized internet.



Previous: [Whose Cert Is It Anyway?]  -- Next: [TLD Domain Count Stats]