Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [jschauma@netmeister.org]  [@jschauma]  [RSS]

Who controls the internet?

A look at diversity of authoritative NS records in gTLDs

November 15th, 2022

[Image: Modified XKCD 2347 noting the DNS as the main dependency]

This is a blog post version of a talk I gave at the 5th ICANN DNS Symposium.

Why yes, the internet is resting on a foundation of duct tape and WD40 - it is known. And the DNS is the mother of all cornerstones that, if knocked out, would quickly lead to the fall of western civilization. (And yes, it is a hard requirement to use this XKCD cartoon to illustrate this.) But at least it's not quite as fragile as, say, whois, so yay!

But while the DNS root servers are known to be distributed, I thought it might be interesting to take a closer look at the immediate levels up from the root, and so I went to analyze the diversity or centralization of the authoritative nameservers for the generic top-level domains (gTLDs) and the second-level domains in those gTLDs.

To perform this analysis, I started out with the root zone, which (as of November 2022) contains 1,485 TLDs. As I discussed previously, just what exactly you find in there is already utterly fascinating, but for our purposes here, let's note that you can then request access to all of the gTLD zone files via ICANN's Centralized Zone Data Service, which got me access to 1,165 zones in total. In addition, you can obtain the .gov zone from CISA's GitHub repository, as well as .arpa from most of the root servers:

$ dig +noall +answer +onesoa @f.root-servers.net arpa. AXFR | more
arpa.                   86400   IN      SOA a.root-servers.net. nstld.verisign-grs.com. 2022111501 1800 900 604800 86400
arpa.                   518400  IN      NS        m.ns.arpa.
arpa.                   518400  IN      NS        c.ns.arpa.
arpa.                   518400  IN      NS        f.ns.arpa.
[...]

This leaves us missing the .edu, .int, .mil and .post TLDs, which are not generally available. (If you know how to get access, please let me know.)

For the country-code top-level domains (ccTLDs), it's a lot more difficult to gain access. Most operators do not provide public access, although some do: you can AXFR some of them or gather some published data from others. Commercial services exist that sell you zone data, but it seems to me that this data ought to be public, so I excluded ccTLDs from my analysis for the time being.

Anyway, so with 1,168 total zone files adding up to around 7 GB of data (of which the .com zone alone accounts for 4.8 GB!), I went ahead and used a variety of shell scripts and some Perl glue to parse out the NS records and then see just what domains those are in, i.e., who controls them.

The Root

The DNS root zone itself is served by 13 root authorities, and as such is obviously and trivially diverse. The 13 authorities are managed by twelve root operators: 9 US organizations (including three US government entities), of which one (Verisign) operates two roots, one Swedish company (Netnod), one organization in Japan (WIDE), and one headquartered in the Netherlands (RIPE NCC). Obviously, all are in the same domain (i.e., root-servers.net):

[Image: Pie chart showing the even distribution of the 13 root authorities. Caption: NS records for the root zone]
[Image: Pie chart showing a full circle for root nameserver domains. Caption: Domains containing those NS records]

Now for the root itself, this illustration is of course a bit silly, but it gives you an idea of what I'm looking for in this analysis. And things do get a bit more interesting once we process all the NS records from the root zone itself, where we find 7,507 total NS records across 5,612 unique name servers, which looks reasonably diverse:

$ awk '/IN[       ]*NS[   ]/ { print $NF }' root.zone | wc -l
    7507
$ awk '/IN[       ]*NS[   ]/ { print $NF }' root.zone | sort | uniq -c | wc -l
    5625
$ awk '/IN[       ]*NS[   ]/ { print $NF }' root.zone | sort | uniq -c | sort -rn | head -20
 119 ac4.nstld.com.
 119 ac3.nstld.com.
 118 ac2.nstld.com.
 118 ac1.nstld.com.
  47 l.gmoregistry.net.
  47 k.gmoregistry.net.
  47 b.gmoregistry.net.
  47 a.gmoregistry.net.
  46 ns-tld5.charlestonroadregistry.com.
  46 ns-tld4.charlestonroadregistry.com.
  46 ns-tld3.charlestonroadregistry.com.
  46 ns-tld2.charlestonroadregistry.com.
  46 ns-tld1.charlestonroadregistry.com.
  27 anycast9.irondns.net.
  27 anycast24.irondns.net.
  27 anycast23.irondns.net.
  27 anycast10.irondns.net.
  21 j.zdnscloud.com.
  21 i.zdnscloud.com.
  21 g.zdnscloud.com.
$ 

But if you look closer, you'll notice that many of the nameservers are in the same domain, so if we then flatten the whole thing, we see a bit more of a centralization. For example, 6.3% of the NS records are under nstld.com, which is operated by Verisign:

[Image: Pie chart showing domains of NS records in the root zone]
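A minimal sketch of this kind of flattening, not necessarily the scripts used for the charts here: the hypothetical helper flatten_ns reads name server names on stdin and counts them by their last two labels. Treating the last two labels as "the domain" is a simplifying assumption that misgroups names under suffixes such as co.uk.

```shell
# flatten_ns (hypothetical helper): read name server names on stdin, one per
# line (e.g. "ac1.nstld.com."), and print "count domain" pairs, largest first.
# Assumption: the domain is simply the last two labels of the name.
flatten_ns() {
    awk -F. '
        {
            n = NF
            # Ignore the empty field produced by a trailing dot.
            if ($n == "") n--
            if (n >= 2) print tolower($(n-1) "." $n)
        }' |
    sort | uniq -c | sort -k1,1nr -k2 |
    # Normalize the variable-width padding that uniq -c emits.
    awk '{ print $1, $2 }'
}
```

Piping the name server list produced by the awk command above into flatten_ns would then give one line per domain instead of one per name server.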

But thinking about this distribution a bit more quickly makes you realize that there isn't really an even distribution across the gTLDs, since not all of them have the same footprint. As you may guess, the .com zone has more records than some of the other zones. More specifically, .com has over 164 million NS records, making up 73% of all the NS records in all the gTLDs.

[Image: Pie chart showing the number of NS records in all gTLDs]

The NS records for .com are in the gtld-servers.net domain, but so are e.g., .net's; similarly, the NS records for .org and .info are in the same domain, so we can flatten this data a little bit more:

[Image: Pie chart showing the number of NS records in all gTLDs flattened by domain]

In other words, almost 80% of all NS records across all gTLDs are under the gtld-servers.net domain, and thus under the control of Verisign -- the same Verisign that also operates two roots.

Ok, so this is the representation of the NS records for the gTLDs within the root zone, but what about the NS records for all the second-level domains within the gTLDs? Parsing all 1,168 zone files, we end up with 2,699,827 unique name servers that we can group under 1,063,092 domains:

[Image: Pie chart showing the number of NS records across all gTLDs flattened by domain]

This shows a notable centralization of the NS records found in all gTLD zones, with domaincontrol.com accounting for roughly 20% alone.

Another thing that seems interesting here is that some of the cloud companies offering DNS services choose to use a larger number of NS records spread across, in the case of AWS, close to a thousand second-level domains in several TLDs:

$ grep awsdns- domain-counts.full | head
52221 awsdns-02.org.
49614 awsdns-23.net.
49264 awsdns-49.com.
48276 awsdns-05.co.uk.
46392 awsdns-35.org.
45955 awsdns-53.com.
45593 awsdns-19.net.
44409 awsdns-25.com.
44176 awsdns-22.co.uk.
44140 awsdns-45.org.
$ grep -c awsdns- domain-counts.full 
978

The data now show that out of the over 534 million NS records across a little over 1 million domains:

  • 43% of all NS records (roughly 230 million) are served by only 165 name servers found in just 10 domains
  • 52% (~ 278 million) are served by 255 name servers in just 20 domains
  • 75% (~ 401 million) are served by 1,580 name servers in just 100 domains
  • 99% (~ 529 million) are served by 345,000 name servers in 6,000 domains
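Coverage figures like these can be derived from a per-domain count file with a small cumulative sum. The sketch below is only an illustration: coverage is a hypothetical helper, and it assumes input lines of the form "count name", as in the domain-counts.full excerpt above.

```shell
# coverage PCT (hypothetical helper): read "count name" lines on stdin and
# print how many of the largest entries are needed to cover at least PCT
# percent of the sum of all counts.
coverage() {
    sort -k1,1nr | awk -v pct="$1" '
        { count[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                sum += count[i]
                # Compare as products to avoid a division.
                if (sum * 100 >= pct * total) { print i; exit }
            }
        }'
}
```

Under those assumptions, running `coverage 50 < domain-counts.full` would report how many entries account for half of all counted NS records.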

Let's look at these 20 domains and see who controls them, and thus over half of all the domains in all the gTLDs. They are:

  1. domaincontrol.com is Wild West Domains (GoDaddy) 🇺🇸 (AS44273)
  2. googledomains.com is Google 🇺🇸 (AS15169)
  3. cloudflare.com is Cloudflare 🇺🇸 (AS13335)
  4. ui-dns.* is IONOS (United Internet AG) 🇩🇪 (AS8560)
  5. registrar-servers.com is Namecheap 🇺🇸 (AS397213)
  6. wixdns.net is Wix.com Ltd. 🇮🇱 (AS15169)
  7. hichina.com is Alibaba Cloud Computing 🇨🇳 (AS37963)
  8. dns.com is Comodo (Xcitum) 🇺🇸 (AS21859, AS133775 🇨🇳)
  9. awsdns-* is Amazon Web Services 🇺🇸 (AS16509)
  10. nsone.net is NS1 🇺🇸 (AS62597)
  11. namebrightdns.com is NameBright (Turn Commerce Inc.) 🇺🇸 (AS14618)
  12. gname-dns.com is Gname 🇸🇬 (AS13335)
  13. name-services.com is Enom 🇺🇸 / 🇨🇦 (AS15348)
  14. dnsowl.com is NameSilo 🇺🇸 (AS13335)
  15. squarespacedns.com is Squarespace 🇺🇸 (AS63911)
  16. worldnic.com is Network Solutions LLC 🇺🇸 (AS13335)
  17. bluehost.com is Newfold Digital, Inc 🇺🇸 (AS13335)
  18. name.com is Donuts Inc (Identity Digital) 🇺🇸 (AS62597)
  19. myhostadmin.net 🇨🇳 (AS38283)
  20. wordpress.(org|com) is Automattic Inc. 🇺🇸 (AS2635)

You may notice that of these 20 organizations, 15 are US entities, 2 Chinese, 1 German, 1 Israeli, and one from Singapore, giving you an idea what governments could -- in theory, at least -- exert control over what percentage of the internet.

Another interesting thing to point out here is that even though the domains are registered by different organizations, the name servers in use may actually be operated from a different entity's networks. In particular, it looks like several of the name servers in these domains are running out of, fronted by, or otherwise utilizing Cloudflare's network, while Wix seems to be using Google Cloud (I'm guessing) to run their name servers.

name.com is owned by Identity Digital, the rebranding of the merged Donuts and Afilias (previously discussed here) registries, which also operates a significant number of TLDs.

All in all a sign that perhaps we should take a look at the Autonomous System (AS) numbers the various name servers are in, and so, a few thousand lookups later:

[Image: Pie chart showing the diversity of AS numbers for the NS records across all gTLDs]

That's right: around 34% of the majority of NS records resolve to IP addresses in Cloudflare's AS13335, and over half of all are ultimately served from only four Autonomous Systems: Cloudflare (AS13335), Alibaba (AS37963), GoDaddy (AS44273), and IONOS (AS8560) (hinting at the other big load-bearing infrastructure pillar that also remains largely insecure by default).

And while that is interesting by itself, just as before when we looked at the name servers serving the gTLD domains themselves and tried to weigh them against how many domains they support, perhaps we should look beyond the NS diversity in the raw gTLDs; after all, control of google.com or facebook.com surely counts more than, say, monkeyjungle.com.

So what do people do when they want to look at popular domains? They go for the "Alexa Top 1 Million Domains" list, of course! Only... Alexa was bought by Amazon, and in a sign of "who controls the internet", Amazon promptly shut it down. (As of November 8th, 2022, the actual list was still available, but it looks like it has since been restricted.) Of course there are other, similar lists (e.g., the Cisco Umbrella or the Majestic Million), all of which intersect to some degree but remain distinct based on the heuristics of their respective data collection mechanisms. For this reason, researchers provide a normalized Top 1 Million list (see their paper for more details), which I've used for this project here.

Iterating over that full list and looking up the NS records for 1 million domains then yields a breakdown of 2,636,294 total NS records in 119,291 domains, as well as the insight that spreadsheets are surprisingly bad at handling large data sets even of simple text data:

[Image: Pie chart showing the diversity of NS record domains of the top 1 million domains]

So we see a very similar distribution to our analysis of all of the NS records in all of the gTLDs here in the top 1 million domains, too: More than half of the NS records used by the top one million domains are found in just 20 of the 120K domains, served by only 1,740 NS records.

The top ten NS record domains are represented by the usual suspects (Cloudflare, Amazon, GoDaddy, Akamai, DigiCert, Google, Microsoft, Alibaba, Network Solutions, and Namecheap), although they are not identical to those we observed for all of the gTLD records.

Also noteworthy is that the distribution across NS domains shifts somewhat when you look at the top 100 domains (Azure, AWS, Google, Akamai), the top 1,000 domains (AWS, Akamai, NS1, Google), the top 10K domains (AWS, Akamai, Cloudflare, NS1) and the full top 1 million (Cloudflare, Amazon, GoDaddy, Akamai), suggesting that more of the less popular sites use Cloudflare than do the higher ranked sites.

At the same time, when we do the same breakdown by AS number as before (with many thanks to our friends at Team Cymru), we notice even greater centralization:

[Image: Pie chart showing the diversity of NS record domains of the top 1 million domains by AS]

Out of almost 10,000 IP addresses covering 75% of the top one million domains' NS records, over 40% again land in Cloudflare's AS13335, with most of the others being mere "also-ran"s.
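For readers who want to repeat the AS lookups, here is a rough sketch using Team Cymru's DNS-based IP-to-ASN mapping. It is not necessarily how the lookups for this post were done: both function names are hypothetical, only IPv4 is handled, and the parsing assumes the TXT answer starts with the AS number followed by " | ".

```shell
# cymru_qname IPV4 (hypothetical helper): print the DNS name to query for an
# origin-AS lookup, i.e. the reversed octets under origin.asn.cymru.com.
cymru_qname() {
    echo "$1" | awk -F. 'NF == 4 {
        print $4 "." $3 "." $2 "." $1 ".origin.asn.cymru.com"
    }'
}

# ip_to_asn IPV4 (hypothetical helper): print the origin AS number(s).
# Needs network access and dig; assumes a TXT answer whose first field,
# before " | ", is the AS number.
ip_to_asn() {
    dig +short TXT "$(cymru_qname "$1")" |
        tr -d '"' |
        awk -F' [|] ' 'NR == 1 { print $1 }'
}
```

With these helpers, `ip_to_asn` could be run over the A records of each name server and the results tallied with sort and uniq -c, in the same style as the counts above.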

Ok, so that's a whole lot of pie charts, and learning that there is indeed a fair bit of centralization at the gTLD level of the DNS will not come as a surprise to many. However, crunching those numbers still provides for some useful insights. So if we wanted to answer the question "Who controls the internet?", then I think that we may find multiple answers:

1. Verisign -- In addition to operating two of the DNS root authorities, Verisign also controls the gtld-servers.net domain, which we've seen above is home to a whopping 80% of all gTLD NS records! Take out Verisign, and the internet's going to have a bad day.

2. A handful of large companies -- i.e., the usual suspects. With 43% of all NS records in all gTLDs and 44% of those in the Top 1M in a combined 14 domains, any one of those could exert significant control over large chunks of the internet. But amongst those companies, a few stand out:

3. GoDaddy -- as owner of the aptly named domaincontrol.com domain, GoDaddy alone is responsible for 20% of all NS records in all gTLDs.

4. Cloudflare -- responsible for 20% of NS records in the top one million domains, Cloudflare also provides the IP space that is home to a total of 40% of those NS records.

What this centralization means in practice and whether, for example, the US government could realistically exert control over the root operators and companies discussed here, is a different story altogether. But no matter how you look at it, the internet seems less distributed or decentralized than one might wish, as many businesses and organizations appear to concentrate in a handful of registries and cloud service providers.

We don't have a single point of failure just yet, but I do see multiple points of calamity with increasing blast radius...
