June 22nd, 2023

This is the fourth blog post on the topic of the centralization of the internet. The previous posts cover diversity of authoritative name servers, diversity of MX records, and an analysis of the use of CAA records across gTLDs. This research was first presented at the APAC DNS Forum 2023 Pre-Event on June 20th, 2023. [slides]

![]()

The DNS, aside from being inevitably the cause of whatever infrastructure problems you're encountering, is a treasure trove of data. As the foundation for so much of what happens on the internet, it serves as a good means to measure and assess the degree of centralization of the internet at large. After having looked at NS, MX, and CAA records, I have now analyzed the use of A and AAAA records making up the large set of so-called "naked domains". (The term "naked" or "bare" domain is generally used to refer to a second-level domain (example.com), in contrast to a more qualified subdomain, such as www.example.com.)

Any second-level domain necessarily must have some resource records: at a minimum, there's the Start of Authority (SOA) record, as well as one or more NS records. For any domain that's DNSSEC signed, we also get a few additional records (RRSIG, DNSKEY, etc.), and the analysis of those might be another project to undertake some time.

Having A and AAAA records on the apex can be helpful as a fallback for SMTP servers to deliver mail to domains without an MX record. But the other, much more frequent use is in an HTTP context.

"The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for."

Even though we've trained users well to type "www.example.com", this is very annoying to do.
And for most users, there really is no distinction between the World Wide Web and the Internet, so they will often leave off the "www" and simply enter "example.com" into their browser and still expect to be taken to that company's website, which is why many domains want to add IP addresses to their second-level domain name and create such a "naked" domain.

Putting the possible complications relating to e.g., X.509 certificates and wildcards, CDN load-balancing on the apex etc. aside, I once again dug into the gTLD zone files as well as miscellaneous ccTLD data and threw about 9 GB of compressed zone data at my little virtual private server running bind(9) to find out just how many domains are using which IP addresses at the apex.

![]()

Dotless Domains

As I had noted when talking about email addresses, some TLDs are completely naked, as it were. By having A / AAAA or MX records, they are dotless domains. ICANN prohibits dotless domains, but nevertheless I found 12 TLDs with A or AAAA records: ai, arab, cm, music, pn, tk, uz, va, ws, xn--l1acc, xn--mxtq1m, and xn--ngbrx. Of those, only va has an IPv6 address (and is in fact IPv6 only). But back to the naked second-level domains...

IPv4 vs IPv6 usage

After parsing the zone data, I ended up with roughly 240 million domain names (166 million, or nearly 70%, of which are in .com alone) and looked at the distribution of IPv4 vs IPv6 addresses. Now DNS data is necessarily always a snapshot in time, and during the lookup there are many things that can go wrong. As a result, I saw a number of NXDOMAIN and SERVFAIL results. Those aside, I found:
![]()

But not all domains are of equal importance, so in addition to running this analysis for all domains, I also ran it for the Top 1 million domains only, as identified by the Tranco list. There, things are slightly better with respect to IPv6 support: among the most popular one million domains, we have almost a quarter of them supporting IPv6:

![]()

CNAMEs at the apex

One thing that's not shown in the two charts above is that aside from A and AAAA records, there are also a lot of naked domains that are CNAMEs. That is, the actual second-level domain has a CNAME record that may then eventually resolve to an IPv4 and/or IPv6 address.

Now having a CNAME record at the apex is of course against RFC1034, because as a second-level domain, there must already be at least SOA and NS records, and RFC1034 says that thou shalt not have a CNAME in the presence of any other resource records. (Except, of course, there's also RFC4034 and DNSSEC ruins everything by saying, no, wait a second, you can have other RRs next to a CNAME, because otherwise DNSSEC won't work, but those are the only exceptions, pinky promise, so anyway, you still can't have a CNAME at the apex.)

Only... many organizations really want that precisely because users like to leave off the www when entering a domain in their browser. In total, I found that about 3.8% (9.2M) of all domains and 9K of the Top 1M domains have a CNAME record at the apex, and while that violates RFC1034, by and large the world doesn't end because most systems are still tolerant in what they accept. Looking at what these CNAMEs point to, we find them to cluster in around 200K different domains:

![]()

Notably, 54% of all CNAMEs (4.9M) point to a single name (traff-1.hugedomains.com), with other sizable representation by different domain name registrars, domain name monetization companies etc.

IP count

But let's go back to IP addresses.
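
To make the IPv4-only / IPv6-only / dual-stack buckets concrete, here is a minimal sketch of how domains might be classified once the A and AAAA lookups are done. This is not the tooling used for this analysis: the function names and the input mapping are hypothetical, and the addresses come from the documentation ranges.

```python
import ipaddress
from collections import Counter


def classify_stack(addresses):
    """Label a domain by the address families found at its apex."""
    # ip_address() parses both IPv4 and IPv6 literals; .version is 4 or 6.
    versions = {ipaddress.ip_address(a).version for a in addresses}
    if versions == {4, 6}:
        return "dual-stack"
    if versions == {4}:
        return "IPv4-only"
    if versions == {6}:
        return "IPv6-only"
    return "no address"


def summarize(domain_to_addresses):
    """Count how many domains fall into each bucket."""
    return Counter(classify_stack(addrs) for addrs in domain_to_addresses.values())


# Hypothetical lookup results, keyed by domain name.
sample = {
    "a.example": ["192.0.2.1"],
    "b.example": ["192.0.2.2", "2001:db8::2"],
    "c.example": ["2001:db8::3"],
    "d.example": [],
}
summary = summarize(sample)
```

Domains whose lookups failed (NXDOMAIN, SERVFAIL) would need to be tracked separately; the sketch only lumps empty results into "no address".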
NXDOMAIN, SERVFAIL and NODATA aside, I found 84% of all domains (around 201M) did have at least one IP address. And of course it's not uncommon for a domain to have more than one IP address. For the small number of domains that are dual-stack, it's quite common to have an equal number of IPv4 and IPv6 addresses:
Having a domain map to over 100 IP addresses may seem excessive, but when you make a few hundred million DNS lookups, you find all sorts of funny things:

![]()

When I collected data and ran the DNS lookups, the domain vannaoh.com had 2024 IP addresses in total (912 IPv4, 1112 IPv6), micahclay.us had 1433 total (896 IPv4, 537 IPv6), and kejanigarage.com had 777 total (228 IPv4, 532 IPv6). That's an interesting attempt to implement DNS-based round-robin load balancing, I suppose (although one that doesn't seem to consider the various failure modes that DNS responses of this gigantic size incur).

Reserved Addresses

Oh, and talking about finding funny things in the data, guess what? There are over 668,000 domains that use reserved IP addresses:

![]()
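
As a rough illustration of how such addresses can be flagged, here is a small sketch using Python's standard `ipaddress` module. The category labels and the function name are my own assumptions for this example, and the module's notion of "private" or "reserved" may not match exactly the definition behind the chart above.

```python
import ipaddress
from typing import Optional


def reserved_category(addr: str) -> Optional[str]:
    """Return a coarse label for non-public addresses, or None if public."""
    ip = ipaddress.ip_address(addr)
    # Check the most specific properties first; several of them overlap.
    if ip.is_unspecified:
        return "unspecified"
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_multicast:
        return "multicast"
    if ip.is_reserved:
        return "reserved"
    if ip.is_private:
        return "private"
    if not ip.is_global:
        # Anything else the module does not consider globally reachable.
        return "other non-global"
    return None


examples = ["127.0.0.1", "10.1.2.3", "0.0.0.0", "::1", "8.8.8.8"]
labels = {a: reserved_category(a) for a in examples}
```

Running every A and AAAA value through a check like this and counting the distinct domains per label would give a breakdown similar in spirit to the one shown.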
Top IPs

Ok, all the funny business aside, let's talk about which IP addresses are encountered most frequently. All in all, I found 328 million A and AAAA records across 13.7 million unique IP addresses. This of course implies that many domains share an IP address: they resolve to the same A or AAAA addresses. And some IP addresses are found more often than others:
In addition, it's worth noting that all of the top 20 most frequently used IP addresses are IPv4 addresses and that there are only 8 IPv6 addresses in the top 100 most frequently seen IPs. For the Top 1M domains, I found about 1.8 million A and AAAA records mapping to around 700,000 unique IPs, with 50 IPs being used by more than 1,000 domains each, 6 IPs being used by more than 4,000 domains each, and the top 5 most used IPs being:
Now obviously the top IP addresses here belong to some of the same organizations, but it seemed like it might be interesting to see how the IP addresses reverse to identify controlling domains, and so I went ahead and tortured my DNS resolver a little bit more...

Reversed IPs

Now a little over half of the IP addresses found don't have a PTR record at all or fail resolution. The remaining 6 million addresses do have PTR records; for those, I found 30,000 in-addr.arpa records that are CNAMEs, meaning they follow RFC2317 to allow for in-addr.arpa delegation on a non-octet boundary. Another thing that's perhaps not entirely obvious is that a single IP address can reverse to multiple domains: I found over 500 IPs that reversed to over 100 domains each. The IP addresses reversing to the largest number of domains were:
What I think is interesting here is that there are some IPs that have a much, much larger number of PTR records than the number of domains they are currently used for, hinting at a possible use for domain name parking and reuse, with no cleanup when the domain is re-sold, expired, or moved. Looking at the names to which most IP addresses reverse, we see again concentration across common providers, notably Amazon AWS here, although I thought that having PTR records pointing to the root (.) was also interesting:
Noticing the shared domain names here, I then went ahead and normalized all the reversed FQDNs to add up the distribution into second-level domains: ![]() We see about 32% of all IPs that do have PTR records point to amazonaws.com, almost 7% pointing to googleusercontent.com, 6% to secureserver.net (owned by GoDaddy / Wild West Domains) and so on. But these are unique IPs; it seems that we should weight the IPs that are used by more domains differently from those that are used by only one domain. When taking this into consideration, we quickly find the most widely used providers: almost 100 million IP addresses added up by frequency of use reverse into only 20 hostnames, with the top 5 being:
(It looks like wixsite.com had a typo in its name but probably couldn't go and fix it once it was so widely in use. Remember kids, domain names are forever!) If we again sum up the stats of the reversed names by domain name, instead of FQDN, we see an even starker concentration: 37% of cumulatively encountered IPs (121M) reverse into only 5 different domains:
Oh, and of course there are also 554K IPs that reverse to localhost, but let's stay focused on the domain names: looking at the above top 5, you should quickly notice that of course awsaccelerator.com and amazonaws.com are the same entity, as are googleusercontent.com and 1e100.net. This got me thinking as to how I can better identify IP address space concentration in use here. Visual inspection by domain name is not going to work well, and WHOIS data these days is often unusable, so instead I went ahead and mapped the IP addresses to their autonomous system numbers.

IPs by AS

The IP addresses of the top 1 million most frequently encountered IPs map into around 12,000 different AS, but of course here, too, we can see a high degree of concentration, as 48.5% map into just 20 AS, with the top 5 being:
(OVH likely shows prominently here because fr. is one of the few ccTLDs for which the list of domain names is available.) If we then look at the 328 million cumulatively encountered A / AAAA records, we find nearly 50% mapping into just 5 AS:
As before, here we can also again identify distinct AS owned by the same company (e.g., AS396982 and AS15169) and add them up, which then gives us this picture:

![]()

Key Findings

Alright, that's a whole lot of numbers and pie charts. Let's try to summarize and see what this data is telling us.

For starters, and this may not be surprising but is still disappointing, we find that we're still very much in an IPv4-only world: almost three quarters of all domains are v4 only.

We also saw that there's a clear demand for CNAMEs at the apex. This is something that especially Content Delivery Networks (CDNs) have to deal with, as they often rely on CNAMEs to effectively load balance traffic across their edge networks. Akamai, for example, offers a feature called Zone Apex Mapping, Cloudflare uses "CNAME flattening", and Amazon Route53 offers so-called alias records, to give some examples. In my opinion, there's an opportunity for standardization of a workable solution here. (SVCB / HTTPS DNS records, as specified in this draft, are a good candidate here.)

But with respect to the question of whether the internet is (effectively) centralized, we did note that:
However, it is important to note that the data used for this research is incomplete: the inability to get access to the ccTLD zones means that our data is necessarily skewed towards the US market. For example, I would expect there to be a significant weight shift if I had been able to analyze the .CN, .JP, and .DE zones, to name just a few examples.

Secondly, what I have presented here is only a snapshot in time. It might be interesting to track these findings over a longer timeframe to see whether or not we are moving towards an increasingly centralized internet.