March 9th, 2023
This is the second blog post on the topic of the centralization of the internet. The first post, discussing diversity of authoritative name servers, can be found here.
According to various statistics, there are somewhere around 330 billion emails being sent every day, approximately 3.82 million per second. Who reads all these emails?
Ok, ok, nobody does. Who would want to? Most of it is spam anyway. But, given how personal email is, how much we rely on email for business, how useful email can be in legal discovery, and, most importantly, how -- over 40 years after RFC821 was published -- we still use a clear text protocol and have no realistic solution for end-to-end encryption of this private content... given all that, who could read that email if they wanted to? Ah, well, that's another question altogether.
The Simple Mail Transfer Protocol (SMTP) uses MX records in the DNS to identify which server(s) it should hand the mail off to. It used to be common for domain owners to run their own mail server, but it turns out that doing that well while efficiently combating spam (both incoming and outgoing), email abuse, and the ever increasing traffic volume is not that easy. And what do we do when things aren't easy? We pay somebody else to do it for us. To the cloud!
In 2023, chances are that, regardless of the domain in question, your personal and/or business email is actually handled by e.g., Google, Microsoft, Yahoo, Apple, Yandex, or, say, GMX. But even if one of those is your email service provider, it's also quite likely that your domain uses another layer in front of that, which provides spam and malware filtering as well as data-loss prevention (DLP) features. Popular service providers here include Proofpoint, Barracuda, Sophos, Trustwave, and some other offerings from big name companies as well as ones you likely have never heard of.
So let's take a look at which of these various companies are fronting the most domains and could thus, in theory, anyway, read your email!
Much like I did when I looked at the NS record diversity, I went through all the gTLD zone files (again leaving out ccTLDs), extracted all second-level domains, and then went to work with nothing but my little, trusty bind9 caching resolver running on my personal VPS.[1]
For each gTLD zone file, I extracted the full list of domains within that TLD, defined as any unique label in the zone file with an NS record. This yielded a grand total of approximately 203 million domain names: > 164 million in .com alone, with all other gTLDs adding up to roughly 39 million domain names. For each of those domains, I then performed DNS lookups for its MX records, and a few million queries later I ended up with a whole bunch of mail server FQDNs.
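For illustration, the extraction step could look roughly like the following Python sketch. It assumes a simplified zone file with one record per line and fully qualified owner names; real zone files may use relative names or omit fields, which this does not handle. The function names and the sample records are hypothetical.

```python
def record_type(fields):
    """Return the record type: the first field after the owner
    that is neither a numeric TTL nor the class 'IN'."""
    for field in fields[1:]:
        if field.isdigit() or field.upper() == "IN":
            continue
        return field.upper()
    return None


def delegated_domains(zone_lines, tld):
    """Collect the unique second-level names under `tld` that own an NS record."""
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for raw in zone_lines:
        line = raw.split(";", 1)[0].strip()      # drop comments
        if not line or line.startswith("$"):     # skip blanks and directives
            continue
        fields = line.split()
        if len(fields) < 3 or record_type(fields) != "NS":
            continue
        owner = fields[0].lower()
        if not owner.endswith(suffix):           # also skips the TLD apex itself
            continue
        label = owner[: -len(suffix)]
        if label and "." not in label:           # exactly one label below the TLD
            domains.add(label + suffix.rstrip("."))
    return domains


# Hypothetical sample data, not taken from a real zone file.
SAMPLE = """\
com. 172800 in ns a.gtld-servers.net.
example.com. 172800 in ns ns1.example.com.
example.com. 172800 in ns ns2.example.com.
ns1.example.com. 172800 in a 192.0.2.1
another-example.com. 172800 in ns ns.example.net.
""".splitlines()

print(sorted(delegated_domains(SAMPLE, "com")))
```

Using a set means a domain with several NS records is counted only once, which matches the "unique label with an NS record" definition above.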
A single domain may of course have multiple MX records which may or may not be in the same domain (which itself may or may not be within the original domain):
$ dig +short mx netmeister.org         # <--+ 1 MX
50 panix.netmeister.org.               # <--+ within the same domain
$ dig +short mx akamai.com             # <--+ 4 MX
20 mx0b-00190b01.pphosted.com.         #    |
10 mxa-00190b01.gslb.pphosted.com.     #    | all in a different domain
10 mxb-00190b01.gslb.pphosted.com.     #    |
20 mx0a-00190b01.pphosted.com.         # <--+
$ dig +short mx twitter.com            # <--+ 5 MX
30 ASPMX3.GOOGLEMAIL.com.              #    + in two different domains
20 alt1.aspmx.l.google.com.            #    | (owned by the same org)
10 aspmx.l.google.com.                 #    |
30 ASPMX2.GOOGLEMAIL.com.              #    +
20 alt2.aspmx.l.google.com.            # <--+
$ dig +short mx whynot.coffee          # <--+ 4 MX
10 mailin.mx-hub.cz.                   #    | in four different domains
10 mailin.mx-hub.eu.                   #    | in four different TLDs
10 mailin.mx-hub.sk.                   #    |
10 mailin.mx-hub.net.                  # <--+
$
So we need to flatten the data a bit and reduce the individual MX servers to their second-level domain. With the help of some perl and the Public Suffix List, I mapped the approximately 30 million unique MX servers listed for the 203 million domains into around 21 million second-level domains.
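The actual mapping was done in perl; as a rough Python illustration of the idea, the sketch below reduces a mail server name to its registrable domain. It assumes a tiny hard-coded set of suffixes standing in for the full Public Suffix List, and it ignores the list's wildcard and exception rules, so it is not a substitute for a real PSL implementation. The names `SUFFIXES` and `registrable_domain` are hypothetical.

```python
# Assumption: a tiny stand-in for the Public Suffix List, for illustration only.
SUFFIXES = {"com", "net", "org", "cz", "eu", "sk", "il", "co.il"}


def registrable_domain(fqdn, suffixes=SUFFIXES):
    """Return the public suffix plus one label, or None if no suffix matches."""
    labels = fqdn.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, so "co.il" wins over "il".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None          # the name is itself a public suffix
            return ".".join(labels[i - 1:])
    return None                      # e.g. a non-FQDN name like "imta6."


for host in ("ASPMX3.GOOGLEMAIL.com.",
             "mxa-00190b01.gslb.pphosted.com.",
             "mail.walla.co.il.",
             "imta6."):
    print(host, "->", registrable_domain(host))
```

Lowercasing first matters, since DNS names are case-insensitive and the same server may appear as both ASPMX3.GOOGLEMAIL.com. and aspmx3.googlemail.com.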
So... who does
Stats by MX
As noted above, I found approximately 30 million unique mail servers, but of course not every domain has an MX record. In that case, SMTP assumes an "implicit MX" and attempts to deliver the mail to the IP address (if any) of the bare domain name.
As it turns out, no explicit MX record is indeed the most widely found configuration: almost 119 million domains (58% of all domains) are lacking any such resource record. Of those, 76 million (64%) do have an IP address and thus could at least theoretically receive mail; reversing those IP addresses again, we note that 28.8 million are AWS IPs (in the amazonaws.com., awsglobalaccelerator.com., and cloudfront.net. domains), 18 million Google's (1e100.net. and googleusercontent.com.; 126.96.36.199 is used by 12.8 million domains alone), and 7.3 million Wix's (wixsite.com).
That leaves around 42 million domains that do not have any means of accepting mail simply because they have neither an MX record nor an IP address. However, there are other ways that a domain owner may signal that it does not accept mail: 1.5 million (or 0.7% of all) domains have their MX set to localhost (and 425 to localhost.localdomain), which of course is a bit janky a way of telling folks not to bother you. Because this isn't quite ideal, we now have a much better way of expressing the fact that a domain does not want any mail: the "Null MX" No Service Resource Record, specified in RFC7505. That is, simply set a single MX record with a preference number of 0 and a zero-length label (i.e., .):
$ host -t mx livemediastreaming.com
livemediastreaming.com mail is handled by 0 .
$
This approach appears to be marginally more popular than using localhost: around 2 million or just about 1% of all domains have a Null MX record set. (That approach also has the advantage that it can help in combating impersonation without having to specify an SPF policy: a receiving mail server can reject mail upon encountering an undeliverable MailFrom/From address.)
So all in all, just about 46 million domains or around 23% of all domains do not have any way of getting mail.
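The categories discussed above can be summarized in a small Python sketch. It assumes the MX lookup results are already available as (preference, exchange) tuples and that we separately know whether the bare domain has an IP address; the function name and the category labels are made up for this example.

```python
LOCALHOST_NAMES = {"localhost", "localhost.localdomain"}


def classify(mx_records, has_address):
    """Classify how (or whether) a domain can receive mail.

    mx_records:  list of (preference, exchange) tuples, possibly empty
    has_address: True if the bare domain name resolves to an IP address
    """
    if not mx_records:
        # No MX at all: SMTP falls back to the "implicit MX" if there is an address.
        return "implicit-mx" if has_address else "no-mail"
    if len(mx_records) == 1 and mx_records[0] == (0, "."):
        # RFC7505 "Null MX": a single record, preference 0, zero-length label.
        return "null-mx"
    exchanges = {exchange.lower().rstrip(".") for _, exchange in mx_records}
    if exchanges <= LOCALHOST_NAMES:
        return "localhost"
    return "explicit-mx"


print(classify([(0, ".")], True))                       # the Null MX case
print(classify([(10, "localhost.")], True))             # the janky variant
print(classify([], True))                               # implicit MX
print(classify([], False))                              # cannot receive mail
print(classify([(50, "panix.netmeister.org.")], True))  # regular MX
```

Under this scheme, the "no-mail", "localhost", and "null-mx" buckets together correspond to the domains without any way of getting mail.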
Number of MX Records
Now let's take a look at the ~40% (approximately 81 million) of domains with MX records. Most domains have between one and five mail exchange records, but of course there are outliers: 464 domains have more than ten MX records, 28 more than 20, and four domains have over 100! For example, the ever so aptly named everymailbox.com domain has 398 MX records, whiteinbox.net has 253, and rm02.net has 235. All of these MX records have the same priority, suggesting they are trying to aim for some DNS round-robin load balancing here.
gaodong.com is another outlier: 123 MX records with 117 distinct priorities, similar to connectingdonors.net with 59 records with unique priorities from 1 to 58.
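To see why equal priorities amount to load balancing while distinct priorities amount to a strict fallback chain, here is a small Python sketch of how a sending server might order its delivery attempts: lowest preference first, with hosts of equal preference shuffled (RFC 5321 asks senders to randomize among equal-preference destinations). The function name is hypothetical, and real mail servers apply additional logic not shown here.

```python
import random
from itertools import groupby


def delivery_order(mx_records, rng=random):
    """Order (preference, exchange) tuples the way a sender would try them."""
    ordered = []
    by_preference = sorted(mx_records, key=lambda record: record[0])
    for _, group in groupby(by_preference, key=lambda record: record[0]):
        hosts = list(group)
        rng.shuffle(hosts)      # spread load across equal-preference hosts
        ordered.extend(hosts)
    return ordered


# The twitter.com records shown earlier in this post.
records = [
    (30, "ASPMX3.GOOGLEMAIL.com."),
    (20, "alt1.aspmx.l.google.com."),
    (10, "aspmx.l.google.com."),
    (30, "ASPMX2.GOOGLEMAIL.com."),
    (20, "alt2.aspmx.l.google.com."),
]
for preference, exchange in delivery_order(records):
    print(preference, exchange)
```

With hundreds of records at a single priority, every delivery attempt starts at a random host; with every record at its own priority, the order is fully deterministic.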
And then there are domains that spread their MX records across multiple second-level domains, although some of them are clearly misconfigured and include what appear to be non-FQDN names as well as some that simply don't resolve at all:
$ host -t mx trustedomain.com
trustedomain.com mail is handled by 10 imtat4.          # these appear to be
trustedomain.com mail is handled by 5 imta6.            # non-fqdn names under
trustedomain.com mail is handled by 5 imta21.           # the trustedomain.com domain
[... 40 more records like that ...]
$ host -t mx dabafunk.xyz
dabafunk.xyz mail is handled by 10 mail.dabafunk.xyz.
dabafunk.xyz mail is handled by 0 smtp.dabafunk.xyz.
dabafunk.xyz mail is handled by 2 mail.bhargo.          # similarly, some are non-fqdn
dabafunk.xyz mail is handled by 1 smtp.wesak.
dabafunk.xyz mail is handled by 1 smtp.maitreya.
dabafunk.xyz mail is handled by 1 smtp.shamballa.
dabafunk.xyz mail is handled by 2 mail.wesak.           # but others don't resolve
dabafunk.xyz mail is handled by 1 smtp.bhargo.
dabafunk.xyz mail is handled by 2 mail.maitreya.
$
And my favorite: moshelasky.net, which set MX records for a number of completely unrelated and necessarily mutually exclusive big name domains, basically saying "go give my mail to Cisco, and if that doesn't work out, try Microsoft, Intel, Google, Yahoo... whatever":
$ host -t mx moshelasky.net
moshelasky.net mail is handled by 70 mail.facebook.com.
moshelasky.net mail is handled by 100 mail.thunderbird.com.
moshelasky.net mail is handled by 100 mail.yahoo.com.
moshelasky.net mail is handled by 90 mail.pirisoft.com.
moshelasky.net mail is handled by 30 mail.moshelasky.com.
moshelasky.net mail is handled by 40 mail.moshelasky.net.
moshelasky.net mail is handled by 100 mail.walla.co.il.
moshelasky.net mail is handled by 20 mail.outlook.com.
moshelasky.net mail is handled by 50 mail.intel.com.
moshelasky.net mail is handled by 80 mail.grc.com.
moshelasky.net mail is handled by 100 mail.mailchimp.com.
moshelasky.net mail is handled by 100 mail.digicert.com.
moshelasky.net mail is handled by 100 mail.noip.com.
moshelasky.net mail is handled by 100 mail.google.com.
moshelasky.net mail is handled by 60 mail.microsoft.com.
moshelasky.net mail is handled by 100 mail.windows.com.
moshelasky.net mail is handled by 10 mail.cisco.com.
$
Valid MX Records
But ok, let's look at the domains with reasonable MX records: In the 30 million unique servers listed, we expect to see several of the popular email and hosting providers' mail servers, but of course less popular domains will have their own MX records that are likely to be unique. In fact, almost 98% of all domains have a globally unique mail server, making only a single appearance. Of the other 380K mail servers, around 2K appear more than 1,000 times. The top 20 most frequently used mail servers here are:
You can see an obvious trend here: Google's mail servers are rather popular (although not the most popular), and of course chances are that domains that have e.g., alt1.aspmx.l.google.com. as one MX will likely also have alt2.aspmx.l.google.com. as a second record. This suggests that we can gain more insights by reducing them to their domain name:
MX Record Domains
To better understand who the operators of these mail servers are, I flattened the data such that a domain that contains MX records pointing to, say, aspmx.l.google.com., alt1.aspmx.l.google.com., and smtp.secureserver.net. would be counted once each for the domains google.com and secureserver.net.
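As a rough Python sketch of that counting rule: each domain contributes at most one count per mail server domain, no matter how many of its MX records point there. The helper `naive_sld` simply keeps the last two labels and is an assumption standing in for a proper Public Suffix List lookup; the sample data is made up.

```python
from collections import Counter


def naive_sld(host):
    """Assumption: keep the last two labels instead of consulting the PSL."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def count_mx_domains(mx_by_domain):
    """Count, per mail server domain, how many domains list it at least once."""
    counts = Counter()
    for hosts in mx_by_domain.values():
        # A set ensures each mail server domain is counted once per domain.
        counts.update({naive_sld(host) for host in hosts})
    return counts


# Hypothetical input: domain name -> list of MX hostnames.
sample = {
    "example.com": ["aspmx.l.google.com.",
                    "alt1.aspmx.l.google.com.",
                    "smtp.secureserver.net."],
    "example.net": ["aspmx.l.google.com."],
}
for mx_domain, count in count_mx_domains(sample).most_common():
    print(count, mx_domain)
```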
This breaks down our data set to 21 million unique domains, and the top 20 domains in which we find most MX records are:
Obviously we can combine some of the domains by company or organization to better reflect the concentration of the mail servers. With that, we find that Google takes the lion's share of domains with about 34%, GoDaddy around 14%, Namecheap 13.5%, and Microsoft trailing behind with about 4.7%[4]:
To note: all of this covers second-level domains in the generic TLDs only and excludes country-code TLDs. Necessarily, this skews the findings a bit, as we'd expect e.g., European countries to use non-American service providers.
Spot-checking 100,000 domains each from .ch, .fr, and .se -- three of the only 17 ccTLD zone files / domain name listings I was able to access -- shows OVH and Gandi ahead of Google in .fr, Hostpoint AG and Infomaniak in the top 3 in .ch, and the Swedish One.com not surprisingly taking the top spot in .se, but a full analysis of all ccTLD zones would obviously be needed to get a complete view.
Stats for Top 1M Domains
Looking at all domains tells us which mail servers are listed most frequently, but that of course includes hundreds of thousands if not millions of parked domains, spam domains, one-time or dormant domains etc. So let's instead look at the Tranco Top 1 Million list and see if our distribution changes.
For those 1 million domains, we find around 433K distinct MX servers in 230K domains. The top 20 mail server domains there are slightly different from those for all domains:
We observe that amongst the top 1M domains, many outsource mail not just to the big providers (Google and Microsoft together account for 60% of all!), but often add another layer of email protection via different, more specialized service providers such as Proofpoint, Barracuda Networks, or Cisco / IronPort. Those may then well also hand the mail to e.g., Google or Microsoft, further increasing their share, but that remains opaque to us from the outside.
In summary, some of the information we were able to pull out of our MX data collection includes:
So all in all, the answer to the question of who can read your email pretty much boils down to -- yep -- "Google and Microsoft". Even if your domain doesn't use one of their mail servers, chances are that whoever you are sending mail to does.
To be fair: these companies are going to be doing a much better job at running and securing your email than you are, and outsourcing this critical functionality often makes good sense. And yet, this is another example of the continuously increasing centralization of the internet. Our businesses just like our personal online lives are concentrated in the hands of just a few companies.
[1] Performing millions of parallel DNS lookups leads to some interesting problems in different areas, which are probably worth a separate blog post all on their own.
[2] In countries where "gmail" was already trademarked, Google uses the googlemail.com domain. This includes e.g., the UK, Germany, Russia, and Poland.
[3] h-email.net appears to be a domain used primarily or exclusively for parked domains by e.g., ParkingCrew. A peculiarity of the domain is its SPF record (ip6:fd96:1c8a:43ad::/48 -all), which allows only traffic on an IPv6 Unique Local Address (ULA), despite mail.h-email.net having only IPv4 addresses that belong to Digital Ocean and Hetzner Online GmbH.
[4] The percentages here are not quite accurate, since they are over only those mail servers that are used by 1,000 or more domains. Over all 21 million mail server domains, they are reduced somewhat, but the proportional dominance of the top domains remains.