
Solaris 10 ZFS vs. Apple XRaid

After setting up a Solaris 10 machine with ZFS as the new NFS server, I'm stumped by some serious performance problems. Here are the details:

The machine in question is a dual-amd64 box with 2GB RAM and two Broadcom gigabit NICs; the Broadcom BRCMbcme package was installed to drive the interfaces. The OS is Solaris 10 6/06, and the filesystem consists of a single zpool striped across the two halves of an Apple XRaid (each half configured as RAID5), providing a pool of 5.4 TB. On the pool, I've created a total of 60 filesystems, each of them shared via NFS, each of them with compression turned on. The clients (NetBSD) mount the filesystems with '-U -r32768 -w32768', and initially everything looks just fine. (The clients also do NIS against a different server.)
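
Roughly, the setup amounts to something like this; the pool name, device paths, and mount point below are placeholders, only the options match what I described above:

  # on the Solaris server: stripe a pool across the two XRaid halves,
  # create the filesystems, and turn on compression and NFS sharing
  # for the whole pool
  zpool create tank c4t0d0 c4t1d0
  zfs create tank/home
  zfs create tank/home/jschauma
  zfs set compression=on tank
  zfs set sharenfs=on tank

  # on a NetBSD client, with the mount options mentioned above
  mount_nfs -U -r32768 -w32768 server:/tank/home/jschauma /home/jschauma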

Running 'ls' and 'ls -l' on a large directory looks just fine upon first inspection. Reading the filesystem works fine, too:

  1. Running a 'find . -type f -print' on a directory with a total of 46450 files/subdirectories in it takes about 90 seconds, yielding an average I/O size of 64 at around 2000 kr/s according to iostat(1M) (see the iostat sketch after this list).
  2. Running a 'dd if=/dev/zero of=blah bs=1024k count=128' takes about 18 seconds at almost 7MB/s (this is a 10/100 network). To compare how this measures up when not doing any file I/O, I ran 'dd if=/dev/zero bs=1024k count=128 | ssh remotehost "cat - >/dev/null"', which took about 13 seconds.
  3. Reading from the NFS share ('dd if=blah of=/dev/null bs=1024k') takes about 12 seconds.
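
The throughput numbers come from watching the disks on the server while the tests run, along these lines (the interval is arbitrary):

  # on the Solaris server: extended per-device statistics every 5 seconds;
  # the kr/s and kw/s columns show read/write throughput in KB/s
  iostat -xn 5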

All of this is perfectly acceptable. Compared with the old NFS server (which runs on IRIX), we get:

  1. takes significantly longer on IRIX: about 300 seconds
  2. is somewhat faster on IRIX: it takes about 14 seconds
  3. takes about the same (around 12 seconds)

(The comparison is not entirely fair, however: the NFS share mounted from the IRIX machine is also exported to about 90 other clients, and does see its fair share of I/O, while the Solaris NFS share is only mounted on this one client.)

Alright, so what's my beef? Well, here's the fun part: when I try to actually use this NFS share as my home directory (as I do with the IRIX NFS mount), performance somehow plummets. Reading my inbox (~/.mail) takes around 20 seconds, even though it has only 60 messages in it.

When I try to run 'ktrace -i mutt' with the ktrace output going to the NFS share, everything grinds to a halt. While that command is running, even a simple 'ls -la' of a small directory takes almost 5 seconds.
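
Note what this workload amounts to: the trace file lives on the same NFS mount that mutt is reading my mail from, so reads and writes hit the share at the same time. Roughly (the path is a placeholder):

  # run mutt under ktrace, passing the trace flag on to child processes (-i)
  # and writing the trace records into the NFS-mounted home directory
  cd /home/jschauma
  ktrace -i -f ./mutt.ktrace mutt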

Neither the ktrace nor the mutt command can be killed right away -- they're blocking on I/O.

Alright, so after it finally finished, I try something a bit simpler: 'vi plaintext'. Ok, that's as snappy as it should be. Now: 'vim plaintext'. Ugh, that took almost 4 seconds for the editor to come up. There are all kinds of other examples that I tried, but the one standing out the most was trying to create a number of directories:

# create 100 directories with 100 subdirectories each (10,100 mkdir calls)
for i in `jot 100`; do
        mkdir $i
        for j in `jot 100`; do
                mkdir $i/$j
        done
done

On the IRIX NFS share, this takes about 60 seconds.

On the Solaris NFS share, this takes... forever. (I interrupted it after 10 minutes, when it had managed to create 2500 directories.)

tcpdump and snoop show me that traffic zips by as it should for the operations described above ((1), (2) and (3)), but becomes very "bursty" when doing reads and writes simultaneously or when creating the directories. That is, instead of a constant stream of packets, tcpdump gives me about 15 lines of output every second, yet I can't find any packet loss.
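
For the record, all I'm watching is NFS traffic on port 2049 on both ends; the interface names here are guesses:

  # on the Solaris server (the interfaces come from the BRCMbcme package)
  snoop -d bcme0 port 2049

  # on the NetBSD client
  tcpdump -n -i fxp0 port 2049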

I've tried to see if this is a problem with ZFS itself: I ran the same tests locally on the file server, directly on the ZFS, and everything seems to work just fine there.

I've tried to mount the filesystem over TCP and with different read/write sizes, with NFSv2 and NFSv3 (the clients don't support NFSv4).
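
That is, variations along these lines on the NetBSD clients (server name and paths are placeholders):

  # TCP, NFSv3, 32K read/write sizes
  mount_nfs -T -3 -r32768 -w32768 server:/tank/home/jschauma /home/jschauma

  # default UDP transport, NFSv2, smaller read/write sizes
  mount_nfs -2 -r8192 -w8192 server:/tank/home/jschauma /home/jschauma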

I've tried to see if it's the NIC or the network by testing regular network speeds, connecting the machine to a different switch, and so on, all to no avail.

I've played with every setting in /etc/default/nfs to no avail, and I just can't put my finger on it.
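
For what it's worth, these are the kinds of knobs I mean; the values shown are just examples, not a recommendation:

  # excerpt from /etc/default/nfs on the server
  NFSD_SERVERS=64
  NFS_SERVER_VERSMAX=3
  LOCKD_SERVERS=32

  # bounce the NFS server for the changes to take effect
  svcadm restart svc:/network/nfs/server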

Alright, so in my next attempt to see if I'm crazy or not, I installed Solaris 10 6/06 on another workstation. Mounting a ZFS from there works just dandy: all of the above tests are fast.

So I reinstall the original server. After importing the old zpool, nothing has changed. I destroy the zpool and recreate it. Still the same problem.
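
In ZFS terms that dance is roughly this (pool name and devices are placeholders again):

  # after the reinstall: pick up the existing pool
  zpool import tank

  # starting over: destroy and recreate it
  zpool destroy tank
  zpool create tank c4t0d0 c4t1d0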

To ensure that it's not the SAN switch, I connect the Solaris machine directly to the XRaid, and again, no change.

I destroy the RAID5 config on the XRaid and build a RAID0 across 7 of the disks. Creating a zpool of only this one (striped) disk also does not change performance at all. Creating a regular UFS on this disk, however, immediately fixes the problems! So it's not the fibre channel switch, it's not the fibre channel cables, it's not the fibre channel card, it's not the gigabit card, it's not the machine, it's not the mount options, it simply appears to be ZFS. ZFS on an Apple XRaid, to be precise. (Maybe it's ZFS on fibre-channel, I don't know; it's not ZFS per se, as the other freshly installed machine with ZFS on a SATA local disk worked fine.)
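
For reference, that last comparison boils down to serving the very same LUN two different ways (device, pool, and path names are placeholders):

  # ZFS on the RAID0 LUN: still slow over NFS
  zpool create test c4t0d0
  zfs set sharenfs=on test

  # UFS on the same LUN: fast over NFS
  newfs /dev/rdsk/c4t0d0s2
  mkdir -p /export/test
  mount /dev/dsk/c4t0d0s2 /export/test
  share -F nfs -o rw /export/test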

ZFS on XRaid; somewhat of a bummer, since I'd waited for this release to finally make use of ZFS.

July 25, 2006

