Networking, VLAN tagging and IPMP on LDOM vswitches
UPDATE: Feb 2009
This page has been rendered largely irrelevant by the release of the LDoms 1.1 software, which now properly supports VLANs. Aside from no longer requiring an MTU bodge, you can now choose which VLANs to pass through to the LDOM, which is a massive improvement. To make this work, when you configure the service domain:
ldm add-vswitch vid=10,200,991 mac-addr=0:14:4f:1:aa:aa net-dev=e1000g0 primary-vsw0 primary
And when you configure the network for the hosted LDOM, if you just want to specify interfaces with a single untagged vlan:
ldm add-vnet pvid=10 vnet0 primary-vsw0 <LDOMHOSTNAME>
ldm add-vnet pvid=200 vnet1 primary-vsw0 <LDOMHOSTNAME>
Or for tagged links (to use vnet10000 and vnet200000):
ldm add-vnet vid=10,200 vnet0 primary-vsw0 <LDOMHOSTNAME>
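Inside the guest, the tagged VLANs then show up under the standard Solaris VLAN device naming, so you plumb them much as you would on bare metal. A sketch of what that might look like (interface numbering follows the vid=10,200 example above):

```shell
# Guest LDOM: plumb the tagged VLANs carried on vnet0, using the
# standard Solaris naming (PPA = VLAN ID * 1000 + instance number)
ifconfig vnet10000 plumb    # VLAN 10 on vnet0
ifconfig vnet200000 plumb   # VLAN 200 on vnet0
```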
Or why MTUs are a pain in the arse
I've spent some time configuring some Logical Domains on one of our T2000s for some development machines at work, one of which is a test environment for our main webserver. Having spent the best part of a day debugging some odd networking and NFS problems, I figured I'd write this up in case it saves anyone else some hassle.
I'd done most of the setup work and had the machine up and running, and all seemed to be working fine. I mounted the NFS shares containing the development webserver files and user home directories, at which point it all went a bit wrong. I could perform an 'ls' on the web server directory just fine. Trying that on the user home directories caused both NFS mounts to hang completely.
While checking that the relevant bits of NFS config on our Sun Cluster were ok (they were) and the network settings (also fine), I happened to run an ifconfig on the development LDOM, but forgot the '-a' to output the information for all interfaces. This caused my SSH session to hang.
Normally on Solaris, running ifconfig without -a displays the usage instructions. A quick test on a different machine revealed that this usage information is 1365 bytes long. Another quick test (running an ls in a directory on the local machine) also caused my connection to hang. Aha! This smells like an MTU problem.
Because we need to present multiple networks to these machines and use IP Multipathing (IPMP), we're using the built-in Solaris support for 802.1Q VLAN tagging.
On regular Solaris, this involves plumbing a virtual device with the vlan number and interface ID encoded:
vlan 10, device e1000g1 = e1000g10001
vlan 999, device bge0 = bge999000
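The encoding is simply PPA = (VLAN ID × 1000) + instance number. You can sanity-check a device name with a small shell helper (vlan_dev is a hypothetical name, not a Solaris command):

```shell
# Hypothetical helper: print the Solaris VLAN device name for a given
# driver, instance and VLAN ID, using PPA = (vlanid * 1000) + instance.
vlan_dev() {
    driver=$1; instance=$2; vlanid=$3
    printf '%s%d\n' "$driver" $(( vlanid * 1000 + instance ))
}

vlan_dev e1000g 1 10    # e1000g10001
vlan_dev bge 0 999      # bge999000
```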
On LDOMs, this involves creating a vswitch in the service domain attached to a physical interface, as you would normally. You then have to create a vlan-style virtual interface in Solaris as you normally would, but within each LDOM - you can't do this at the vswitch level yet.
On the host machine, e1000g0 and e1000g1 are identically configured tagged switch links with a number of VLANs fed down them. They have both been configured as the physical devices for a vswitch in the service domain, which then provides the networking to the guest LDOMs.
Example config for the service domain:
ldm add-vswitch mac-addr=0:14:4f:1:aa:aa net-dev=e1000g0 primary-vsw0 primary
ldm add-vswitch mac-addr=0:14:4f:1:aa:ab net-dev=e1000g1 primary-vsw1 primary
It's important to specify the MAC address of the interface you're 'replacing', or the LDOMs won't talk to the outside world properly.
On the service domain, we then plumb some 'vsw' interfaces instead of the regular e1000g devices to provide its connection to the rest of the network, for example:
vsw10000 - VLAN 10, interface vsw0 (so e1000g0)
vsw10001 - VLAN 10, interface vsw1 (so e1000g1)
You can then use these as regular interfaces.
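For example, bringing up the service domain's address on VLAN 10 might look something like this (the address and netmask are purely illustrative):

```shell
# Service domain: plumb the VLAN 10 device on vsw0 and bring it up.
# 192.168.10.5/24 is an illustrative address, not from the real setup.
ifconfig vsw10000 plumb
ifconfig vsw10000 192.168.10.5 netmask 255.255.255.0 broadcast + up
```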
On this particular guest LDOM, we have the following config:
ldm add-vnet vnet0 primary-vsw0 webdevldom
ldm add-vnet vnet1 primary-vsw1 webdevldom
And the following devices are plumbed:
vnet200000 (connected to vlan 200, vswitch 0)
vnet200001 (vlan 200, vswitch 1)
vnet991000 (vlan 991, vswitch 0)
vnet991001 (vlan 991, vswitch 1)
Vlan 991 is the private network for NFS to the backend cluster, and 200 happens to be the VLAN for this machine's public-facing services.
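In the guest, plumbing those devices looks much like it would on bare metal. A sketch for the VLAN 991 pair (the address is illustrative, and matches the private-network addressing used later):

```shell
# Guest LDOM: plumb the VLAN 991 devices on both vnets.
# 192.168.1.42 is an illustrative private address.
ifconfig vnet991000 plumb
ifconfig vnet991000 192.168.1.42 netmask 255.255.255.0 broadcast + up
ifconfig vnet991001 plumb
```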
Solving the problem
VLAN tagging adds 4 bytes to the length of an Ethernet frame - from a maximum size of 1518 bytes to 1522 (that's 1500 bytes of data, plus Ethernet header information). What seems to happen with VLAN-tagged devices on LDOMs is that the vswitch (or perhaps the vnet driver) drops Ethernet frames over 1518 bytes - a reasonable thing to do for a switch that doesn't support tagging, but unreasonable given that it otherwise passes the data on without interference.
Reducing the MTU of the LDOM by 4 bytes to 1496 immediately and completely cured the problem:
ifconfig vnet991000 mtu 1496
This has to be done for every interface on an LDOM on which you're using VLAN tagging, or some large packets will mysteriously disappear. It was only happening for one of my NFS mounts because that mount's root directory contained a lot of entries, so the reply included at least one 1500-byte packet which never arrived - the other only had a couple of subdirectories, so the return data was under the maximum packet size.
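If you want to watch this happening rather than infer it, snoop on the service domain's vsw interface while generating traffic from the guest. snoop's 'greater' primitive filters on packet length, so something along these lines should show full-size tagged frames from the guest simply never arriving (a diagnostic sketch, not verified on every LDoms release):

```shell
# Service domain: capture only frames longer than a standard untagged
# payload. With the bug present, the guest's full-MTU tagged frames
# are dropped before the vswitch and never show up here.
snoop -d vsw0 greater 1514
```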
To enable IPMP on an LDOM across two VLAN tagged interfaces, you need to do the following:
Create entries in the /etc/hosts file for the host and two test addresses:
192.168.1.42 webdevldom-priv
192.168.1.43 webdevldom-priv-test0
192.168.1.44 webdevldom-priv-test1
Then put the following in /etc/hostname.vnet991000:
webdevldom-priv mtu 1496 netmask + broadcast + group webdevldom-priv-ipmp0 up addif webdevldom-priv-test0 mtu 1496 netmask + broadcast + deprecated -failover up
and in /etc/hostname.vnet991001:
webdevldom-priv-test1 mtu 1496 netmask + broadcast + group webdevldom-priv-ipmp0 deprecated -failover standby up
Remember to set the MTU for each and every interface within each LDOM guest, or you'll have intermittent networking problems. Interestingly, you don't need to do this for the vsw interfaces in the service domain, even though they're connected to the same vswitch as the LDOM where the problem occurs. It appears the oversize Ethernet frames are being dropped somewhere between the LDOM and the vswitch, possibly in the vnet driver - the vswitch itself seems happy to forward them on.
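Once the group is up, it's worth forcing a failover to prove the standby path actually carries traffic. One way to do that on Solaris 10 is with if_mpadm, which offlines and reattaches an interface in an IPMP group:

```shell
# Offline the active interface; the addresses should migrate to the
# standby vnet991001, and NFS traffic should carry on uninterrupted.
if_mpadm -d vnet991000

ifconfig -a    # confirm the addresses have moved to vnet991001

# Reattach the interface and let the group fail back.
if_mpadm -r vnet991000
```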