9.1.2010

Linux initramfs with iSCSI and bonding support for PXE booting

Initrd (or nowadays initramfs) is a simple thing, and I often find it easier to create one myself rather than to figure out how to operate distribution specific tools to get all the required features in.

Since I have a proper file server, I prefer my desktops diskless. NFS roots are otherwise OK, but efficient caching is a bit problematic since the standard Linux disk cache acts on the block device layer. I have found it easiest to just use iSCSI (which is a real block device) for the remote root filesystem. Unfortunately the Linux kernel can't be told to use its iSCSI initiator automatically during boot-up (as can be done with NFS), so we need to do that ourselves.

I also have two gigabit NICs in my desktop that I would like to bond in software. Since bonding is only possible once Linux is already running, PXE booting has to use a single NIC, and we then switch to the bond device before mounting the real root. The right place to initialize the bond interface is therefore the initramfs, which doesn't depend on the network itself.

In this blog entry, I'll give you an example of how to make a transparent minimalistic initramfs that can be used to boot a Linux from network (PXE) using aggregated NICs and iSCSI.

PXE booting

I'm assuming you're familiar with PXE booting. Use this as the PXE config:
default linux
prompt 1
timeout 20

LABEL linux
    KERNEL linux
    APPEND initrd=linux-initrd
Here linux is your kernel file, and linux-initrd is the ramdisk image I will come to in just a minute. In the kernel, you'll need to enable iSCSI support as modules, and bonding support as built-in.
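
Those kernel requirements can be checked against a .config before building. The following is a minimal sketch, not part of the original setup: check_kconfig is a made-up helper name, and the option names (CONFIG_SCSI_ISCSI_ATTRS and CONFIG_ISCSI_TCP for the iSCSI initiator, CONFIG_BONDING for bonding) are the mainline symbols, which may differ on heavily patched kernels.

```shell
#!/bin/sh
# check_kconfig FILE: verify that a kernel .config matches the setup in this post.
# Prints one line per problem and returns non-zero if any option is wrong.
# (Hypothetical helper; option names are the mainline Kconfig symbols.)
check_kconfig() {
    cfg=$1
    rc=0
    # iSCSI pieces are expected as modules (=m), bonding as built-in (=y).
    for want in CONFIG_SCSI_ISCSI_ATTRS=m CONFIG_ISCSI_TCP=m CONFIG_BONDING=y
    do
        if ! grep -qxF "$want" "$cfg"; then
            echo "missing or different: $want"
            rc=1
        fi
    done
    return $rc
}
```

Run it as check_kconfig /path/to/linux/.config; it prints nothing and returns 0 when all three options match.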

Initramfs

The initial ram filesystem should have an init script, which sets up the networking and the iSCSI device, mounts root, and then passes control over to the real root's init script as if nothing had happened.

Such a script could look like this (I haven't commented it much but it's rather self-explanatory):

#!/bin/busybox sh

IP=10.0.0.10
NETMASK=255.255.255.0
GATEWAY=10.0.0.1
NETIF=bond0
SLAVENICS="eth0 eth1"

ISCSIHOST="10.0.0.2:3260"
ISCSITARGET="iqn.1986-03.com.sun:02:81d8ef12-d317-64f9-8546-e827ff873037"

MODULES="drivers/scsi/scsi_transport_iscsi.ko drivers/scsi/libiscsi.ko drivers/scsi/libiscsi_tcp.ko drivers/scsi/iscsi_tcp.ko"

rootshell() {
    echo failure, invoking shell
    exec /bin/busybox sh
}

pseudofss() {
    echo mounting proc
    /bin/mount -t proc none /proc || return 1
    echo mounting sysfs
    /bin/mount -t sysfs none /sys || return 1
}

# optional and untested (might not work); not called by the boot sequence below
udevinit() {
    echo setting up udev clone
    echo /bin/mdev > /proc/sys/kernel/hotplug
    /bin/mdev -s || return 1
}

ifup() {
    echo bringing net up
    /bin/ifconfig $NETIF $IP netmask $NETMASK up || return 1
    /bin/ifenslave $NETIF $SLAVENICS || return 2
    /bin/route add default gw $GATEWAY || return 3
}

loadmods() {
    echo loading modules: $MODULES
    # modules are listed in dependency order; stop at the first failure
    for mod in $MODULES; do
        /bin/insmod /lib/modules/$(/bin/uname -r)/kernel/$mod || return 1
    done
}

mountroot() {
    echo initializing iscsi
    echo   spawning iscsid
    /bin/iscsid || return 1
    echo   logging in to "$ISCSITARGET" on "$ISCSIHOST"
    /bin/iscsiadm -m node --targetname $ISCSITARGET --portal $ISCSIHOST --login || return 2
    echo waiting 5 secs before trying to mount root
    /bin/busybox sleep 5
    /bin/mount -o ro LABEL=ISCSI_ROOT /newroot || return 3
}

clean() {
    echo umounting proc and sysfs
    /bin/umount /proc || return 1
    /bin/umount /sys || return 2
}  

pseudofss || rootshell
loadmods || rootshell
ifup || rootshell
mountroot || rootshell
clean # if this fails, we're still trying to boot
exec switch_root /newroot /sbin/init
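
The fixed five-second sleep in mountroot is a guess at how long the iSCSI disk takes to show up. One possible alternative, sketched here under assumptions, is to poll until the labelled filesystem is visible. wait_for is a hypothetical helper, and the usage line after the block assumes your busybox build includes the findfs applet (with a matching symlink in /bin), which the tree listed below does not have.

```shell
#!/bin/sh
# wait_for TRIES CMD [ARGS...]: run CMD about once per second until it
# succeeds, giving up after TRIES attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        # Only sleep if another attempt is still coming.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

In mountroot, the sleep line could then become: wait_for 15 /bin/findfs LABEL=ISCSI_ROOT || return 3. This boots as soon as the disk appears and still drops to the shell if it never does.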

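For this to work, the image needs everything the script calls: busybox and its applet symlinks, ifconfig, ifenslave, insmod, iscsid and iscsiadm, the shared libraries these link against, the iSCSI kernel modules, the iSCSI configuration in /etc/iscsi, and a few device nodes.

My root, for example, looks like this: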

.:
bin/  createpkg.sh*  dev/  etc/  init*  lib/  lib64@  newroot/  proc/  sys/  tmp/  var/

./bin:
busybox*   ifenslave@      insmod*    iscsid*  mdev@   route@  strace*       umount@
ifconfig*  ifenslave-2.6*  iscsiadm*  ls@      mount@  sh@     switch_root@  uname@

./dev:
console  null  sda  sda1  sda2  sdb  sdb1  sdb2  sdc  sdc1  sdc2  sdd  sdd1  sdd2  zero

./etc:
group  iscsi/  passwd  shadow

./etc/iscsi:
ifaces/  initiatorname.iscsi  iscsid.conf  nodes/  send_targets/

(A bunch of stuff in iscsi configuration)

./lib:
ld-2.9.so*             libnss_compat.so.2@   libnss_mdns.so.2           libnss_nis.so.2@
ld-linux-x86-64.so.2@  libnss_dns-2.9.so     libnss_mdns4.so.2          libnss_nisplus-2.9.so
libc-2.9.so*           libnss_dns.so.2@      libnss_mdns4_minimal.so.2  libnss_nisplus.so.2@
libc.so.6@             libnss_files-2.9.so   libnss_mdns6.so.2          modules/
libm-2.9.so            libnss_files.so.2@    libnss_mdns6_minimal.so.2
libm.so.6@             libnss_hesiod-2.9.so  libnss_mdns_minimal.so.2
libnss_compat-2.9.so   libnss_hesiod.so.2@   libnss_nis-2.9.so

./lib/modules:
2.6.30.1/

(A bunch of stuff in my kernel module directory)

./newroot:

./proc:

./sys:

./tmp:

./var:
lock/  run/

./var/lock:
iscsi/

./var/lock/iscsi:
lock

./var/run:
iscsid.pid

You can grab the initramfs I use from here (without /etc/shadow, /lib/modules/2.6.30.1, or /etc/iscsi, which are personal).
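
If you would rather build the tree from scratch, the empty layout is quick to recreate. This is a sketch only: make_skeleton is a hypothetical helper, the applet list is read off the symlinks in the listing above, and lib64 is assumed to point at lib. The binaries, libraries, kernel modules, and iSCSI configuration still have to be copied in by hand.

```shell
#!/bin/sh
# make_skeleton DIR: create the empty directory layout and busybox applet
# symlinks for an initramfs tree like the one listed above.
# (Hypothetical helper; it does not copy any binaries or create device nodes.)
make_skeleton() {
    root=$1
    for d in bin dev etc/iscsi lib/modules newroot proc sys tmp \
             var/lock/iscsi var/run
    do
        mkdir -p "$root/$d" || return 1
    done
    # Assumption: lib64 is a symlink to lib (the listing only shows it is a symlink).
    ln -sfn lib "$root/lib64" || return 1
    # Assumption: these names are busybox applets, since the listing shows them as symlinks.
    for applet in ls mdev mount route sh switch_root umount uname
    do
        ln -sf busybox "$root/bin/$applet" || return 1
    done
}
```

Device nodes need root privileges, for example mknod dev/console c 5 1 and mknod dev/null c 1 3, run inside the tree.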

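When you have a root ready, package it by running this from the top of the tree, then copy the resulting ../initramfs.cpio.gz to your TFTP directory under the name the PXE config expects (linux-initrd above):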

#!/bin/sh
find . -print0 | cpio -ov -0 --format=newc | gzip -9 > ../initramfs.cpio.gz

Now you should be ready to enjoy your 2Gbps remote root with a proper local disk cache. :-)
For instance, I have 8G of RAM in my desktop, of which ~7G is usually free for cache; the performance difference compared to NFS is huge.
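
To confirm after boot that both NICs really joined the bond, the bonding driver's status file can be parsed. A small sketch, assuming the usual /proc/net/bonding layout (a "Slave Interface:" line followed by that slave's "MII Status:" line); bond_slaves_up is a hypothetical name, and it needs awk, so it is meant for the booted system rather than the initramfs.

```shell
#!/bin/sh
# bond_slaves_up FILE: print the slaves listed in a /proc/net/bonding/* file
# whose "MII Status" is "up", one per line.
# (Hypothetical helper; assumes the standard bonding driver status format.)
bond_slaves_up() {
    awk -F': ' '
        # Remember which slave the following status line belongs to.
        /^Slave Interface:/ { slave = $2 }
        # The first "MII Status" line describes the bond itself; skip it.
        /^MII Status:/ && slave != "" {
            if ($2 == "up") print slave
            slave = ""
        }
    ' "$1"
}
```

With the setup above and both links connected, bond_slaves_up /proc/net/bonding/bond0 should list eth0 and eth1.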

Comments

  3.10.2010

This article helps a lot! But I'm still confused: is there anything we have to build into the kernel directly for iSCSI to work properly?
And in my experience, I have to remove the network start script to prevent the iSCSI connection from disconnecting.
- Roy

  11.10.2010

Thanks! No, I didn't need to compile anything extra into the kernel. All the iSCSI stuff is included in the ramfs as modules.
The network start script is a valid concern, but I didn't have to touch it. I guess the interface is down for such a short time before it is brought back up that no file system calls block on it. I'm using a static IP for my desktop, so it doesn't have to invoke a DHCP client or anything while bringing the interface up again. (That might make a difference.)
- wili





