9.1.2010

Linux initramfs with iSCSI and bonding support for PXE booting

Initrd (or nowadays initramfs) is a simple thing, and I often find it easier to create one myself than to figure out how to get distribution-specific tools to include all the required features.

Since I have a proper file server, I prefer my desktops diskless. NFS roots are otherwise OK, but efficient caching is a bit problematic since the standard Linux disk cache acts on the block device layer. I have found it easiest to just use iSCSI (which is a real block device) for the remote root filesystem. Unfortunately the Linux kernel can't be told to use its iSCSI initiator automatically during boot-up (as can be done with NFS), so we need to do that ourselves.

I also have 2 gigabit NICs in my desktop which I would like to bond in software. As this is only possible after Linux has already been booted, we need to use a single NIC during PXE booting, and then switch to the bond device before mounting the real root. Hence the right place to initialize the bond interface is the initramfs, which doesn't depend on the network.

In this blog entry, I'll give you an example of how to make a transparent, minimalistic initramfs that can be used to boot Linux from the network (PXE) using aggregated NICs and iSCSI.

PXE booting

I'm assuming you're familiar with PXE booting. Use this as the PXE config:

default linux
prompt 1
timeout 20

LABEL linux
    KERNEL linux
    APPEND initrd=linux-initrd

Here linux is your kernel file, and linux-initrd is the ramdisk image I will come to in just a minute. In the kernel, you'll need to enable iSCSI support as modules, and bonding support as built-in.
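As a rough sketch, the kernel requirements above can be checked against a kernel .config file. The option names CONFIG_BONDING, CONFIG_ISCSI_TCP and CONFIG_SCSI_ISCSI_ATTRS are an assumption about what "bonding" and "iSCSI support" map to in a 2.6-era kernel; the helper name check_kconfig and the example path are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a kernel .config matches the requirements above.
# Assumption: CONFIG_BONDING is the bonding driver, and CONFIG_ISCSI_TCP plus
# CONFIG_SCSI_ISCSI_ATTRS cover the iSCSI initiator modules.

# check_kconfig FILE: print one line per option, return 1 if any is not as wanted.
check_kconfig() {
    cfg=$1
    rc=0
    for want in CONFIG_BONDING=y CONFIG_SCSI_ISCSI_ATTRS=m CONFIG_ISCSI_TCP=m; do
        # -x matches whole lines only, -F treats the option as a fixed string.
        if grep -qxF "$want" "$cfg"; then
            echo "ok:      $want"
        else
            echo "missing: $want"
            rc=1
        fi
    done
    return $rc
}

# Example call (the path is only an illustration):
#   check_kconfig /usr/src/linux/.config
```

The check is deliberately strict: bonding built as a module (CONFIG_BONDING=m) is reported as missing, because the setup described here expects it built in.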

Initramfs

The initial ram filesystem should have an init script, which sets up the networking and the iSCSI device, mounts root, and then passes control over to the real root's init script as if nothing had happened.

Such a script could look like this (I haven't commented it much but it's rather self-explanatory):

#!/bin/busybox sh

IP=10.0.0.10
NETMASK=255.255.255.0
GATEWAY=10.0.0.1
NETIF=bond0
SLAVENICS="eth0 eth1"
ISCSIHOST="10.0.0.2:3260"
ISCSITARGET="iqn.1986-03.com.sun:02:81d8ef12-d317-64f9-8546-e827ff873037"
MODULES="drivers/scsi/scsi_transport_iscsi.ko drivers/scsi/libiscsi.ko drivers/scsi/libiscsi_tcp.ko drivers/scsi/iscsi_tcp.ko"

# Drop to an interactive shell when a step fails.
rootshell() {
    echo failure, invoking shell
    exec /bin/busybox sh
}

# Mount the pseudo filesystems the tools below expect.
pseudofss() {
    echo mounting proc
    /bin/mount -t proc none /proc || return 1
    echo mounting sysfs
    /bin/mount -t sysfs none /sys || return 1
}

# might not work (and is not called below)
udevinit() {
    echo setting up udev clone
    # mdev lives in /bin in this image; there is no /sbin
    echo /bin/mdev > /proc/sys/kernel/hotplug
    /bin/mdev -s || return 1
}

# Bring up the bond with a static address, enslave the NICs, add the default route.
ifup() {
    echo bringing net up
    /bin/ifconfig $NETIF $IP netmask $NETMASK up
    /bin/ifenslave $NETIF $SLAVENICS
    /bin/route add default gw $GATEWAY
}

# Load the iSCSI modules in the order listed in MODULES.
loadmods() {
    echo loading modules: $MODULES
    for mod in $MODULES; do
        /bin/insmod /lib/modules/$(/bin/uname -r)/kernel/$mod
    done
}

# Start iscsid, log in to the target and mount the real root read-only.
mountroot() {
    echo initializing iscsi
    echo "  spawning iscsid"
    /bin/iscsid || return 1
    echo "  logging in to $ISCSITARGET on $ISCSIHOST"
    /bin/iscsiadm -m node --targetname $ISCSITARGET --portal $ISCSIHOST --login || return 2
    echo waiting 5 secs before trying to mount root
    /bin/busybox sleep 5
    /bin/mount -o ro LABEL=ISCSI_ROOT /newroot || return 3
}

# Unmount the pseudo filesystems before handing over to the real root.
clean() {
    echo umounting proc and sysfs
    /bin/umount /proc || return 1
    /bin/umount /sys || return 2
}

pseudofss || rootshell
loadmods || rootshell
ifup || rootshell
mountroot || rootshell
clean # if this fails, we're still trying to boot

exec switch_root /newroot /sbin/init
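The fixed five-second sleep in mountroot is the simplest way to wait for the iSCSI disk to appear. One possible alternative, shown here only as a sketch and not as part of the original script, is to retry the mount for a bounded number of attempts. The helper name wait_for is hypothetical.

```shell
#!/bin/sh
# Sketch: retry a command once per second instead of sleeping a fixed time.

# wait_for TRIES CMD [ARGS...]: return 0 as soon as CMD succeeds,
# or 1 if it still fails after TRIES attempts.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        # No need to sleep after the final failed attempt.
        # (In the initramfs above this would be: /bin/busybox sleep 1)
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Hypothetical use inside mountroot(), replacing the sleep and the mount line:
#   wait_for 10 /bin/mount -o ro LABEL=ISCSI_ROOT /newroot || return 3
```

With this variant a fast target is mounted sooner and a slow one gets more than five seconds, but failed mount attempts will print errors while waiting. Either way, the init script above is what the rest of this entry assumes.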

For the init script to work, you need to include in your image:

busybox (I had to compile it with ext4 support for mount)
ifconfig, ifenslave, insmod, iscsiadm, iscsid
appropriate device nodes in /dev
kernel modules in /lib/modules
iSCSI configuration files in /etc/iscsi (I suggest you set up iSCSI first on a working system and then copy the configuration from there)
basic infrastructure in /etc (group, passwd, shadow) and /var (for iscsid)
a bunch of dynamic libraries for the aforementioned tools in /lib
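Finding the dynamic libraries by hand is tedious. A minimal sketch of one way to collect them is to parse the output of ldd on the working system the tools are copied from. The helper names ldd_paths and copy_libs and the paths in the usage comment are hypothetical, and the sketch assumes that everything may be flattened into one lib directory, with /lib64 being a symlink to it.

```shell
#!/bin/sh
# Sketch: collect the shared libraries that the tools in the image need.

# ldd_paths: read `ldd` output on stdin, print each absolute library path once.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
        LC_ALL=C sort -u
}

# copy_libs ROOT BINARY...: copy the libraries of each BINARY into ROOT/lib.
# cp -L follows symlinks, so each file lands under the name the binary asks for.
copy_libs() {
    root=$1
    shift
    mkdir -p "$root/lib"
    for bin in "$@"; do
        ldd "$bin" | ldd_paths
    done | LC_ALL=C sort -u | while read -r lib; do
        cp -L "$lib" "$root/lib/"
    done
}

# Hypothetical usage (paths depend on your distribution):
#   copy_libs ./initramfs-root /sbin/iscsid /sbin/iscsiadm /sbin/ifenslave
```

Note that ldd only reports libraries linked at build time. Libraries loaded at run time, such as the libnss_* files, still have to be copied by hand.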

My root, for example, looks like this:

.:
bin/  createpkg.sh*  dev/  etc/  init*  lib/  lib64@  newroot/  proc/  sys/  tmp/  var/

./bin:
busybox*   ifenslave@      insmod*    iscsid*  mdev@   route@  strace*       umount@
ifconfig*  ifenslave-2.6*  iscsiadm*  ls@      mount@  sh@     switch_root@  uname@

./dev:
console  null  sda  sda1  sda2  sdb  sdb1  sdb2  sdc  sdc1  sdc2  sdd  sdd1  sdd2  zero

./etc:
group  iscsi/  passwd  shadow

./etc/iscsi:
ifaces/  initiatorname.iscsi  iscsid.conf  nodes/  send_targets/
(A bunch of stuff in iscsi configuration)

./lib:
ld-2.9.so*             libnss_compat.so.2@   libnss_mdns.so.2           libnss_nis.so.2@
ld-linux-x86-64.so.2@  libnss_dns-2.9.so     libnss_mdns4.so.2          libnss_nisplus-2.9.so
libc-2.9.so*           libnss_dns.so.2@      libnss_mdns4_minimal.so.2  libnss_nisplus.so.2@
libc.so.6@             libnss_files-2.9.so   libnss_mdns6.so.2          modules/
libm-2.9.so            libnss_files.so.2@    libnss_mdns6_minimal.so.2
libm.so.6@             libnss_hesiod-2.9.so  libnss_mdns_minimal.so.2
libnss_compat-2.9.so   libnss_hesiod.so.2@   libnss_nis-2.9.so

./lib/modules:
2.6.30.1/
(A bunch of stuff in my kernel module directory)

./newroot:

./proc:

./sys:

./tmp:

./var:
lock/  run/

./var/lock:
iscsi/

./var/lock/iscsi:
lock

./var/run:
iscsid.pid

You can grab the initramfs I use from here (without /etc/shadow, /lib/modules/2.6.30.1, or /etc/iscsi, which are personal).
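Before packaging, a quick sanity check of the tree can catch a missing mount point or a non-executable init. The following is a minimal sketch; the helper name check_root is hypothetical, and the paths it checks are the ones the init script and the listing above rely on.

```shell
#!/bin/sh
# Sketch: sanity-check an initramfs tree before packaging it.

# check_root DIR: print every problem found, return 1 if there was any.
check_root() {
    root=$1
    rc=0
    # Mount points and directories used by the init script and by iscsid.
    for d in proc sys newroot dev etc/iscsi lib/modules var/run var/lock/iscsi; do
        [ -d "$root/$d" ] || { echo "missing directory: /$d"; rc=1; }
    done
    # The kernel runs /init from the initramfs; the script itself needs busybox.
    for f in init bin/busybox; do
        [ -x "$root/$f" ] || { echo "missing or not executable: /$f"; rc=1; }
    done
    return $rc
}

# Example call from the top of the tree:
#   check_root . || echo "fix the tree before packaging"
```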

When you have a root ready, package it like this:

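#!/bin/sh
# Run from the top of the initramfs tree; newc is the cpio format the kernel expects.
# Serve the result under the name used in the PXE config (linux-initrd).
find . -print0 | cpio -ov -0 --format=newc | gzip -9 > ../initramfs.cpio.gz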

Now you should be ready to enjoy your 2Gbps remote root with a proper local disk cache. :-)
For instance, I have 8G of RAM in my desktop, of which ~7G is usually free for cache; the performance difference to NFS is huge.
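After booting, one way to confirm that both NICs actually joined the bond is to read the status file the bonding driver exposes under /proc/net/bonding. This is a minimal sketch; bond_slaves and bond_ok are hypothetical helper names, and the interface names are the ones from SLAVENICS in the init script.

```shell
#!/bin/sh
# Sketch: verify that the bond has the expected slave interfaces.

# bond_slaves [FILE]: print the slave interfaces listed in a bonding status file.
bond_slaves() {
    sed -n 's/^Slave Interface: //p' "${1:-/proc/net/bonding/bond0}"
}

# bond_ok [FILE]: succeed only if both eth0 and eth1 are enslaved.
bond_ok() {
    slaves=$(bond_slaves "$@") || return 1
    for nic in eth0 eth1; do
        echo "$slaves" | grep -qxF "$nic" || return 1
    done
}

# Example calls on the booted system:
#   bond_slaves
#   bond_ok && echo "both NICs are in the bond"
```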

Comments

3.10.2010

This article helps a lot! But I'm still confused: is there anything we have to build into the kernel directly for iSCSI to work properly?
Also, in my experience, I had to remove the network start script to prevent the iSCSI connection from disconnecting.

- Roy

11.10.2010

Thanks!  No, I didn't need to compile anything extra into the kernel.  All the iSCSI stuff is included in the ramfs as modules.
The network start script is a valid concern, but I didn't have to touch it.  I guess the interface is down for such a short time before being brought back up that no file system calls end up blocking. I'm using a static IP for my desktop, so it doesn't have to invoke a DHCP client or anything while bringing the interface up again. (That might make a difference.)

- wili

wili
Ville Timonen

hack blog

Table of
contents

9.1.2010

Linux initramfs with iSCSI and bonding support for PXE booting

PXE booting

Initramfs

Comments

3.10.2010

11.10.2010
