Diskless Linux with PXE HOWTO
You'll find all downloads and the current version of this document at http://www.intra2net.com/de/produkte/opensource/diskless-howto/
If you have comments or additions to this HOWTO, don't hesitate to write to the author.
Revision: 1.1 (2004-02-05):
- Fix trim function in bootimage
- Add fsck workaround
- Add retries to dhcp client (thanks to Joe Robertson)
- Add SuSE comment
Revision: 1.0 (2003-12-21): Original release
1. Legal stuff
Copyright (C)2003-2007 by Intra2net AG
This material may be distributed only subject to the terms and
conditions set forth in the Open Publication License, v1.0 or later
(the latest version is presently available at http://www.opencontent.org/openpub/).
The source code found on this site is free software; you can
redistribute it and/or modify it under the terms of the GNU General
Public License version 2 as published by the Free Software Foundation.
This document is provided WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2. Pre-Requisites
- a diskless client with PXE remote boot capabilities. Most
on-board network cards have a PXE client in BIOS. The more expensive
NICs often come with their own small PXE BIOS. Sometimes you have to
activate it first - I had to use IBAUtil
to activate PXE on my Intel EEPro/100+.
- a Linux distribution CD you want to run on the diskless system
- a DHCP server (e.g. dhcpd that comes with most distributions)
- a TFTP server (e.g. tftp-server that comes with most
distributions)
- an NFS server
- DHCP, TFTP and NFS can live happily together on one server
3. Installing the distribution
A ready-to-run installed distribution usually resides on the hard disk
of the machine. Now we don't want to use a hard disk on our client, but
load the distribution installed from a server via NFS. So we need to
have it installed into a directory tree onto the server. There are two
ways to accomplish this:
- Install the distribution on a disk and then copy all files into a
directory onto your server
- Install the distribution directly into a directory on the server. I
have a script to install a minimal Fedora Core 1. Mount the first
Fedora CD, create an empty directory and run it like this:
./mkfedora.sh [TARGETDIR] [CDROMPATH]
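For the first option, the copy has to preserve ownership, permissions, symlinks and device nodes. A minimal sketch, assuming the installed system is mounted somewhere on the server; the function name copy_tree and the example paths are made up for illustration:

```shell
# copy_tree SRC DST: copy an installed system into the NFS export directory.
# Hypothetical helper; run it as root so ownership and device nodes survive.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -a keeps permissions, ownership, timestamps and symlinks;
    # "$src"/. copies the contents of SRC, including dot files
    cp -a "$src"/. "$dst"/
}

# Example (paths are assumptions):
# copy_tree /mnt/installed-disk /netboot/pxeclient
```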
In either case you need to prepare your distribution to be booted
remotely. You can use this script: run
./mkbootready [SYSROOT], where SYSROOT is the root of your installed
distribution. If you prefer to fix it by hand, do it like this (watch
out for the tabs!):
# we'll have a read-only /etc
ln -sf /proc/mounts ${SYSROOT}/etc/mtab
ln -sf /var/resolv.conf ${SYSROOT}/etc/resolv.conf
# replace the entry for / in fstab: rc.sysinit otherwise tries to do a fsck.ext2...
cp ${SYSROOT}/etc/fstab ${SYSROOT}/etc/fstab.orig
# the mount point "/" may be surrounded by spaces or tabs
grep -v -e "[[:space:]]/[[:space:]]" ${SYSROOT}/etc/fstab >${SYSROOT}/etc/fstab.new
echo "none / tmpfs defaults 0 0" >>${SYSROOT}/etc/fstab.new
mv -f ${SYSROOT}/etc/fstab.new ${SYSROOT}/etc/fstab
# disable networking (we have it already set up if the rc's are running)
/usr/sbin/chroot ${SYSROOT} /sbin/chkconfig --del network
# disable kudzu (won't do anything since we are on a read-only filesystem)
/usr/sbin/chroot ${SYSROOT} /sbin/chkconfig --del kudzu
# move RPM database to /usr (saves ram since /var will live in tmpfs)
mkdir -p ${SYSROOT}/usr/var/lib/
mv ${SYSROOT}/var/lib/rpm/ ${SYSROOT}/usr/var/lib/
ln -s /usr/var/lib/rpm ${SYSROOT}/var/lib/rpm
# remove temp files from RPM database
rm -f ${SYSROOT}/usr/var/lib/rpm/__db.*
# compress /dev to speed up booting
tar czf ${SYSROOT}/dev.tar.gz -C ${SYSROOT} dev
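To see what the fstab step does before touching a real system, the same filter can be tried on a scratch copy. This is a sketch only: the function name replace_root_entry is hypothetical, and it uses [[:space:]] so that it does not matter whether the fields are separated by spaces or tabs.

```shell
# replace_root_entry FSTAB: drop the line that mounts "/" and
# append a tmpfs root entry instead (hypothetical helper).
replace_root_entry() {
    fstab=$1
    cp "$fstab" "$fstab.orig" || return 1
    # the mount point "/" is a field of its own, surrounded by whitespace
    grep -v -e "[[:space:]]/[[:space:]]" "$fstab.orig" > "$fstab.new"
    echo "none / tmpfs defaults 0 0" >> "$fstab.new"
    mv -f "$fstab.new" "$fstab"
}

# Example (work on a copy first):
# cp /netboot/pxeclient/etc/fstab /tmp/fstab.test
# replace_root_entry /tmp/fstab.test
```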
4. Configuring NFS
The next step is to allow mounting the distribution directory via NFS.
Add a line like this to your /etc/exports on the NFS server:
/netboot/pxeclient 192.168.1.0/24(ro,no_root_squash)
/netboot/pxeclient is the distribution root directory in this example.
DO NOT allow mounting this directory in read-write mode since you need
to set no_root_squash. Allowing writes on a no_root_squash export is a
major security risk.
Make sure your NFS server is started and don't forget to reload its
configuration after changing the exports. Take a look at the NFS HOWTO
if you run into problems.
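The read-only rule above is easy to break by accident when an exports file grows. The following is a rough line-based check, not a full parser of the exports syntax; the function name risky_exports is made up for this example.

```shell
# risky_exports FILE: print export lines that combine no_root_squash
# with an explicit rw option. Hypothetical helper; it inspects whole
# lines, so a line with several clients may be flagged too eagerly.
risky_exports() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep 'no_root_squash' \
        | grep -E '[(,]rw[,)]'
}

# Example: risky_exports /etc/exports && echo "check your exports!"
```

On most Linux NFS servers, running exportfs -ra as root re-reads /etc/exports after a change.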
5. Preparing the initrd image
Download my base image. If some of the options you need are built as
modules rather than directly into your kernel, you need to add those
modules to the initrd image. You can use this
script to handle this. Copy it into the same directory as the base
image and start it like this:
./mkbootimage.sh [SYSROOT] [SYSTEMMAP] [KERNELVER]
SYSROOT is the root directory of the system you want to take the kernel
modules from. Usually it is something like /netboot/pxeclient/. There
must be a valid modules.dep in [SYSROOT]/lib/modules/[KERNELVER].
SYSTEMMAP is the System.map file that came with your kernel. Usually
something like /netboot/pxeclient/boot/System.map-[KERNELVER]
KERNELVER is the full version of your kernel. For Fedora Core 1 this
would be 2.4.22-1.2115.nptl.
If you want to do it by hand, you've got to do it like this:
# unzip and mount base image
gunzip ${IMAGE}.gz
mkdir __pxeboot-tmp__
mount -o loop $IMAGE __pxeboot-tmp__
[ $? -ne 0 ] && echo "error mounting image" && exit 1
# create module directories
mkdir -p __pxeboot-tmp__/lib/modules/${KERNELVER}/net
mkdir -p __pxeboot-tmp__/lib/modules/${KERNELVER}/nfs
Now copy nfs.o and its dependencies (usually sunrpc.o and lockd.o) into
.../nfs.
Then grab your network driver modules and their dependencies (often
mii.o) and put them into .../net. It is possible to copy different
modules for multiple clients with different hardware in there. The
loader will pick the right one.
# recalculate dependencies
/sbin/depmod -a -F $SYSTEMMAP -b __pxeboot-tmp__ -C /dev/null $KERNELVER
# unmount and zip again
umount __pxeboot-tmp__
gzip -9 $IMAGE
rmdir __pxeboot-tmp__
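For the manual steps above, modules.dep tells you which files a module needs. A sketch, assuming the 2.4-style format in which each entry reads "module.o: dependency.o ..." and long entries are continued with a trailing backslash; the function name module_deps is hypothetical.

```shell
# module_deps MODULES_DEP MODULE: list the dependencies of MODULE
# (e.g. nfs.o) as recorded in a modules.dep file. Hypothetical helper.
module_deps() {
    # "read" without -r joins lines that end in a backslash
    while read target deps; do
        case "$target" in
            */"$2":)
                for dep in $deps; do
                    echo "$dep"
                done
                ;;
        esac
    done < "$1"
}

# Example (path is an assumption):
# module_deps /netboot/pxeclient/lib/modules/$KERNELVER/modules.dep nfs.o
```

The listed files, plus the module itself, are what has to end up in .../nfs or .../net inside the image.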
6. Configuring DHCP
Install and configure your DHCP server. There are other good HOWTOs
covering this so I'm not going to explain it here.
Add a section like this for your client to /etc/dhcpd.conf:
host pxeclient {
hardware ethernet AA:BB:CC:DD:EE:FF;
fixed-address 192.168.1.55;
option host-name "pxeclient";
filename "/pxelinux.0";
next-server tftpserver;
option root-path "nfsserver:/netboot/pxeclient";
}
Important for a diskless client are:
- filename: the name of the PXE boot image on the TFTP server (always
"/pxelinux.0" here)
- next-server: name or IP of the TFTP server
- root-path: NFS path to the root directory of the client. You can omit
the "server:" if it is on the same machine as the TFTP server. You can
add comma-separated NFS options (see the mount(8) manpage) at the end
(like "nfsserver:/netboot/pxeclient,retry=1,rsize=8192,wsize=8192")
Don't forget to restart your dhcpd after adding these options.
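How a root-path value with options is taken apart on the client can be sketched with plain shell parameter expansion. This is an illustration only, not the code from the initrd; the function name split_root_path and the variable names are hypothetical, and 192.168.1.1 stands in for whatever next-server resolves to.

```shell
# split_root_path ROOTPATH NEXTSERVER
# Sets NFS_PATH to "server:/dir" and NFS_OPTS to the options
# (with a leading comma) or to the empty string. Hypothetical helper.
split_root_path() {
    case "$1" in
        *,*) NFS_PATH=${1%%,*}      # everything before the first comma
             NFS_OPTS=,${1#*,} ;;   # everything after it
        *)   NFS_PATH=$1
             NFS_OPTS= ;;
    esac
    # no "server:" part given: fall back to the TFTP server
    case "$NFS_PATH" in
        *:/*) ;;
        *)    NFS_PATH="$2:$NFS_PATH" ;;
    esac
}

split_root_path "/netboot/pxeclient,retry=1,rsize=8192" 192.168.1.1
echo "$NFS_PATH"    # 192.168.1.1:/netboot/pxeclient
echo "$NFS_OPTS"    # ,retry=1,rsize=8192
```

Splitting at the first comma matters: everything after it belongs to the option list, however many options follow.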
7. Configuring TFTP
Install the tftp-server package. If your distribution uses xinetd you
usually just have to set "disable = no" in /etc/xinetd.d/tftp and
restart xinetd. Alternatively, start tftpd manually with the
"-s /tftpboot" option. The downloadable files usually reside in /tftpboot.
Now go to http://www.kernel.org/pub/linux/utils/boot/syslinux/
and download the newest version of syslinux. It should contain a file
called pxelinux.0. Copy this file into your /tftpboot directory.
Change to /tftpboot and create a subdirectory called "pxelinux.cfg". Put a
file called "default" in there:
LABEL linux
KERNEL vmlinuz-2.4.22-1.2115.nptl
APPEND initrd=pxeboot.img.gz ramdisk_size=8192
Of course you should use the name of the kernel image you are planning
to use. This example is for Fedora Core 1. Copy your kernel and initrd
image into /tftpboot too.
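The TFTP directory layout described above can be put together in one go. A sketch under assumptions: the function name write_pxe_default is made up, and the kernel and initrd file names are simply the ones from this example.

```shell
# write_pxe_default TFTPROOT KERNEL INITRD
# Creates TFTPROOT/pxelinux.cfg/default for one kernel/initrd pair.
# Hypothetical helper; KERNEL and INITRD are file names relative to TFTPROOT.
write_pxe_default() {
    mkdir -p "$1/pxelinux.cfg" || return 1
    cat > "$1/pxelinux.cfg/default" <<EOF
LABEL linux
KERNEL $2
APPEND initrd=$3 ramdisk_size=8192
EOF
}

# Example:
# write_pxe_default /tftpboot vmlinuz-2.4.22-1.2115.nptl pxeboot.img.gz
```

The kernel, the initrd image and pxelinux.0 still have to be copied into the same directory by hand.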
8. Booting
Now you can switch on your client and watch it booting.
Don't forget that you have a read-only image. That means your client
can write to /var and /tmp on the root ramdisk but nearly nowhere
else. The easiest way to change something is to do it in a chroot on
the NFS server, e.g. run chroot /netboot/pxeclient to start a chrooted
shell.
9. Reducing boot times
There are some ways to speed up the boot process. The most valuable
ones are:
- Use a custom kernel. Throw out everything you don't need. In my
experience the IDE driver in particular takes a long time to find out
that no hard disk is connected. You'll need support
for your network card, initrd images, NFS, ext2fs and tmpfs.
- Tweak your /etc/rc.d/rc.sysinit. Usually it does things like
fsck and depmod that take time but achieve nothing since you are on
a read-only image. Remove everything you don't need.
10. Known problems
- Shutdown doesn't work: this is due to the client trying to
unmount its NFS root. I haven't found a distribution-independent way to
fix this. If you want to try it: you've got to make sure that the NFS
root isn't unmounted and the portmapper (if running) isn't killed.
Please send me your solutions and I'll add them.
- No DHCP client daemon running: The system isn't running a DHCP
client daemon to renew its lease. This isn't a problem if you use fixed
addresses with DHCP like in the example above. But if you are using a
DHCP pool, your client's lease might expire and its IP might be
reassigned. I didn't want to keep the daemon from the initrd
alive since that would mean that the initrd can't be unmounted and its
8 MB of RAM would be wasted. Starting a new dhclient from the real
distribution is possible, but you usually have to tweak dhclient-script
to make sure it doesn't switch off the network device or change the IP
during discovery.
11. Differences for SuSE Linux
- SuSE kernels usually have NFS compiled in, so you don't need to
add it as a module. But the af_packet module is needed for DHCP. You'll
need to change mkbootimage (or do it by hand) and linuxrc (in
pxeboot.img.gz).
Appendix A. The initrd image
The initrd image contains a busybox
multicall binary. Busybox is built against a regular glibc 2.3.2
(taken from FC1); the important parts of glibc are in the image too. A
separate find is included because the busybox find does not support
-exec. Network card autodetection is done by lspci with the "-k"
option, which prints the corresponding module name. The "-k" option
comes from a patch by Diego Torres Milano.
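The module name is picked out of each lspci line by taking the last whitespace-separated field and stripping its quotes. The sketch below isolates that trick; the function name last_field is hypothetical, and the sample line is invented, since the exact output format of the patched lspci is an assumption here.

```shell
# last_field LINE: print the last field of LINE with surrounding
# quotes removed, the way autodetect_pci extracts the module name.
# Only for trusted input: eval executes whatever it is given.
# The $# trick only works for lines with up to nine fields.
last_field() {
    set -- $1            # split the line into positional parameters
    eval mod=\$$#        # mod = last parameter, still wrapped in quotes
    eval mod=$mod        # a second eval strips the quotes
    echo "$mod"
}

# invented sample line, format assumed
last_field '00:0b.0 "Class 0200" "8086" "1229" "e100"'    # prints e100
```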
This is the linuxrc contained in the initrd image:
#!/bin/sh
# (C) Copyright 2003,2004 by Gerd v. Egidy
#
# PCI autodetect Copyright 2001-2003 by Diego Torres Milano
# PCI autodetect adapted from PXES by Gerd v. Egidy
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
autodetect_pci() # PCICLASS MODCLASS
{
local PCICLASS NOTFOUND line
PCICLASS=$1
NOTFOUND=1
if [ -z "$PCICLASS" ]; then
echo "$PROGNAME: autodetect_pci missing parameter" >&2
exit 1
fi
if [ ! -r /proc/bus/pci ]; then
return $NOTFOUND
fi
# I can't make a pipe of this because ash treats it as a subshell,
# that's why the temp file
/sbin/lspci -i /dev/null -knm | /bin/grep "$PCICLASS" > /tmp/pci.$$
while read line
do
set -- $line
eval mod=\$$#
eval mod=$mod
if [ "$mod" != "UNKNOWN" ]; then
echo "loading $mod..."
/sbin/modprobe $mod
NOTFOUND=0
fi
done < /tmp/pci.$$
/bin/rm -f /tmp/pci.$$
return $NOTFOUND
}
echo "Remount / read-write"
mount -n -o remount,rw /
echo "Mounting /proc filesystem"
mount -n -t proc /proc /proc
echo "Autodetecting network devices..."
autodetect_pci 'Class 02..'
if [ $? -ne 0 ]; then
echo "WARNING: no network interface found"
fi
echo "Loading nfs modules"
modprobe nfs
echo "Initializing network loopback device"
ifconfig lo 127.0.0.1 up
echo "Configuring eth0"
udhcpc --now --quit --interface=eth0 --script=/bin/udhcpc.script
if [ $? -ne 0 ]; then
echo "ERROR: couldn't get DHCP lease, trying again"
udhcpc --now --quit --interface=eth0 --script=/bin/udhcpc.script
if [ $? -ne 0 ]; then
echo "ERROR: couldn't get DHCP lease, trying again"
udhcpc --now --quit --interface=eth0 --script=/bin/udhcpc.script
if [ $? -ne 0 ]; then
echo "ERROR: can't get DHCP lease"
exit 0
fi
fi
fi
# load DHCP parameter
. /etc/udhcpc-eth0.info
echo "Mounting nfs root filesystem"
if ! echo $ROOTPATH | grep -q ":/" ; then
# we haven't got a full path, use next-server
ROOTPATH="${NEXTSERVER}:${ROOTPATH}"
fi
if echo $ROOTPATH | grep -q "," ; then
# we have options
# split at the first comma so that all options are kept
NFSOPTIONS=`echo $ROOTPATH | sed -e "s/\([^,]*\)\(,.*\)/\2/"`
ROOTPATH=`echo $ROOTPATH | sed -e "s/\([^,]*\)\(,.*\)/\1/"`
fi
echo "Mounting root filesystem"
mount -rw -t tmpfs none /sysroot/
echo "Mounting NFS root-base"
mkdir /sysroot/nfsroot
mount -n -o "ro,nolock${NFSOPTIONS}" -t nfs "$ROOTPATH" /sysroot/nfsroot
if [ $? -ne 0 ]; then
echo "ERROR: can't mount root filesystem via NFS"
exit 0
fi
echo "Setting root symlinks"
cd /sysroot
find ./nfsroot -maxdepth 1 -mindepth 1 -exec ln -s \{\} \;
cd /
echo "Handling special directories"
rm -f /sysroot/initrd
mkdir /sysroot/initrd
rm -f /sysroot/tmp
mkdir /sysroot/tmp
rm -f /sysroot/proc
mkdir /sysroot/proc
echo "Copying /var"
rm -f /sysroot/var
mkdir /sysroot/var
cp -a /sysroot/nfsroot/var /sysroot
cp /etc/resolv.conf /sysroot/var/resolv.conf
# /dev handling
rm -f /sysroot/dev
mkdir /sysroot/dev
if [ -f /sysroot/nfsroot/dev.tar.gz ]; then
echo "Unpacking /dev"
tar xzf /sysroot/nfsroot/dev.tar.gz -C /sysroot
else
echo "Copying /dev"
cp -a /sysroot/nfsroot/dev /sysroot
fi
echo 0x0100 > /proc/sys/kernel/real-root-dev
echo "Unmounting temporary mounts"
umount /proc
echo "Changing to new NFS root"
pivot_root /sysroot /sysroot/initrd