Remote Conversion to Linux Software RAID-1 for Crazy Sysadmins HOWTO
DRAFT
Version 0.2
Copyright 2004 Warren Togami <warren@togami.com>
Table of Contents
- Introduction and Disclaimer
- Revision History
- Acknowledgements
- Procedure
- Appendix
1. Introduction
XXX: Write me
This is a guide for converting Linux servers installed on a single disk
to software RAID-1 mirror with a second identical hard disk entirely
remotely with only ssh access. This procedure has been tested
successfully on Red Hat Linux 9, Fedora Core 1, and RHEL4. RHEL3 was also
tested, but as noted within the procedure one key method of starting
RAID arrays failed and I could never figure out why.
Anaconda is unlikely ever to be modified to install onto a degraded RAID-1
array, so this is an admittedly horrible solution for those of us who want exactly that.
This procedure was figured out because I wanted to convert my Serverbeach dual hard drive server into RAID-1, but they don't offer that configuration. They offer only a plain install on /dev/hda with an identical blank /dev/hdc.
DISCLAIMER:
This is normally a very DANGEROUS procedure, and you can very easily
leave your system in an unbootable dead state. The below is only
an example of converting one system with a certain partition
scheme. Your situation may be different. Be sure that you
fully understand all of the tools involved and practice on a local
machine before even attempting conversion of a remote machine. By
reading this document, you accept that there is NO WARRANTY OR
LIABILITY.
Do not e-mail Warren asking for help, especially if you screwed up your
server. Do e-mail Warren if you have suggestions for improving
the procedure, or if you figured out why mdadm without mdadm.conf
failed on RHEL3.
2. Revision History
Version 0.2 - May 27, 2005
* Add some RHEL4 notes
Version 0.1 - February 21, 2004
* Initial Draft
3. Acknowledgements
- Seth Vidal, Sys Admin at Duke University Physics department
- Vince Hoang, Chairman of the Hawaii Open Source Education
Foundation
- http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/014331.html
- mdadm man page
4. Procedure
Step 1) Edit /etc/fstab and grub.conf: remove labels and use partition devices instead.
[root@master root]# mount
/dev/hda3 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
[root@master root]# cat /etc/fstab
LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/hda2               swap                    swap    defaults        0 0
[root@master root]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=LABEL=/
        initrd /initrd-2.4.22-1.2166.nptl.img
### After editing...
[root@master root]# cat /etc/fstab
/dev/hda3               /                       ext3    defaults        1 1
/dev/hda1               /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/hda2               swap                    swap    defaults        0 0
[root@master root]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/hda3
        initrd /initrd-2.4.22-1.2166.nptl.img
Step 2) Remove all labels, because they will cause mount to fail
[root@master root]# e2label /dev/hda1
/boot
[root@master root]# e2label /dev/hda2
e2label: Bad magic number in super-block while trying to open /dev/hda2
Couldn't find valid filesystem superblock.
[root@master root]# e2label /dev/hda3
/
[root@master root]# e2label /dev/hda1 ""
[root@master root]# e2label /dev/hda3 ""
[root@master root]# mount
/dev/hda3 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
Step 3) Confirm that both disks are the same capacity.
[root@master root]# cat /proc/ide/ide0/hda/capacity
160836480
[root@master root]# cat /proc/ide/ide1/hdc/capacity
160836480
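These /proc/ide capacity files exist only for IDE disks.  As a rough alternative
(just a sketch; substitute your own device names), the same comparison can be
made from /proc/partitions, where sizes are reported in 1K blocks:

# Both whole-disk lines should report the same number of blocks.
grep -w -e hda -e hdc /proc/partitions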
Step 4) Change all partitions to fd "Linux raid auto"
[root@master root]# fdisk /dev/hda
The number of cylinders for this disk is set to 10011.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/hda: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14       144   1052257+  82  Linux swap
/dev/hda3           145     10011  79256677+  83  Linux
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): l
 0  Empty           1c  Hidden Win95 FA 70  DiskSecure Mult bb  Boot Wizard hid
 1  FAT12           1e  Hidden Win95 FA 75  PC/IX           be  Solaris boot
 2  XENIX root      24  NEC DOS         80  Old Minix       c1  DRDOS/sec (FAT-
 3  XENIX usr       39  Plan 9          81  Minix / old Lin c4  DRDOS/sec (FAT-
 4  FAT16 <32M      3c  PartitionMagic  82  Linux swap      c6  DRDOS/sec (FAT-
 5  Extended        40  Venix 80286     83  Linux           c7  Syrinx
 6  FAT16           41  PPC PReP Boot   84  OS/2 hidden C:  da  Non-FS data
 7  HPFS/NTFS       42  SFS             85  Linux extended  db  CP/M / CTOS / .
 8  AIX             4d  QNX4.x          86  NTFS volume set de  Dell Utility
 9  AIX bootable    4e  QNX4.x 2nd part 87  NTFS volume set df  BootIt
 a  OS/2 Boot Manag 4f  QNX4.x 3rd part 8e  Linux LVM       e1  DOS access
 b  Win95 FAT32     50  OnTrack DM      93  Amoeba          e3  DOS R/O
 c  Win95 FAT32 (LB 51  OnTrack DM6 Aux 94  Amoeba BBT      e4  SpeedStor
 e  Win95 FAT16 (LB 52  CP/M            9f  BSD/OS          eb  BeOS fs
 f  Win95 Ext'd (LB 53  OnTrack DM6 Aux a0  IBM Thinkpad hi ee  EFI GPT
10  OPUS            54  OnTrackDM6      a5  FreeBSD         ef  EFI (FAT-12/16/
11  Hidden FAT12    55  EZ-Drive        a6  OpenBSD         f0  Linux/PA-RISC b
12  Compaq diagnost 56  Golden Bow      a7  NeXTSTEP        f1  SpeedStor
14  Hidden FAT16 <3 5c  Priam Edisk     a8  Darwin UFS      f4  SpeedStor
16  Hidden FAT16    61  SpeedStor       a9  NetBSD          f2  DOS secondary
17  Hidden HPFS/NTF 63  GNU HURD or Sys ab  Darwin boot     fd  Linux raid auto
18  AST SmartSleep  64  Novell Netware  b7  BSDI fs         fe  LANstep
1b  Hidden Win95 FA 65  Novell Netware  b8  BSDI swap       ff  BBT
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): fd
Changed system type of partition 2 to fd (Linux raid autodetect)
Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or
resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
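If you want a sanity check before moving on, fdisk can list the table it just
wrote.  (The running kernel keeps using the old table until the reboot later in
this procedure, but the on-disk partition types are already updated.)

# The Id column for hda1, hda2 and hda3 should now read "fd" (Linux raid autodetect).
fdisk -l /dev/hda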
Step 5) Copy partition table to identical disk
[root@master root]# sfdisk -d /dev/hda > partitions.txt
[root@master root]# cat partitions.txt
# partition table of /dev/hda
unit: sectors

/dev/hda1 : start=       63, size=   208782, Id=fd, bootable
/dev/hda2 : start=   208845, size=  2104515, Id=fd
/dev/hda3 : start=  2313360, size=158513355, Id=fd
/dev/hda4 : start=        0, size=        0, Id= 0
[root@master root]# sfdisk /dev/hdc < partitions.txt
Checking that no-one is using this disk right now ...
OK
Disk /dev/hdc: 10011 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls   #blocks   Id  System
/dev/hdc1          0       -       0         0    0  Empty
/dev/hdc2          0       -       0         0    0  Empty
/dev/hdc3          0       -       0         0    0  Empty
/dev/hdc4          0       -       0         0    0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0
   Device Boot    Start       End   #sectors  Id  System
/dev/hdc1   *        63    208844     208782  fd  Linux raid autodetect
/dev/hdc2        208845   2313359    2104515  fd  Linux raid autodetect
/dev/hdc3       2313360 160826714  158513355  fd  Linux raid autodetect
/dev/hdc4             0         -          0   0  Empty
Successfully wrote the new partition table
Re-reading the partition table ...
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
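As an optional double-check (only a sketch), dump the new table on /dev/hdc and
compare it against the saved dump from /dev/hda; after rewriting the device
names, the two dumps should be identical:

sfdisk -d /dev/hdc | sed 's|/dev/hdc|/dev/hda|g' | diff partitions.txt - \
  && echo "partition tables match"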
Apply this patch to /etc/rc.d/rc.sysinit so that your system will use mdadm
rather than raidtools to start the RAID arrays during bootup.
UPDATE! This is no longer needed on RHEL4 because it uses mdadm by default.
--- rc.sysinit.orig     2004-02-04 01:42:10.000000000 -0600
+++ rc.sysinit  2004-02-04 02:26:45.000000000 -0600
@@ -435,6 +435,10 @@
        /etc/rc.modules
 fi
 
+if [ -f /etc/mdadm.conf ]; then
+       /sbin/mdadm -A -s
+fi
+
 update_boot_stage RCraid
 if [ -f /etc/raidtab ]; then
        # Add raid devices
@@ -467,6 +471,10 @@
                        RESULT=0
                        RAIDDEV="$RAIDDEV(skipped)"
                fi
+               if [ $RESULT -gt 0 -a -x /sbin/mdadm ]; then
+                       /sbin/mdadm -Ac partitions $i -m dev
+                       RESULT=$?
+               fi
                if [ $RESULT -gt 0 -a -x /sbin/raidstart ]; then
                        /sbin/raidstart $i
                        RESULT=$?
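If you save that diff to a file, applying it is an ordinary patch invocation.
The file name below is only an example, and on releases other than the ones
tested here the hunks may apply with fuzz or need hand editing:

cd /etc/rc.d
cp -a rc.sysinit rc.sysinit.orig           # keep a pristine copy
patch rc.sysinit < /root/rc.sysinit-mdadm.patch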
Step 6) Reboot!
Step 7) Initialize RAID devices on unused disk
[root@master root]# mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/hdc3
mdadm: array /dev/md0 started.
[root@master root]# mdadm --create /dev/md2 --level=1 --raid-devices=2 missing /dev/hdc2
mdadm: array /dev/md2 started.
[root@master root]# mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdc1
mdadm: array /dev/md1 started.
Within dmesg kernel messages you will notice messages like this:
md: bind<hdc3,1>
md: hdc3's event counter: 00000000
md: md0: raid array is not clean -- starting background reconstruction
md: raid1 personality registered as nr 3
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdc3 operational as mirror 0
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdc3 [events: 00000001]<6>(write) hdc3's sb offset: 79256576
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: bind<hdc2,1>
md: hdc2's event counter: 00000000
md: md2: raid array is not clean -- starting background reconstruction
md2: max total readahead window set to 124k
md2: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdc2 operational as mirror 0
raid1: md2, not all disks are operational -- trying to recover array
raid1: raid set md2 active with 1 out of 2 mirrors
md: updating md2 RAID superblock on device
md: hdc2 [events: 00000001]<6>(write) hdc2's sb offset: 1052160
md: recovery thread got woken up ...
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: bind<hdc1,1>
md: hdc1's event counter: 00000000
md: md1: raid array is not clean -- starting background reconstruction
md1: max total readahead window set to 124k
md1: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdc1 operational as mirror 0
raid1: md1, not all disks are operational -- trying to recover array
raid1: raid set md1 active with 1 out of 2 mirrors
md: updating md1 RAID superblock on device
md: hdc1 [events: 00000001]<6>(write) hdc1's sb offset: 104320
md: recovery thread got woken up ...
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md2: no spare disk to reconstruct array! -- continuing in degraded mode
md0: no spare disk to reconstruct array! -- continuing in degraded mode
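You can also confirm the state of each new degraded array directly with mdadm,
for example:

# Only /dev/hdc3 should be listed as an active member for now; the second slot
# stays empty until the first disk is added later in this procedure.
mdadm --detail /dev/md0
cat /proc/mdstat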
Step 8) Initialize swap
[root@master root]# mkswap /dev/md2
Setting up swapspace version 1, size = 1077407 kB
Step 9) Format filesystems
[root@master root]# mke2fs -j /dev/md0
mke2fs 1.34 (25-Jul-2003)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
9912320 inodes, 19814144 blocks
990707 blocks (5.00%) reserved for the super user
First data block=0
605 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424

Writing inode tables: 605/605 done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@master root]# mke2fs -j /dev/md1
mke2fs 1.34 (25-Jul-2003)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
26104 inodes, 104320 blocks
5216 blocks (5.00%) reserved for the super user
First data block=1
13 block groups
8192 blocks per group, 8192 fragments per group
2008 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 34 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
Step 10) Test your RAID devices, stop and start
[root@master root]# mdadm --stop --scan
md: marking sb clean...
md: updating md1 RAID superblock on device
md: hdc1 [events: 00000002]<6>(write) hdc1's sb offset: 104320
md: md1 stopped.
md: unbind<hdc1,0>
md: export_rdev(hdc1)
md: marking sb clean...
md: updating md2 RAID superblock on device
md: hdc2 [events: 00000002]<6>(write) hdc2's sb offset: 1052160
md: md2 stopped.
md: unbind<hdc2,0>
md: export_rdev(hdc2)
md: marking sb clean...
md: updating md0 RAID superblock on device
md: hdc3 [events: 00000002]<6>(write) hdc3's sb offset: 79256576
md: md0 stopped.
md: unbind<hdc3,0>
md: export_rdev(hdc3)
Manual Start without auto-scan
[root@master root]# mdadm -Ac partitions /dev/md0 -m dev
mdadm: /dev/md0 has been started with 1 drive (out of 2).
[root@master root]# mdadm -Ac partitions /dev/md1 -m dev
mdadm: /dev/md1 has been started with 1 drive (out of 2).
[root@master root]# mdadm -Ac partitions /dev/md2 -m dev
mdadm: /dev/md2 has been started with 1 drive (out of 2).
(NOTE: For some reason I do not understand, this fails on RHEL3.
It worked on RH9 and FC1 for me. Due to this failure I recommend
writing the optional /etc/mdadm.conf in the next step and relying
on the config file for booting, unless someone can suggest a
fool-proof way of doing this.)
Step 11) Write the /etc/mdadm.conf
For now include only the partitions that are already RAID members (the ones on /dev/hdc).
UPDATE! This is no longer needed on RHEL4 because it uses mdadm by default.
DEVICE /dev/hdc1 /dev/hdc2 /dev/hdc3
ARRAY /dev/md0 devices=missing,/dev/hdc3
ARRAY /dev/md1 devices=missing,/dev/hdc1
ARRAY /dev/md2 devices=missing,/dev/hdc2
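Before trusting this file at boot time, it is worth a quick test of
assembly-by-config (a sketch; nothing is mounted on the arrays yet at this
point, so all three can be stopped safely).  If -A -s does not bring them back
on your mdadm version, fall back to the -Ac partitions method from Step 10
before rebooting:

mdadm --stop --scan        # stop all three degraded arrays
mdadm -A -s                # reassemble them from /etc/mdadm.conf
cat /proc/mdstat           # all three should be back, each with 1 of 2 drives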
Step 12) Rebuild initrd image to include raid1
[root@master root]# pwd
/root
[root@master root]# rpm -q kernel
kernel-2.4.22-1.2166.nptl
[root@master root]# mkinitrd -v --preload=raid1 /root/initrd-2.4.22-1.2166.nptl.img 2.4.22-1.2166.nptl
Looking for deps of module raid1
Looking for deps of module ide-disk
Looking for deps of module raid1
Looking for deps of module ext3    jbd
Looking for deps of module jbd
Using modules:  ./kernel/drivers/md/raid1.o ./kernel/fs/jbd/jbd.o ./kernel/fs/ext3/ext3.o
Using loopback device /dev/loop0
/sbin/nash -> /tmp/initrd.Tf1416/bin/nash
/sbin/insmod.static -> /tmp/initrd.Tf1416/bin/insmod
`/lib/modules/2.4.22-1.2166.nptl/./kernel/drivers/md/raid1.o' -> `/tmp/initrd.Tf1416/lib/raid1.o'
`/lib/modules/2.4.22-1.2166.nptl/./kernel/fs/jbd/jbd.o' -> `/tmp/initrd.Tf1416/lib/jbd.o'
`/lib/modules/2.4.22-1.2166.nptl/./kernel/fs/ext3/ext3.o' -> `/tmp/initrd.Tf1416/lib/ext3.o'
Loading module raid1
Loading module jbd
Loading module ext3
[root@master boot]# ls -l
total 2182
-rw-r--r--    1 root     root         5824 Aug 20 17:59 boot.b
-rw-r--r--    1 root     root          612 Aug 20 17:59 chain.b
-rw-r--r--    1 root     root        49556 Jan 30 12:59 config-2.4.22-1.2166.nptl
drwxr-xr-x    2 root     root         1024 Feb  4 02:45 grub
-rw-r--r--    1 root     root       172748 Feb  4 02:34 initrd-2.4.22-1.2166.nptl.img
-rw-r--r--    1 root     root       162632 Feb  4 02:34 initrd-2.4.22-1.2166.nptl.img.orig
-rw-r--r--    1 root     root          549 Jan 15 08:48 kernel.h
drwx------    2 root     root        12288 Jan 15 02:25 lost+found
-rw-r--r--    1 root     root          640 Aug 20 17:59 os2_d.b
lrwxrwxrwx    1 root     root           29 Feb  4 00:20 System.map -> System.map-2.4.22-1.2166.nptl
-rw-r--r--    1 root     root       571730 Jan 30 12:59 System.map-2.4.22-1.2166.nptl
lrwxrwxrwx    1 root     root           41 Feb  4 00:20 vmlinux-2.4.22-1.2166.nptl -> ../lib/modules/2.4.22-1.2166.nptl/vmlinux
lrwxrwxrwx    1 root     root           26 Feb  4 00:20 vmlinuz -> vmlinuz-2.4.22-1.2166.nptl
-rw-r--r--    1 root     root      1240029 Jan 30 12:59 vmlinuz-2.4.22-1.2166.nptl
See how the new initial ram disk image (initrd) is slightly bigger than the
original.  This is confirmation that it contains MORE modules than the original.
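For a closer check, you can peek inside the new image.  This assumes the image
is a gzip-compressed ext2 filesystem, which is what mkinitrd produces for these
2.4 kernels (the mount point and temporary file below are arbitrary names):

mkdir -p /mnt/initrd-check
zcat /root/initrd-2.4.22-1.2166.nptl.img > /tmp/initrd-check.ext2
mount -o loop /tmp/initrd-check.ext2 /mnt/initrd-check
ls /mnt/initrd-check/lib        # raid1.o should be there next to ext3.o and jbd.o
umount /mnt/initrd-check
rm /tmp/initrd-check.ext2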
Step 13) Make a backup copy of the original image, then overwrite it with the new image
[root@master root]# ls /boot/initrd-2.4.22-1.2166.nptl.img
/boot/initrd-2.4.22-1.2166.nptl.img
[root@master root]#
[root@master root]# cd /boot
[root@master boot]# cp initrd-2.4.22-1.2166.nptl.img initrd-2.4.22-1.2166.nptl.img.orig
[root@master boot]# cp /root/initrd-2.4.22-1.2166.nptl.img .
cp: overwrite `./initrd-2.4.22-1.2166.nptl.img'? y
Step 14) Modify grub.conf to have two boot options, default=saved, and panic=10, like the example below.
The first option is the "known good" kernel with the original initrd; the second
is the same kernel with the new initrd.  The goal is to reboot using grub's
savedefault --default=X --once option, which allows you to safely test rebooting
into a new kernel or configuration.  panic=10 means the machine will reboot
automatically in the event of a kernel panic.  Otherwise you can use a remote
reboot in order to reboot.
After the first boot, grub will go back to choosing the first option, which
should be the 'safe' choice, allowing you to log in remotely and diagnose the
problem.
[root@master boot]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=saved
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/hda3
        initrd /initrd-2.4.22-1.2166.nptl.img.orig
title Software RAID Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/hda3 panic=10
        initrd /initrd-2.4.22-1.2166.nptl.img
Step 15) Launch the grub shell and select the new boot option with "--once"
[root@master boot]# grub
Probing devices to guess BIOS drives. This may take a long time.

    GRUB  version 0.93  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]

grub> help savedefault
savedefault: savedefault [--stage2=STAGE2_FILE] [--default=DEFAULT] [--once]
    Save DEFAULT as the default boot entry in STAGE2_FILE.  If '--once' is
    specified, the default is reset after the next reboot.

grub> savedefault --default=1 --once
grub> quit
Step 16) Now double-check your grub.conf and initrd carefully, then reboot!
Step 17) After the reboot, check your dmesg
The autoscan where it first attempts to start RAID always seems to fail for me.
I could never figure out how that is supposed to work.  The secondary scan from
the lines that we added to rc.sysinit does work though.
[root@master root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdc3[0]
79256576 blocks [2/1] [U_]
md2 : active raid1 hdc2[0]
1052160 blocks [2/1] [U_]
md1 : active raid1 hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
We have now verified that the RAID arrays start automatically at boot.
Step 18) Modify grub.conf, making the second option point to the second drive rather than the first.
For safety you may want to use the grub shell to manually confirm that your
second disk is recognized by grub using (hd1,0) enumeration.
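For example, output along these lines from the grub shell (the details will
differ on your machine) shows that grub can read the first partition of the
second disk:

grub> root (hd1,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> quit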
[root@master boot]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=saved
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
title Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/hda3
        initrd /initrd-2.4.22-1.2166.nptl.img.orig
title Software RAID Fedora Core (2.4.22-1.2166.nptl)
        root (hd1,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/md0 panic=10
        initrd /initrd-2.4.22-1.2166.nptl.img
Step 19) Copy filesystems into their respective RAID partition counterparts.
[root@master root]# cd /mnt
[root@master mnt]# ls
[root@master mnt]# mkdir newroot
[root@master mnt]# mkdir newboot
[root@master mnt]# mount -t ext3 /dev/md0 newroot
[root@master mnt]# mount -t ext3 /dev/md1 newboot
[root@master mnt]# cp -a /boot/* /mnt/newboot/
[root@master mnt]# cp /boot/.serverbeach /mnt/newboot/
[root@master mnt]# cd newroot/
[root@master newroot]# ls /
bin  boot  dev  etc  home  initrd  lib  lost+found  misc  mnt  opt  proc  root  sbin  tmp  usr  var
[root@master newroot]# mkdir boot initrd mnt proc
[root@master newroot]# cp -a /bin /dev /etc /home /lib /misc /opt /root /sbin /tmp /usr /var .
[root@master newroot]# ls
bin  boot  dev  etc  home  initrd  lib  lost+found  misc  mnt  opt  proc  root  sbin  tmp  usr  var
[root@master newroot]# ls /
bin  boot  dev  etc  home  initrd  lib  lost+found  misc  mnt  opt  proc  root  sbin  tmp  usr  var
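A rough sanity check that the copies landed where expected is to compare disk
usage; the numbers will not match exactly because of the new journals and
lost+found directories:

df /boot /mnt/newboot
df /     /mnt/newroot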
Step 20) Modify the newroot fstab to match the RAID device names
[root@master boot]# cat /mnt/newroot/etc/fstab
/dev/md0                /                       ext3    defaults        1 1
/dev/md1                /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/shm                tmpfs   defaults        0 0
/dev/md2                swap                    swap    defaults        0 0
Step 21) Do the grub savedefault --once trick again and reboot
If your system comes back, type "mount" to confirm that your system
booted from the RAID arrays.
[root@master root]# mount
/dev/md0 on / type ext3 (rw)
none on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md1 on /boot type ext3 (rw)
none on /dev/shm type tmpfs (rw)
Notice that any changes you make now will only be within the RAID array
on the second disk, and not on the first disk which is still intact and
non-RAID. And due to the grub --once trick, if you reboot now, it
will go back to your non-RAID disk.
Step 22) Prepare system for total RAID conversion
Modify your /etc/mdadm.conf to add the corresponding partitions from your first
drive that will be added to the RAID-1 array.
Update! RHEL4 no longer requires this mdadm.conf.
DEVICE /dev/hdc1 /dev/hdc2 /dev/hdc3 /dev/hda1 /dev/hda2 /dev/hda3
ARRAY /dev/md0 devices=/dev/hdc3,/dev/hda3
ARRAY /dev/md1 devices=/dev/hdc1,/dev/hda1
ARRAY /dev/md2 devices=/dev/hdc2,/dev/hda2
After your /etc/mdadm.conf has been modified in this way, the mdadm -A -s method
of autoscan will probably fail because it does not find the superblocks on the
/dev/hda partitions.  This is why the other mdadm -Ac method is included in the
rc.sysinit patch.  (It is a bit worrisome that the latter method totally fails
on RHEL3.)
Since you have booted using /dev/md0, you cannot stop and start it anymore, but
you can safely unmount /dev/md1 (/boot) and stop it, and swapoff -a and stop
/dev/md2.  Test the two methods of starting the RAID arrays for yourself, as
sketched below.
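A sketch of that test, assuming the same layout as above (md1 is /boot, md2 is swap):

# Stop what can safely be stopped; md0 is the mounted root, so leave it alone.
umount /boot
swapoff -a
mdadm --stop /dev/md1
mdadm --stop /dev/md2

# Method 1: assemble from /etc/mdadm.conf.
mdadm -A -s
cat /proc/mdstat

# Stop them again, then try Method 2: the /proc/partitions scan used in rc.sysinit.
mdadm --stop /dev/md1
mdadm --stop /dev/md2
mdadm -Ac partitions /dev/md1 -m dev
mdadm -Ac partitions /dev/md2 -m dev

# Put everything back.
mount /boot
swapon -a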
Step 23) POINT OF NO RETURN
The next step is to add the partitions of your first disk to the RAID array.
This will destroy all data on the first disk, as well as the boot sector where
grub is installed.  Your system will be unbootable if it reboots during this
procedure.
[root@master boot]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdc2[0]
1052160 blocks [2/1] [U_]
md0 : active raid1 hdc3[0]
79256576 blocks [2/1] [U_]
md1 : active raid1 hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
[root@master boot]# mdadm /dev/md0 -a /dev/hda3
mdadm: hot added /dev/hda3
[root@master boot]# mdadm /dev/md1 -a /dev/hda1
mdadm: hot added /dev/hda1
[root@master boot]# mdadm /dev/md2 -a /dev/hda2
mdadm: hot added /dev/hda2
[root@master boot]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda2[2] hdc2[0]
1052160 blocks [2/1] [U_]
md0 : active raid1 hda3[2] hdc3[0]
79256576 blocks [2/1] [U_]
[>....................]  recovery =  0.2% (180048/79256576) finish=124.4min speed=10591K/sec
md1 : active raid1 hda1[2] hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 10000 KB/sec) for reconstruction.
md: using 124k window, over a total of 79256576 blocks.
Step 24) Speed up the sync process
[root@master root]# echo -n 500000 > /proc/sys/dev/raid/speed_limit_max
[root@master root]# echo /proc/sys/dev/raid/speed_limit_max
/proc/sys/dev/raid/speed_limit_max
[root@master root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda2[2] hdc2[0]
1052160 blocks [2/1] [U_]
md0 : active raid1 hda3[2] hdc3[0]
79256576 blocks [2/1] [U_]
[=>...................]  recovery =  7.7% (6152576/79256576) finish=21.6min speed=56344K/sec
md1 : active raid1 hda1[2] hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
Then wait until it completes....
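The new limit is not persistent; it reverts to the kernel default at the next
reboot.  While you wait, something like watch saves retyping the status command:

# Refresh the RAID status every 30 seconds; interrupt with Ctrl-C when the resync is done.
watch -n 30 cat /proc/mdstat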
[root@master root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda2[2] hdc2[0]
1052160 blocks [2/1] [U_]
md0 : active raid1 hda3[2] hdc3[0]
79256576 blocks [2/1] [U_]
[=================>...]  recovery = 88.6% (70238784/79256576) finish=4.0min speed=36938K/sec
md1 : active raid1 hda1[2] hdc1[0]
104320 blocks [2/1] [U_]
unused devices: <none>
Just a random bit of trivia: notice how it slows down near the end of the
operation?  The beginning of your disk is nearly twice as fast as the end of
your disk.  Why?  Think about the circumference of cylinders: the outer
cylinders hold more sectors per revolution, so at a constant spindle speed the
drive transfers more data per second there.  (Somebody send me a link with the
actual math of this.)
[root@master root]# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda2[1] hdc2[0]
1052160 blocks [2/2] [UU]
md0 : active raid1 hda3[1] hdc3[0]
79256576 blocks [2/2] [UU]
md1 : active raid1 hda1[1] hdc1[0]
104320 blocks [2/2] [UU]
unused devices: <none>
Step 25) Reinstall GRUB onto BOTH disks
Now that you have allowed the second disc to replicate onto the first, the first
no longer has grub in the boot sector!  To make matters worse, you can't use the
regular grub-install script to automatically re-install grub, since it is a bit
naive and is confused by the /dev/mdX devices now listed in /etc/fstab.  Instead
you need to reinstall grub manually within the grub shell.  Luckily this is not
difficult...
grub> device (hd0) /dev/hda
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
Then just as the Dell guy recommends in his HOWTO, you can do the same for the
mirrored second disk in order to make it bootable too.  This gives you some
options in the event of a disk failure (read appendix B: Disk failure).
grub> device (hd0) /dev/hdc
grub> root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  15 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
Step 26) Cleanup config files
Remove the non-RAID option from your grub.conf now... because you cannot boot
non-RAID anymore anyway.  If a reboot fails now, then you did something wrong.
Change the default back to '0', remove the panic=10, and most importantly change
the root entry back to (hd0,0)!
[root@master boot]# cat /boot/grub/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hda3
#          initrd /initrd-version.img
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
title Software RAID Fedora Core (2.4.22-1.2166.nptl)
        root (hd0,0)
        kernel /vmlinuz-2.4.22-1.2166.nptl ro root=/dev/md0
        initrd /initrd-2.4.22-1.2166.nptl.img
You might as well erase the original initrd image now, since you are not booting
with that anymore either.
[root@master boot]# rm initrd-2.4.22-1.2166.nptl.img.orig
rm: remove regular file `initrd-2.4.22-1.2166.nptl.img.orig'? y
Do some stopping and starting tests now just to be sure...
[root@master etc]# swapoff -a
[root@master etc]# mdadm --stop --scan
mdadm: fail to stop array /dev/md0: Device or resource busy
mdadm: fail to stop array /dev/md1: Device or resource busy
[root@master etc]# mdadm -A -s
mdadm: /dev/md2 has been started with 2 drives.
[root@master etc]# swapon -a
[root@master /]# umount /boot
[root@master /]# mdadm --stop --scan
mdadm: fail to stop array /dev/md2: Device or resource busy
mdadm: fail to stop array /dev/md0: Device or resource busy
[root@master /]# mdadm -A -s
mdadm: /dev/md1 has been started with 2 drives.
[root@master /]# mount boot/
Unfortunately you cannot test /dev/md0 in the same way, since it is mounted as /.
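You can still inspect the running root array without stopping it, for example:

# Both member disks should be listed as working members, with no failed devices.
mdadm --detail /dev/md0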
After you have triple-checked all of your config files and tested
what you can, the only thing remaining to be done is reboot.
If you are religious, this is where you say your prayers.
XXX: Write stuff about post-conversion.
XXX: Suggest writing a raidtab file for raidtools fallback in case
rc.sysinit is overwritten during a future upgrade.
XXX: Write appendix with disaster recovery stuff.