HumbabaRaid
MIRRORing command
rsync -av --inplace --delete --exclude=/boot --exclude=/etc --exclude=/lost+found --exclude=/opt --exclude=/proc --exclude=/root --exclude=/selinux --exclude=/srv --exclude=/sys --exclude=/tmp --exclude=/mnt --exclude=/home/jared/TEMPITEMS --exclude=/home/junkvdomain gusoyn:/* /
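A dry run of the same command (adding -n, i.e. --dry-run) previews what would be copied or deleted before committing to the real transfer:
rsync -avn --inplace --delete --exclude=/boot --exclude=/etc --exclude=/lost+found --exclude=/opt --exclude=/proc --exclude=/root --exclude=/selinux --exclude=/srv --exclude=/sys --exclude=/tmp --exclude=/mnt --exclude=/home/jared/TEMPITEMS --exclude=/home/junkvdomain gusoyn:/* /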
The following are modded from gusoyn to humbaba:
- /etc/fstab
- /etc/hosts
- /etc/exim4/exim.conf.jrb
- /etc/network/interfaces
- /etc/hostname
- /etc/apache2/sites-available/jaredblaser.com (allow jbwiki -> photos)
Key dirs for mirroring:
- /var/lib/mysql (DBs)
- /home/vdomain
- /home/jared (excluding TEMPITEMS)
Switchover
Switching over from Gusoyn to Humbaba, I followed these steps (a rough sketch of the corresponding commands follows the list):
- Gusoyn:
- disallow new logins/users, i.e., email, DB-based websites
- stop Mysql
- stop Exim
- stop Samba
- stop Apache
- comment out crontab entries:
- root
- jared
- rebecca
- (vdomains)
- arpit (router)
- disable port forwarding for email (port 25)
- forward ports to 192.168.1.17 for all services
- Humbaba:
- stop Mysql
- mirror DBs from Gusoyn to Humbaba (/var/lib/mysql)
- start Mysql
- confirm that Humbaba:/home is a true mirror of Gusoyn:/home (excluding /home/jared/TEMPITEMS)
- arpit (router)
- enable port forwarding for email (port 25)
- Humbaba:
- enable crontab entries:
- root
- jared
- rebecca
- (vdomains)
- update hosts file on all servers and clients (vdomains to 192.168.1.17)
- TODO:
- enable secure email setup (port 465, 995)
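A rough sketch of the commands behind the switchover steps above, assuming Debian-style init scripts (the exact service names here are my assumption and may differ on these machines):
- service mysql stop (likewise exim4, samba, apache2; 'start' brings them back up on Humbaba)
- crontab -u root -e (comment out the entries; repeat for jared, rebecca and the vdomain users)
- rsync -av --delete gusoyn:/var/lib/mysql/ /var/lib/mysql/ (DB mirror, run on Humbaba with MySQL stopped on both ends)
- rsync -avcn --delete --exclude=/jared/TEMPITEMS gusoyn:/home/ /home/ (checksum dry run to confirm /home is a true mirror)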
2015-07-30
Today I discovered (via an email from the mdadm monitor) that one side of my raid had failed. It probably happened yesterday when I was reorganizing some of the server's cables. I must have momentarily disconnected the USB-attached /dev/sda. In any case, /dev/md0's /dev/sda1 was in fail mode, but the other partition, /dev/md1's /dev/sda5 (used as swap), was showing okay, probably because the system had yet to allocate any swap space since boot.
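For reference, the state of the arrays can be checked at any time through the standard status interfaces:
- cat /proc/mdstat
- mdadm --detail /dev/md0 (and likewise /dev/md1)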
Well, I was not successful with these commands, trying to remove the partitions so that I could re-add them:
- mdadm --manage /dev/md0 --remove /dev/sda1
- mdadm --manage /dev/md1 --fail /dev/sda5
Turns out that even though the drive was disconnected -- which is what failed the raid -- when it was reattached it was allocated as /dev/sdd, not /dev/sda. So the raid manager was complaining that the device didn't exist.
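A quick way to confirm which device node a re-attached USB drive has come back as (just a sanity check, not part of the original fix):
- lsblk
- ls -l /dev/disk/by-id/ | grep -i usb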
To overcome this, I had to use different commands to get the raid daemon to let go:
- mdadm --manage /dev/md0 --remove detached
- mdadm --manage /dev/md1 --fail detached
- mdadm --manage /dev/md1 --remove detached
This released the device from the raid system as well as from the kernel, so when I disconnected and reattached the USB drive it came back as the original /dev/sda. Then to return to standard operations:
- mdadm --manage /dev/md0 --re-add /dev/sda1
- mdadm --manage /dev/md1 --re-add /dev/sda5
While each partition was added back into the raid successfully, it had to be fully re-mirrored, which takes a long time with 320GB. I have run across references on the web suggesting that if there were some metadata <mumble, mumble> then the newly re-added drive would just sync up again without the full mirror. More investigation needed. But the good news is that I'm back to normal again.
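If the metadata alluded to is mdadm's write-intent bitmap (my assumption; I haven't confirmed it), adding one to each array should let a briefly-detached member catch up by syncing only the dirtied blocks instead of doing a full re-mirror:
- mdadm --grow /dev/md0 --bitmap=internal
- mdadm --grow /dev/md1 --bitmap=internal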
2016-01-21
Mirrored disk maintenance
The two mirrored drives that I am using on this system are older 320GB IDE drives that I've installed in external USB cases. The cases are of Chinese manufacture and thus are not of the highest quality. On other identical cases I recently found that the electronics inside can short against the case itself, disabling or at least interfering with the power and signaling, at least temporarily. If the encased drive or its cable is manipulated in certain ways, this short can occur. This is not a good operational condition. Today, I decided to remedy this for each drive in turn: manually fail the drive and remove it from the array using mdadm, physically remove the drive, fix the potential short conditions, re-attach it, then re-add the drive to the array, again using mdadm. This I did, first with /dev/sda, then /dev/sdb:
- mdadm --manage /dev/md0 --fail /dev/sda1
- mdadm --manage /dev/md0 --remove /dev/sda1
- mdadm --manage /dev/md1 --fail /dev/sda5
- mdadm --manage /dev/md1 --remove /dev/sda5
After disconnecting the drive from its USB cable, I opened the case, removed the drive and the small PCB from inside, and filed the sharp pins on the solder side of the PCB smooth. I then reassembled the case, added little rubber feet (at last!), and re-connected it to the USB cable.
- mdadm --manage /dev/md0 --add /dev/sda1
- mdadm --manage /dev/md1 --add /dev/sda5
Then the re-mirroring began, lasting about 4 hours in all. I then repeated the procedure for /dev/sdb.
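The progress of a re-mirror can be watched while it runs:
- watch cat /proc/mdstat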
NB. This method of manually failing and removing the device from the array, using the commands above, has the advantage of releasing the drive at the kernel level, and I did not have to use the 'detached' keyword, because I had not yet physically removed the drive from the array. Everything goes smoothly if the drive is logically removed first and only then physically detached. If the drive drops out on its own and fails, however, this manual failing doesn't work and the array has to be told to release the device that is no longer recognized:
- mdadm --manage /dev/md0 --remove detached
- mdadm --manage /dev/md1 --remove detached
2023-05-24
Clearing interrupted re-sync of mirrored disks
While performing disk maintenance on humbaba I had to interrupt a resync operation. Sadly, once this was done, re-adding the mirror disks to the RAID array only marked them as spares, and they would not re-sync. The command to clear the stalled re-sync is:
- echo "idle" > /sys/block/md0/md/sync_action
Once this command was issued, the drives that had been showing as spares rather than active mirrors restarted their re-sync, and all is well.
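The current action can be read back from the same sysfs node to confirm the re-sync actually restarted (it should show resync or recover while the rebuild runs, and idle once it finishes):
- cat /sys/block/md0/md/sync_action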