
How to Use MDADM Linux Raid
A highly resilient raid solution!
By Red Squirrel

For a RAID, you obviously need multiple hard drives. With software RAID, you'll want a separate OS drive (which can itself be on hardware RAID) plus the data drives. You CAN use md RAID for the OS, but I have never tried it myself and am not sure of the process. Perhaps I can cover that in a future article once I've done it.

Ideally, you want to build a machine with removable drive bays on the front to make hot swapping easy. This is also known as a backplane. Some server cases have one built in; for standard cases you can buy an enclosure that fits in the 5.25" bays. The main advantage of a setup like this is the ability to hot swap drives: when a drive fails, you simply pull out the faulty one, insert a new one, and let the array rebuild. Some enclosures require you to screw the drive into a tray, while others take the bare drive directly, since the SATA data/power connector placement is standard on all drives (that I've seen). You do not need hot swap bays, though; you can still put the drives inside the case if you prefer. It's also a good idea to label the drive bays and keep a list somewhere of the serial numbers and which bay each drive is in. I will cover this more later.

Selecting Drives
The first thing to do is decide which drives to use. If you are particular about the physical placement of the drives like I am, the easiest way to ensure a single array occupies a single area of the bays is to start with all the drives out, then insert them one at a time. After inserting a drive, the dmesg command will tell you what name the kernel gave it. Running dmesg -c prints and then clears the kernel log, so run it before inserting each drive; that way the next dmesg output only shows messages about the new drive. Take note of the name, insert the next drive, and repeat.
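The insert-one-drive-at-a-time routine above can be sketched in shell. The helper name new_disks is hypothetical, and it assumes the kernel logs a line such as `sd 2:0:0:0: [sdb] Attached SCSI disk` for each new disk, which is common but can vary between kernels and controllers.

```shell
# new_disks (hypothetical helper): read kernel log text on stdin and print
# each sdX name the kernel reported attaching, once, in order of appearance.
# Assumes "sd H:C:T:L: [sdX] Attached SCSI disk" style messages.
new_disks() {
    sed -n 's/.*\[\(sd[a-z]*\)] Attached SCSI disk.*/\1/p' | awk '!seen[$0]++'
}

# Example workflow (as root):
#   dmesg -c > /dev/null     # print-and-clear so the log starts empty
#   # ...insert one drive and wait a few seconds...
#   dmesg | new_disks        # shows the name the new drive received
```

Piping through awk at the end keeps each name once, since some kernels repeat the attach message.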

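If you don't care and already have all the drives inserted, you can type fdisk /dev/sd and press Tab (twice in bash) to have the shell list every matching device. Any command that takes a file path works in place of fdisk, since the listing comes from the shell's tab completion. You will see something like this: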

[root@raidtest ~]# fdisk /dev/sd  [tab]
sda   sda1  sda2  sdb   sdc   sdd   sde   sdf   sdg   sdh   sdi   sdj   sdk   sdl   sdm   sdn   sdo

Remember, one of these is your OS drive, and you want to leave that one alone. When building a server, I always make sure to put the OS drive on the first SATA port. The ports are normally numbered 0, 1, 2 and so on, so in this case the OS drive is sda. It is also clear because it is the only drive with partitions: sda1 and sda2. If you want more information on a drive, such as its serial number, make, and size, you can use the smartctl command.

[root@raidtest ~]# smartctl -a /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device Model:     WDC WD6402AAEX-00Z3A0
Serial Number:    WD-WCATR4058891
Firmware Version: 05.01D05
User Capacity:    640,135,028,736 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Sep  5 22:20:29 2011 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 (12360) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 145) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3037)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   173   170   021    Pre-fail  Always       -       4308
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       68
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6708
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       66
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       31
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       36
194 Temperature_Celsius     0x0022   109   098   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

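Besides the model, serial number and capacity, this also shows the drive's SMART data, which makes it a great tool for assessing whether a hard drive may be failing. Non-zero raw values for attributes such as Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable are worth keeping an eye on.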
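To build the bay and serial number list mentioned earlier, the relevant fields can be pulled out of the smartctl output. This is a minimal sketch: smart_summary is a hypothetical helper, and it assumes the `Device Model:`, `Serial Number:` and overall-health labels shown above. Other smartctl versions or drive types may label them differently.

```shell
# smart_summary (hypothetical helper): read smartctl output on stdin and print
# "model|serial|health" on one line. Labels match the smartctl 5.x output above.
smart_summary() {
    awk -F': *' '
        /^Device Model:/  { model = $2 }
        /^Serial Number:/ { serial = $2 }
        /overall-health self-assessment test result:/ { health = $2 }
        END { print model "|" serial "|" health }
    '
}

# Possible inventory loop (as root; adjust the glob to your drives).
# smartctl -i prints identity info and -H prints the overall health result:
#   for d in /dev/sd?; do
#       printf '%s|%s\n' "$d" "$(smartctl -i -H "$d" | smart_summary)"
#   done > /root/drive-inventory.txt
```

Adding the bay number to each line by hand afterwards gives you the list to keep with the server's documentation.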

Creating an array
All Linux RAID work is done with the mdadm command, and a Linux RAID is surprisingly easy to set up. mdadm --help shows all the different modes, and mdadm --<mode> --help (for example, mdadm --create --help) shows more detail for that mode. Instead of just pasting the output of --help like a lot of articles tend to do, I'll walk you through the process, but --help does serve as a nice reference if you forget the exact syntax of a command.

Let's start by making a RAID 5 array using 4 drives: /dev/sdb, /dev/sdc, /dev/sdd and /dev/sde. These are 1GB drives. In RAID 5, one drive's worth of space goes to parity, so the usable size is the total of all the drives minus one drive: (4 - 1) × 1GB = 3GB.
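The capacity rule above is simple enough to express as arithmetic. A minimal sketch follows. raid5_usable is a hypothetical name, and it assumes all drives are the same size. With mixed sizes, mdadm only uses as much of each drive as the smallest one offers, so pass the smallest size. The 3-drive guard is just this sketch's own sanity check.

```shell
# raid5_usable (hypothetical helper): usable capacity of a RAID 5 array built
# from N equal drives. One drive's worth of space holds parity, so
# usable = (N - 1) * size, in whatever whole unit the size is given.
raid5_usable() {
    n=$1 size=$2
    if [ "$n" -lt 3 ]; then
        echo "this sketch expects at least 3 drives for RAID 5" >&2
        return 1
    fi
    echo $(( (n - 1) * size ))
}

raid5_usable 4 1    # the four 1GB test drives: prints 3
```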

[root@raidtest ~]# mdadm --create --level=5 --raid-devices=4 /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm: array /dev/md0 started.
[root@raidtest ~]#
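Because --raid-devices has to agree with the number of drives listed after /dev/md0, it can help to derive one from the other. The sketch below is a hypothetical helper (print_create_cmd) that only prints the command for review rather than running it. The name check simply warns when the device name doesn't start with /dev/md.

```shell
# print_create_cmd (hypothetical helper): print an mdadm --create command for a
# RAID level, an md device and its member drives, taking --raid-devices from
# the number of drives given. It only prints; review it, then run it as root.
print_create_cmd() {
    level=$1 md=$2
    shift 2
    case $md in
        /dev/md*) ;;
        *) echo "warning: $md does not follow the /dev/mdN convention" >&2 ;;
    esac
    echo "mdadm --create --level=$level --raid-devices=$# $md $*"
}

print_create_cmd 5 /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
```

For the example drives this prints the same command used above.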
--level is the RAID level, which is 5. --raid-devices is the number of drives we are using, which is 4. /dev/md0 is the name of the RAID device. RAID devices conventionally start with md, and the array shows up like any other drive. You can give it another name, but it's best to stick to the convention to avoid confusion; mdadm will create an md-numbered device anyway, so you may as well pick the md name yourself. Lastly, list the 4 member drives and press Enter. At this point the array starts its initial build, which can take anywhere from an hour to days depending on the number of drives, their size and speed, and the processor's speed. To see the status of the array, use this command:

[root@raidtest ~]# mdadm --detail /dev/md0
        Version : 1.2
  Creation Time : Tue Sep  6 18:31:41 2011
     Raid Level : raid5
     Array Size : 3144192 (3.00 GiB 3.22 GB)
  Used Dev Size : 1048064 (1023.67 MiB 1073.22 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Sep  6 18:31:55 2011
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 512K

 Rebuild Status : 19% complete

           Name : raidtest.loc:0  (local to host raidtest.loc)
           UUID : e0748cf9:be2ca997:0bc183a6:ba2c9ebf
         Events : 4

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       4       8       64        3      spare rebuilding   /dev/sde
[root@raidtest ~]#

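Don't be alarmed that a brand-new array reports itself as degraded with one spare rebuilding. When creating a RAID 5 array, mdadm deliberately starts it degraded and rebuilds onto the last drive, because that is generally faster than resyncing parity across all the drives. If you want to easily monitor the progress of one or more rebuilds, you can also use this command: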

[root@raidtest ~]# watch 'cat /proc/mdstat'
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde[4] sdd[2] sdc[1] sdb[0]
      3144192 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
      [======>..............]  recovery = 34.6% (363392/1048064) finish=0.7min speed=15141K/sec

unused devices: <none>
[root@raidtest ~]#

This refreshes every 2 seconds; press Ctrl+C to exit. Alternatively, run cat /proc/mdstat directly to show it once.
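For unattended checks, the `[UUU_]` field in /proc/mdstat is the quickest tell: each U is a working member and each underscore a missing or rebuilding one. The hypothetical helper below flags any array with an underscore in that field. It assumes the /proc/mdstat layout shown above and is a sketch, not a replacement for mdadm's own monitoring.

```shell
# degraded_arrays (hypothetical helper): read /proc/mdstat text on stdin and
# print the name of each array whose member-status field (e.g. [UUU_])
# contains an underscore, i.e. a missing or rebuilding member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}

# Typical use:  degraded_arrays < /proc/mdstat
```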
Once the array is complete, it will look like this:

[root@raidtest ~]# mdadm --detail /dev/md0
        Version : 1.2
  Creation Time : Tue Sep  6 18:31:41 2011
     Raid Level : raid5
     Array Size : 3144192 (3.00 GiB 3.22 GB)
  Used Dev Size : 1048064 (1023.67 MiB 1073.22 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Sep  6 18:32:49 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : raidtest.loc:0  (local to host raidtest.loc)
           UUID : e0748cf9:be2ca997:0bc183a6:ba2c9ebf
         Events : 20

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       8       48        2      active sync   /dev/sdd
       4       8       64        3      active sync   /dev/sde
[root@raidtest ~]#

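Whether or not the array has finished building, it is ready to use, so you can format it, mount it, and put data on it right away. Just keep in mind that until the initial build completes, the array is not yet redundant and performance will be reduced. Let's format and mount it: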

[root@raidtest ~]# mkfs.ext4 /dev/md0
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=128 blocks, Stripe width=384 blocks
196608 inodes, 786048 blocks
39302 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=805306368
24 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912

Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
[root@raidtest ~]#
[root@raidtest ~]#
[root@raidtest ~]# mkdir /mnt/md0
[root@raidtest ~]#
[root@raidtest ~]# mount /dev/md0 /mnt/md0
[root@raidtest ~]# dir /mnt/md0
[root@raidtest ~]#

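If you want partitions on the array, create them before formatting; otherwise, formatting the whole md device as above works fine. A quick way to confirm that the mount is working is to list the mount point and look for the lost+found folder. You can also use the df command; df -hl shows disk space usage for all local filesystems in human-readable units: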

[root@raidtest ~]# df -hl
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             7.9G  2.9G  4.7G  39% /
tmpfs                 499M     0  499M   0% /dev/shm
/dev/sda1             194M   25M  159M  14% /boot
/dev/md0              3.0G   69M  2.8G   3% /mnt/md0
[root@raidtest ~]#
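The Stride and Stripe width values in the mkfs.ext4 output come from the array geometry, which mkfs.ext4 appears to have detected from the md device on its own. As a worked check, with the 512K chunk size from mdadm --detail and 4096-byte filesystem blocks: stride = 512 / 4 = 128 blocks, and with 3 data drives in a 4-drive RAID 5, stripe width = 128 × 3 = 384 blocks. Here is a small sketch of that arithmetic, using the hypothetical name ext4_stripe_geometry and assuming RAID 5 (one parity drive):

```shell
# ext4_stripe_geometry (hypothetical helper): compute ext4 stride and stripe
# width for a RAID 5 array.
#   stride       = chunk size / filesystem block size   (both in KiB)
#   stripe width = stride * data drives                  (RAID 5: drives - 1)
ext4_stripe_geometry() {
    chunk_kib=$1 block_kib=$2 drives=$3
    stride=$(( chunk_kib / block_kib ))
    echo "stride=$stride stripe_width=$(( stride * (drives - 1) ))"
}

ext4_stripe_geometry 512 4 4    # prints: stride=128 stripe_width=384
```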

At this point, should any one of those 4 drives fail, the data will still be available and normal operations will continue. Before we go on, we need to save the configuration. Each member drive carries a superblock with the array's UUID that identifies it as part of the array, but without a saved configuration you may need to reassemble the array by hand after a reboot. To make this automatic, save the settings with this command:

[root@raidtest ~]# mdadm --detail --scan > /etc/mdadm.conf
[root@raidtest ~]#

Note that > overwrites the file, so use >> instead if your mdadm.conf already contains other settings; on Debian and Ubuntu the file lives at /etc/mdadm/mdadm.conf. Now if we reboot, the /dev/md0 device will exist and just needs to be mounted. To automate mounting, you can add it to /etc/fstab or another startup script. Personally, I do not like using /etc/fstab for this, because if the mount fails for whatever reason, the entire system may fail to boot (many systems support the nofail mount option to avoid this). I prefer adding it to /etc/rc.local or similar. However, if any programs such as MySQL depend on this mount, you will need to put it in fstab so it is mounted before those programs start.
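If you go the /etc/rc.local route, a guard around the mount keeps a missing array from producing confusing errors later. This is a sketch with a hypothetical helper name. It assumes rc.local runs as root under /bin/sh, and it only checks that the device node exists, not that the array is healthy.

```shell
# mount_if_present (hypothetical helper for /etc/rc.local): mount a device only
# if its block device node exists; otherwise report it and return non-zero.
mount_if_present() {
    dev=$1 mnt=$2
    if [ ! -b "$dev" ]; then
        echo "mount_if_present: $dev not found, skipping $mnt" >&2
        return 1
    fi
    mkdir -p "$mnt" && mount "$dev" "$mnt"
}

# In /etc/rc.local:
#   mount_if_present /dev/md0 /mnt/md0
```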

Assembling an array
Let's say you forgot to save the settings and the system rebooted. You are not out of luck, but you will need to know which 4 drives make up the array. Unless you've been swapping drives around, they should still have the same names. To reassemble, simply do the following:

[root@raidtest ~]# mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm: /dev/md0 has been started with 4 drives.
[root@raidtest ~]#

Even with the configuration file in place, I have seen some situations where the array does not start on its own. In that case, use the same command as above, but you do not need to list the individual drives. Also note that the physical drive order no longer matters: you could turn off the server, shuffle all the drives around, and it would boot up with the array working fine. Of course, don't actually do that, as it will mess up your documentation. Document any physical drive change you make.
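The difference between the two assemble forms can be captured in a small sketch. assemble_cmd is a hypothetical helper that only prints the command to run. It assumes the config file's ARRAY lines start with the device path, as in the mdadm --detail --scan output saved earlier (for example `ARRAY /dev/md0 metadata=1.2 ...`); some mdadm versions write /dev/md/name paths instead, which this sketch would not match.

```shell
# assemble_cmd (hypothetical helper): print the mdadm --assemble command for an
# md device. With an ARRAY line in the config, mdadm finds the members itself;
# otherwise the member drives must be listed on the command line.
assemble_cmd() {
    md=$1 conf=$2
    shift 2
    if [ -r "$conf" ] && grep -q "^ARRAY $md " "$conf"; then
        echo "mdadm --assemble $md"
    elif [ $# -gt 0 ]; then
        echo "mdadm --assemble $md $*"
    else
        echo "no ARRAY entry for $md in $conf; list the member drives" >&2
        return 1
    fi
}
```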

Email alerts
Now, it's all fine and dandy to have redundancy, but I have seen this even in corporate environments: a failure goes unnoticed because there is no proper alerting system, then another drive fails, and it's game over. On many distributions, the default configuration sends mdadm's alerts to root. That may be fine if you have forwarding set up for root's mail, but in other cases you may want the mail to go to another address. To do this, add the following command to a startup script such as /etc/rc.local:

mdadm --monitor --scan --mail=[email address] --delay=1800 &
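Instead of passing --mail on the command line, mdadm also reads a MAILADDR line from mdadm.conf when monitoring with --scan, and on many distributions the packaged monitoring service relies on it. The sketch below (hypothetical helper set_mailaddr) adds that line or replaces an existing one. It assumes a plain address with no | or & characters, since those would confuse the sed substitution.

```shell
# set_mailaddr (hypothetical helper): point the MAILADDR line(s) of an mdadm
# config file at the given address, appending a new line if none exists.
set_mailaddr() {
    conf=$1 addr=$2
    if grep -q '^MAILADDR' "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # Rewrite via a temp file, then copy back to keep the file's permissions.
        sed "s|^MAILADDR.*|MAILADDR $addr|" "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo "MAILADDR $addr" >> "$conf"
    fi
}

# Example (as root):  set_mailaddr /etc/mdadm.conf admin@example.com
```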

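--delay sets how often mdadm polls the arrays, in seconds (1800 is every 30 minutes), and the trailing & runs the monitor in the background. If you want to test that alert emails will be received, you can issue this command: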

[root@raidtest ~]# mdadm --monitor --scan --test
[root@raidtest ~]#

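Because --test keeps monitoring after sending the test message, you will need to press Ctrl+C to exit (adding --oneshot makes it exit after a single pass). You should get an email that looks something like this: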

TestMessage event on /dev/md0:raidtest.loc
mdadm monitoring 
Tue, 6 Sep 2011 19:11:53 -0400 (EDT)

This is an automatically generated mail message from mdadm
running on raidtest.loc

A TestMessage event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb[0] sde[4] sdd[2] sdc[1]
      3144192 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

On the next page, we'll continue by looking at how to deal with a hard drive failure.

© Copyright 2021 Ryan Auclair/IceTeks, All rights reserved