OnQ

The worklife blog of Eriq Oliver Neale...

On Dell's PERC 5/i

So in a previous post, I railed about my surprise at finding out that the Dell PERC 5/i controller has no audible alarm and why that's a concern for me. Well, here we are, nearly two weeks later, and after much going round and round with Dell on the issue, I have more information, and it's not necessarily good.

My specific initial issue is that, besides not having an audible alarm, Dell's Server Management software (Open Manage) doesn't have a way to send notifications about problems with the RAID controller, either the controller itself or the failure of an element attached to the controller. After my initial support call with Dell about the issue, they indicated that the IT Assistant software should run on the Windows 64-bit box, and that will send notifications when an issue is detected. I've since found out that no, the latest version of IT Assistant that's posted on the Dell web site will NOT, in fact, run on the 64-bit Windows platform. Of course, everything about IT Assistant tells you that it really, truly, should be run on another box, but for this specific instance, that's not going to be possible.

In digging further into this, however, I've uncovered a couple of other issues that concern me. Given that IT Assistant seems to be the preferred way to actively monitor the RAID, I thought I'd try to install it on the SBS box I have at the office. No dice. IT Assistant will install onto a server platform, but not SBS. It's a hard block. So for all my SBS servers with PERC 5/i cards, I have nothing from Dell that I can run on the server box that will monitor the health of the RAID controller and notify me when there's a problem. With my other RAID controllers, I can fall back on the audio alert at the very least for notification of a problem, but don't have that option here.

Last Thursday, at my SBS user group meeting, I mentioned my frustrations with the situation in a side conversation, and one of the folks I was talking with was as taken aback as I was when I first figured this out. He just put a number of servers with PERC 5/i cards in them out in production and was also unaware that the controller had no audible alarm mechanism. I've mentioned this to a couple of other folks as well, with pretty much the same reaction.

This week, I started putting together specs for a couple of new servers for a couple of projects and, knowing the challenges of the PERC 5/i, decided to look at other controller options for these boxes. Unfortunately, I've found that, currently, the only RAID controller that Dell provides that supports RAID 5 is, surprise surprise, the PERC 5/i. There are other controllers, but those only support RAID 0 or 1. And I don't yet have confirmation if those controllers have audible alarms on them or not. So, even if I was able to "settle for" a RAID 1 solution (and to be fair, on one box it's not unrealistic), I still think I'd be in the same situation.

I've been working with a couple of folks at Dell on trying to find a reasonable resolution to this problem. Of course, there's always the recommendation that I can run IT Assistant on a separate workstation to monitor the array card in the server and send notifications back to me if/when there's a problem. But that's not necessarily a realistic solution at some sites. Now I have to install a piece of software on a workstation that has to be running all the time and may or may not interfere with what the user of the workstation is trying to do. I simply can't afford to stick a dedicated box at each of my client sites to do this monitoring, nor can I ask them to dedicate a workstation to do this themselves. It looks like I'm going to have to go third party for a solution, and while that's probably less costly than dedicating a workstation to monitor the array, it's still an added expense that I really don't think I should have to incur in order to be proactive with my clients.

I honestly believe the folks I've spoken with at Dell understand my plight. While they have not committed to anything, there have been discussions about changes to engineering on future controllers to ensure an audible alarm among other possibilities. Based on a series of messages that floated around this afternoon, I know the issue has been escalated internally, but still have no clear direction on where to go.

At the end of the day, two weeks after I first placed the call regarding the failed array and lack of notification of the failure, I still have a box that I will have to manually monitor for RAID health. I'm hoping for a better solution, and I expect that I'll just have to be patient.

I sure hope that data cable doesn't pop off the drive connector again, tho...

Posted: Mar 30 2007, 04:37 PM by eriq | with 20 comment(s)

Comments

Eli said:

Thank you!

I'm intending to buy such a controller for my Dell 690P workstation, and I will check this problem.

Regards

zilkael@013.net

# June 3, 2007 10:55 AM

Electrosonics said:

Right on with your comments on Perc 5/i! As of 7/6/07 I too have come to a dead end. No answers from Dell on monitoring the health of the RAID array. In my prototype system (4 drive array) from Dell, I was able to see events in the event log when the RAID array had a drive failure. With health monitor, I was able to create a rule to email me when that event occurred. But in the production system (3 drive array with hot spare, wiped, new array, reload), I have yet to be able to reproduce my initial success. When I have a drive failure, the hot swap spare automatically kicks in but no event log message is reported, hence no notification of a drive failure. This is pure BS. Notification upon a drive failure is so easy to implement at the driver level. Someone dropped the ball here. I find it frustrating that Dell product development cannot do better.

# July 6, 2007 11:09 AM

looplocal said:

Along with the lack of audible alarm, if you purchased a Dell Perc5/i RAID controller card (not necessarily a cheap investment) with a Precision workstation, be aware that it will come WITHOUT a card battery backup unit (BBU).

This was surprising as you would think it would have been a fairly trivial cost addition relative to the price of the $599 card.  I spent hours trying to order it as a spare part from Dell, but with no luck.

# July 21, 2007 7:51 AM

Stephen B. said:

I've written a cron job that I've set to execute every minute. With each run, it blasts every open tty with a message from Dell's OMSA software if it finds a drive that has failed or is about to fail, and it sends mail to the root account (or its alias) if a drive's rebuilding. It's meant to be insanely annoying to demonstrate just how important it is that it get fixed. I run a small lab, so it's important to me that anyone using the system would be able to know that something's up. The rebuilding part may be omitted, but I like to check up on the status of any rebuilding drive. I'm sure there are tons of tweaks to it that a more experienced programmer could make (like sending messages to SMS servers, sending non-local e-mails), but I've tried to keep it simple for now. I've posted it below, in hopes that it can help people out:

~~~

#! /bin/sh
###################################################
# Checks RAID array to ensure that drives have not
# gone critical and are not predicted to fail.  If
# something is amiss, hammer out a notice to
# ensure that the faulty drive is replaced before
# a second failure.  While it's repairing, mail a
# notice to root informing root of rebuilding
# progress.
#
# Depends on Dell's OMSA being installed; tested
# on PERC 5/i HW RAID card on Dell PowerEdge 2900
# running Debian etch.
#
# Meant to be run in root's crontab.
###################################################

# Assigns script variables
# Edit these values to correspond to your hardware
# and software configuration
CONTROLLER=0
MAIL_ROOT="root"

###################################################
#DO NOT EDIT BELOW THIS LINE!
###################################################

# Locates omreport and the other tools on this system
OMREPORT=`which omreport`
GREP=`which grep`
WALL=`which wall`
MAIL=`which mail`

# Checks status of RAID array
# First, checks for critical failure and sends messages to all ttys if failure has occurred
RAID_CRITICAL=`$OMREPORT storage pdisk controller=$CONTROLLER | $GREP Critical`
if [ -n "$RAID_CRITICAL" ]; then
   # Pipe the report straight to wall so its line breaks are preserved
   $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 3 -A 2 Critical | $WALL
   exit 1
fi

# Next, checks for predicted failure and sends messages to all ttys if failure is predicted
RAID_FAILURE=`$OMREPORT storage pdisk controller=$CONTROLLER | $GREP "Failure Predicted         : Yes"`
if [ -n "$RAID_FAILURE" ]; then
   $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 4 "Failure Predicted         : Yes" | $WALL
   exit 2
fi

# Finally, checks for rebuilding of array and sends messages to root if rebuilding
RAID_REBUILDING=`$OMREPORT storage pdisk controller=$CONTROLLER | $GREP Rebuilding`
if [ -n "$RAID_REBUILDING" ]; then
   $OMREPORT storage pdisk controller=$CONTROLLER | $GREP -B 3 -A 2 Rebuilding | $MAIL -s "RAID array rebuilding" "$MAIL_ROOT"
   exit 3
fi
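For anyone adapting the script above, here is a minimal sketch (not part of the original) that reads the report once and classifies it in a single pass, rather than re-running omreport for each check. The function name classify_pdisk_report is made up for illustration, and the patterns assume the same "Critical", "Failure Predicted : Yes", and "Rebuilding" strings that the script greps for. The critical pattern is tied to the colon so that a value such as "Non-Critical", should the tool report one, is not counted as critical.

```shell
#!/bin/sh
# classify_pdisk_report (hypothetical helper): reads text in the style of
# `omreport storage pdisk` on stdin and prints one word for the most serious
# condition found: critical, predicted, rebuilding, or ok.
classify_pdisk_report() {
    report=$(cat)
    # "Critical" must directly follow the colon, so "Non-Critical" is skipped.
    if printf '%s\n' "$report" | grep -Eq ':[[:space:]]*Critical'; then
        echo critical
    elif printf '%s\n' "$report" | grep -Eq 'Failure Predicted +: +Yes'; then
        echo predicted
    elif printf '%s\n' "$report" | grep -q 'Rebuilding'; then
        echo rebuilding
    else
        echo ok
    fi
}

# Example wiring, with the controller number as in the script above:
#   state=$(omreport storage pdisk controller=0 | classify_pdisk_report)
```

To run the original script every minute, root's crontab would need an entry along the lines of "* * * * * /usr/local/sbin/check_raid.sh", where the path is only a placeholder for wherever the script was saved.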

# July 24, 2007 5:53 PM

Doug Wassmer said:

I just purchased a PERC 5/i daughter card for my PowerEdge 1900 running Windows 2003 Server. I ran across your comments when I was searching for instructions on how to install the daughter card. I think that it fits into slot 4 on the motherboard, but I have no idea where to place the battery that came with it. The battery does not attach to the daughter card. I couldn't find good instructions at the Dell site or in the documentation that came with the 1900.

# August 8, 2007 10:27 AM

James said:

You can use the LSI software, as the Perc 5/i is really a Dell OEM version of the LSI MegaRAID SAS 8408E, with a few minor differences.

MegaRAID Storage Manager can email alerts to a set email address if something goes wrong.

www.lsi.com/.../index.html

And, Looplocal, you can pick a BBU off eBay for sometimes as little as $15, so all is not lost! :) - I bought a whole kit, battery, cable and holder for $30.

# August 10, 2007 10:31 PM

infinity005 said:

I stumbled on this while looking for answers to performance problems with the Perc 5/i. Before going live, I ran iozone on it, and write performance with 7 drives in RAID 5 is atrocious! I haven't yet figured out why, and the card will have to get returned if I don't, because an array with 4 disks is 3 times faster doing writes than a 7 disk array.

# October 6, 2007 5:47 PM

B.Grujevsky said:

Yes, the LSI software works fine. Peculiar that DELL didn't take that "part" of the LSI-controller into their software.

# October 16, 2007 5:26 PM

Wes said:

James has got the best info.. Go to LSI webpage and download the MegaRAID Storage Manager, it is the exact same software, but has email notifications enabled.

For some reason Dell stripped out the SNMP section of the LSI software, which is the part that does the notifications.  Great job Dell!

# October 29, 2007 1:06 PM

LoopLocal said:

Thanks for that info.  I managed to find the kit as suggested.  Once I had the part number Dell was able to provide via the Spare Parts department.

I installed the BBU, but how does one know it is recognized? My Perc 5 BIOS configuration utility does not give any info other than the warning shown after checking the "write back even without BBU" option. Any idea on how to know if a Perc 5 BBU is installed and identified by the BIOS correctly? Will the Perc 5 BIOS config always give the Write Back/BBU warning even if a BBU is installed?

Thanks for any info.

loop

# December 31, 2007 11:48 PM

LoopLocal said:

Please forgive my BBU question.  I installed the MegaRaid as suggested by James and that provides an abundance of info (unlike the BIOS config) including the BBU identification.

BBU Present?   YES !

# January 1, 2008 12:10 AM

LoopLocal said:

At the LSI site, there seems to be a more aggressive driver and firmware availability compared to Dell.  

Any thoughts on if it is advisable to go with LSI's driver and firmware releases instead of Dell's outdated ones?  There seems to be quite a few bug fixes and many Vista enhancements/fixes.

# January 1, 2008 12:11 AM

Adam Cybulski said:

I see that you posted this about a year ago; I was wondering if any progress has been made on this issue. I am running into the exact same problem: we deployed numerous Dell servers to our clients only to find that they have no alert system. While some of the clients have workstations, others do not.

# January 28, 2008 8:27 AM

OnQ said:

I've had a few comments show up on the series of PERC 5/i posts I had early in 2007. There have been

# January 28, 2008 9:07 AM

Andy said:

Loop - can you post the part number for the BBU? - is it the G3399?

# February 20, 2008 3:00 PM

Jerodh said:

Great blog article. Very helpful. I'm in the process of putting together a PowerEdge 1900 with Perc 5/i running CentOS 5.1 64-bit for VMware Server 1.0.5. Hope to run Windows and Novell virtual machines on it.

I'm going to try out an altered script that Stephen B. wrote.  If I can't get that to work then I'll use the LSI MegaRAID Storage manager.  My concern with running the LSI product is that I'm using Dell's megaraid_sas driver version 3.16 and LSI's latest driver is 3.13.
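One way to see which megaraid_sas version a Linux box actually has, before choosing between Dell's and LSI's packages, is to ask modinfo. The sketch below is only an illustration: version_newer is a hypothetical helper, it assumes a sort that supports -V (GNU coreutils), and 3.16 and 3.13 are simply the numbers quoted above.

```shell
#!/bin/sh
# version_newer A B (hypothetical helper): succeeds if version A is strictly
# newer than version B. Relies on `sort -V`, a GNU coreutils extension.
version_newer() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Report the driver version known to this system, if modinfo and the
# module are both present; stay silent otherwise.
installed=""
if command -v modinfo >/dev/null 2>&1; then
    installed=$(modinfo -F version megaraid_sas 2>/dev/null || true)
fi
if [ -n "$installed" ]; then
    echo "megaraid_sas version on this system: $installed"
fi

# Compare the two version numbers mentioned in the comment above.
if version_newer 3.16 3.13; then
    echo "3.16 is newer than 3.13"
fi
```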

# May 7, 2008 8:57 PM

jerod h said:

I couldn't get the script above to work (my bash didn't like the pipe command in the variables), so I created this one using perl.  

#!/usr/bin/perl
# A simple perl program to send alerts if problem found with physical disks
# by using OMSA 5.2 for Dell PowerEdge
# I do not provide any guarantee that this script will work. Use at own risk.
#Written: May 2008
#By: Jerod H

$controller="0";
$emailaddress="jerod\@yourdomain.com";
$omreport="/usr/bin/omreport";
$mail="/bin/mail";
$servername="yourservername";

# run omreport command and put into olist
open(LS, "$omreport storage pdisk controller=$controller |")
    or die "Cannot run $omreport: $!";
while (<LS>) {
    chomp;
    push @olist, $_;
}
close(LS);

# Go through each line in olist to look for "Critical", "Rebuilding", or
# "Failure Predicted : YES"
$email=0;
$subject="";
foreach $line (@olist) {
    if ($line =~ /Critical$/i) {
        $email=1;
        $subject="$servername Hard Drive Critical";
    }
    if ($line =~ /Rebuilding$/i) {
        $email=1;
        $subject="$servername Hard Drive Rebuilding";
    }
    if ($line =~ /^Failure\sPredicted(\s)+:\sYes$/i) {
        $email=1;
        $subject="$servername Hard Drive Predicted to Fail";
    }
}

#If something was found email will = 1 so send email
if ($email==1) {
    system("$omreport storage pdisk controller=$controller | $mail -s '$subject' $emailaddress");
}
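Both scripts in this thread are meant to run from cron, so a fault that persists will generate a message on every run. If that is more noise than you want, one option (not something either script does) is to remember the last state reported and notify only when it changes. Here is a minimal shell sketch of that idea; state_changed and the state file path are hypothetical names chosen for illustration, and a missing state file is treated as a previous state of "ok".

```shell
#!/bin/sh
# state_changed NEW_STATE STATE_FILE (hypothetical helper)
# Succeeds (exit 0) when NEW_STATE differs from the state recorded in
# STATE_FILE by the previous run, and records NEW_STATE. Fails (exit 1)
# when nothing changed. A missing file counts as a previous state of "ok".
state_changed() {
    new_state=$1
    state_file=$2
    old_state="ok"
    if [ -f "$state_file" ]; then
        old_state=$(cat "$state_file")
    fi
    if [ "$new_state" = "$old_state" ]; then
        return 1
    fi
    printf '%s\n' "$new_state" > "$state_file"
    return 0
}

# Example wiring (the path is a placeholder):
#   if state_changed "critical" /var/run/raid_state; then
#       ...send the e-mail here...
#   fi
```

With this in place a dead drive would produce one message when it fails and another when the array returns to normal, instead of one per minute.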

# May 9, 2008 8:20 AM

Tyler said:

I think Dell should put the alarm back on the controller. Those admins who do not want the alarm could disable it, just as they could before it was yanked from the card.

# May 9, 2008 8:32 AM

Mårten said:

Hi!

I'm going for a 2900. Is the Perc 6/i any better?

# December 17, 2008 9:21 AM

eriq said:

PERC 6 series has the same problems. No audible alarm. I've continued to try to raise the issue with Dell but with little response, other than the technicians agreeing that the audible alarm should at least be an option.

# December 17, 2008 9:26 AM