Sunday, May 25, 2008

Installing LSI Logic RAID monitoring tools under the ESX service console

As I discussed in a recent post, I used a Dell Perc 5i SAS controller in my ESX whitebox server. One of the nice features of this controller is that it is a rebranded LSI Logic controller (with a different board layout!), supported by LSI Logic firmware and the excellent monitoring tools that LSI offers.

Of course, it is important to keep track of your RAID array status, so I decided to install the MegaCLI monitoring software under the ESX Server 3.5 Service Console. Here's how I did it and configured the monitoring on my system:
  • The MegaCLI software can be downloaded from the LSI Logic website. I used version 1.01.39 for Linux, which comes as an RPM file.

  • After uploading the RPM file to the service console, it was a matter of installing it using the "rpm" command:

    rpm -i -v MegaCli-1.01.39-0.i386.rpm

    This installs the "MegaCli" and "MegaCli64" commands in the /opt/MegaRAID/MegaCli/ directory of the service console.
That's it: MegaCLI is now ready to use. Some useful commands are the following:
  • /opt/MegaRAID/MegaCli/MegaCli -AdpAllInfo -aALL
    This lists the adapter information for all LSI Logic adapters found in your system.

  • /opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL
    This lists the logical drives for all LSI Logic adapters found in your system. The "State" should read "Optimal" for a fully operational array.

  • /opt/MegaRAID/MegaCli/MegaCli -PDList -aALL
    This lists all the physical drives for the adapters in your system; the "Firmware state" indicates whether the drive is online or not.
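
The logical drive check can be scripted as well. The following is only a minimal sketch and not part of my setup: it assumes that -LDInfo prints one line of the form "State: Optimal" per logical drive, and the function name ld_not_optimal is made up for this example. The sample input at the end is hand-written, not real MegaCli output.

```shell
#!/bin/sh
# Sketch: count logical drives whose "State" line is not "Optimal".
# Reads MegaCli -LDInfo output on stdin. The "State: <value>" layout is an
# assumption; compare it with what your MegaCli version actually prints.
ld_not_optimal() {
    awk -F: '
        /^State[ \t]*:/ {
            total++
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding whitespace
            if (tolower(value) != "optimal") bad++
        }
        END {
            printf "%d of %d logical drives not optimal\n", bad, total
            # Fail when a drive is not optimal or when no State line was seen.
            exit ((bad > 0 || total == 0) ? 1 : 0)
        }
    '
}

# Hypothetical sample input, for illustration only:
printf 'State: Optimal\nState: Degraded\n' | ld_not_optimal || echo "array needs attention"
```

On a real system the function would be fed from the command above, for example: /opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL | ld_not_optimal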
The next step is to automate the analysis of the drive status and to alert when things go bad. To do this, I added an hourly cron job that lists the physical drives and then analyzes the output of the MegaCLI command.
  • I created a file called "analysis.awk" in the /opt/MegaRAID/MegaCli directory with the following contents:

    # This is a little AWK program that interprets MegaCLI output

    /Device Id/ { counter += 1; device[counter] = $3 }
    /Firmware state/ { state_drive[counter] = $3 }
    /Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
    END {
        for (i = 1; i <= counter; i += 1)
            printf("Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i])
    }

    This awk program processes the output of MegaCli. To test it, run the following command from the /opt/MegaRAID/MegaCli directory:

    ./MegaCli -PDList -aALL | awk -f analysis.awk

  • Then I created the cron job by placing a file called raidstatus in /etc/cron.hourly, with the following contents:

    #!/bin/sh

    /opt/MegaRAID/MegaCli/MegaCli -PdList -aALL | awk -f /opt/MegaRAID/MegaCli/analysis.awk > /tmp/megarc.raidstatus

    # Alert if any drive line reports a state other than "Online"
    if grep -qv "status is: Online" /tmp/megarc.raidstatus
    then
    /usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "Warning: RAID status no longer optimal" -f esx@pretnet.local -m "`cat /tmp/megarc.raidstatus`" -r exchange.pretnet.local
    fi

    rm -f /tmp/megarc.raidstatus
    exit 0

    Don't forget to run chmod a+x /etc/cron.hourly/raidstatus to make the file executable, otherwise cron will not run it.
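
To try the awk rules without a RAID controller at hand, they can be fed a few hand-written lines. The sketch below inlines the same three rules as analysis.awk so that it runs on its own. The sample lines (including the drive name "ExampleDisk") are invented for illustration and are not captured MegaCli output, and the function names sample_pdlist and summarize are hypothetical.

```shell
#!/bin/sh
# Hand-written sample containing only the three fields the awk rules look for.
# Illustrative values, not real MegaCli output.
sample_pdlist() {
    cat <<'EOF'
Device Id: 0
Firmware state: Online
Inquiry Data: ATA ExampleDisk 500GB A1
Device Id: 1
Firmware state: Failed
Inquiry Data: ATA ExampleDisk 500GB A1
EOF
}

# Same rules as analysis.awk, inlined so the sketch is self-contained.
summarize() {
    awk '
        /Device Id/      { counter += 1; device[counter] = $3 }
        /Firmware state/ { state_drive[counter] = $3 }
        /Inquiry/        { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
        END {
            for (i = 1; i <= counter; i += 1)
                printf("Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i])
        }
    '
}

sample_pdlist | summarize
# Prints:
#   Device 00 (ATA ExampleDisk 500GB A1) status is: Online <br/>
#   Device 01 (ATA ExampleDisk 500GB A1) status is: Failed <br/>
```

The second line is the kind of entry that should trigger the warning e-mail from the cron job.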
In order to send an e-mail when things go wrong, I used the SMTP_Send Perl script smtp_send.pl that was discussed by Duncan Epping on his blog.
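
Depending on the setup, a drive may legitimately report a state other than Online, such as Hotspare, and an empty status file would slip through a check that only looks for non-Online lines. As a rough sketch (not what I run myself), the test in the cron script could be moved into a small function. The name raid_report_ok is hypothetical, and treating an empty report as a failure rests on the assumption that MegaCli always lists at least one drive on a healthy system.

```shell
#!/bin/sh
# Sketch: decide whether a status report needs an alert.
# The report is expected to hold one "status is: <state>" line per drive,
# in the format printed by analysis.awk.
# Returns 0 when every drive is Online or Hotspare, 1 otherwise.
raid_report_ok() {
    report=$1

    # Assumption: an empty or missing report means MegaCli produced nothing,
    # which is treated as a problem rather than as "all fine".
    [ -s "$report" ] || return 1

    # Any line that is neither Online nor Hotspare marks the report as bad.
    if grep -qEv 'status is: (Online|Hotspare)' "$report"; then
        return 1
    fi
    return 0
}
```

In the cron script, the condition would then become "if ! raid_report_ok /tmp/megarc.raidstatus", with the smtp_send.pl call left unchanged.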

9 comments:

Toni Verbeiren said...

Interesting info! Thanks and keep up!

Anonymous said...

Shorter version of the second script:

/opt/MegaRAID/MegaCli/MegaCli -PdList -aALL | awk -f /opt/MegaRAID/MegaCli/analysis.awk | grep -qEv "*: Online" > /dev/null && echo "Warning: RAID status no longer optimal"

And cron will send you an email in case of error

Anonymous said...

Thankyou Muchly
This worked very well for me and my cluster. I modified it slightly so that if the Online check fails, it writes the entire MegaCli -PdList -aALL output to a new file and attaches it to the email.

Anonymous said...

Thanks this was very useful. I had to modify slightly as the status could also be "Hotspare".

Axel Werner said...

Well, thanks, but what's in your AWK script? Where did you post that content? I guess it would not work "out of the box" without that AWK file, right?

GreyCat said...

Hi! I thought you might be interested in "analysis.awk on steroids" - Einarc - a wrapper tool for all these CLIs. You can check it out at http://www.inquisitor.ru/doc/einarc/

The original script transforms to

einarc logical list | grep -v 'normal$' && echo 'Warning...'

John Puskar said...

This is excellent! What would the awk script look like if also allowing for a "Hotspare"? Thanks again.

Levina Gill said...

It's really nice stuff.
Thanks for sharing interesting information about ESX monitoring tools, and thanks to all for the helpful replies.
Great work, Keep it up !!!

Amma Rany said...

Great information, I really like all your posts. I will keep visiting this blog very often. It's good to visit your website.