Node health, both hardware and software, is critical in a cluster environment. Whether you run many web servers behind a load balancer or compute nodes in a High Performance Computing setup, every node that does work has to be healthy to run the processes it is given. This script checks a node's health before it is used as a compute node or put behind a load balancer as a service. If all checks come back fine, the node can be marked good and made available for production.
1. License
GPL. You can add to, edit, or delete parts of the script to fit your needs. Please share your edits to help enhance the script.
2. Functions
The script is built from functions. Every check is its own function, so whether a check runs depends only on whether its function is called. Add or remove the function call in the main section of the script and it will run the checks you need. You can also add, edit, or delete functions to change the script itself.
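As a minimal sketch of that idea, the checks to run can be kept in a single list that the main section loops over. The `CHECKS` variable, the temporary report file, and the simplified function bodies are assumptions made for illustration; the original script calls each function by name directly.

```bash
#!/bin/bash
# Sketch only: enable or disable checks by editing one list.
# The function bodies are simplified stand-ins for the real checks.

Report=$(mktemp)   # assumption: a temp file stands in for the report file

loadavg() { echo "LoadAvg = $(uptime | awk -F 'load average:' '{print $2}' | cut -f 1 -d,)" >> "$Report"; }
cpu()     { echo "Processors = $(grep -c processor /proc/cpuinfo)" >> "$Report"; }
nis()     { echo "NIS_SERVER = $(ypwhich)" >> "$Report"; }

# Only the names listed here are called; nis is defined but left out.
CHECKS="loadavg cpu"

for check in $CHECKS; do
    "$check"
done

cat "$Report"
```

Removing a name from `CHECKS` has the same effect as deleting that function call from the main section.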
3. Checks
The script performs the following checks, chosen for my own needs:
- Loadavg
- Memory / Swap
- CPU
- Ethernet Speed
- Infiniband
- NFS Mounts
- NIS Server
- MCELog (Machine Check Event, hardware error reporting)
You can create your own checks and add them to the script to extend its coverage.
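A new check could follow the same pattern as the existing ones: collect one value, then append a `Name = value` line to the report. The example below is a hypothetical `diskusage` check for the root filesystem; the function name, the `RootDiskUsed` label, and the temporary report file are assumptions and not part of the original script.

```bash
#!/bin/bash
# Hypothetical extra check: how full the root filesystem is.

Report=$(mktemp)   # assumption: a temp file stands in for the report file

diskusage() {
    # df -P prints one line per filesystem; column 5 is the used percentage
    DISK_USED=$(df -P / | awk 'NR==2 {print $5}')
    echo "RootDiskUsed = $DISK_USED" >> "$Report"
}

diskusage
cat "$Report"
```

To use something like this in the real script, the function would go next to the other check functions and its name would be added to the list of calls in the main section.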
4. Script
Here is the complete script:
#!/bin/bash
#
# Script: Node Health Script
# Created By: Sohail Riaz (sohaileo@gmail.com) www.sohailriaz.com
# Created On: 9th April 2013
# Detail: To Check Single Node Health.
# HowTo Run: chmod +x nodecheck.sh; ./nodecheck.sh
# Checks: loadavg, memory, cpu, ethernet, infiniband, infiniband ipath test, cpu test, nfs mounts, nis, mcelog
# Functions: You can load/unload any function you need to run or not to run.
# License: GPL
#

## LoadAvg Function
loadavg() {
    LoadAvg=`uptime | awk -F "load average:" '{print $2}' | cut -f 1 -d,`
    echo "LoadAvg = $LoadAvg" >> "$Report"
}

## Memory Function
memory() {
    # /proc/meminfo reports kB; divide twice by 1024 to get GB
    TOTAL_MEM=`grep "MemTotal:" /proc/meminfo | awk '{msum+=($2/1024)/1024} END {printf "%.0f",msum}'`
    FREE_MEM=`grep "MemFree:" /proc/meminfo | awk '{mfree+=($2/1024)/1024} END {printf "%.0f",mfree}'`
    TOTAL_SWAP=`grep "SwapTotal:" /proc/meminfo | awk '{ssum+=($2/1024)/1024} END {printf "%.0f",ssum}'`
    FREE_SWAP=`grep "SwapFree:" /proc/meminfo | awk '{sfree+=($2/1024)/1024} END {printf "%.0f",sfree}'`
    echo "TotalMemory = $TOTAL_MEM GB ($FREE_MEM GB Free)" >> "$Report"
    echo "TotalSwap = $TOTAL_SWAP GB ($FREE_SWAP GB Free)" >> "$Report"
}

## CPU Function
cpu() {
    PROCESSOR=`grep processor /proc/cpuinfo | wc -l`
    CPU_MODEL=`grep "model name" /proc/cpuinfo | head -n 1 | awk '{print $7 " " $8 " " $9}'`
    echo "Processors = $PROCESSOR" >> "$Report"
    echo "ProcessorModel = $CPU_MODEL" >> "$Report"
}

## Ethernet Function (eth1 for me, you can edit for yours)
ethernet() {
    ETHER_SPEED=`ethtool eth1 | grep "Speed:" | awk '{print $2}'`
    echo "EthernetSpeed = $ETHER_SPEED" >> "$Report"
}

## IB Function
ib() {
    IB_STATE=`cat /sys/class/infiniband/*/ports/1/state | awk -F ":" '{print $2}'`
    IB_PHYS_STATE=`cat /sys/class/infiniband/*/ports/1/phys_state | awk -F ":" '{print $2}'`
    IB_RATE=`cat /sys/class/infiniband/*/ports/1/rate`
    echo "IB_STATE = $IB_STATE" >> "$Report"
    echo "IBLink = $IB_PHYS_STATE" >> "$Report"
    echo "IBRate = $IB_RATE" >> "$Report"
}

## IB Test Function
ibtest() {
    IB_TEST=`ipath_pkt_test -B | awk -F ":" '{print $2}'`
    echo "IPathTest = $IB_TEST" >> "$Report"
}

## NFS Mounts Function
nfs() {
    NFS_MOUNTS=`mount -t nfs,panfs,gpfs | wc -l`
    echo "NFS_MOUNTS = $NFS_MOUNTS" >> "$Report"
}

## NIS Function
nis() {
    NIS_TEST=`ypwhich`
    echo "NIS_SERVER = $NIS_TEST" >> "$Report"
}

## MCELog Test Function
mcelog() {
    # A non-empty /var/log/mcelog means machine check events were logged
    MCELog=`if [ -s /var/log/mcelog ]; then echo "Check MCELog"; else echo "No MCELog"; fi`
    echo "MCE Log = $MCELog" >> "$Report"
}

### MAIN SCRIPT

## Get Node Name
Hostname=`hostname -s`
touch ./$Hostname-checks.txt
Report=./$Hostname-checks.txt
echo " " > "$Report"
echo "Node = ${Hostname}" >> "$Report"
echo "----------------" >> "$Report"

## Get Cluster Name
Cluster=`echo $Hostname | cut -c1-4`

## Call Function
loadavg
memory
cpu
ethernet
ib
ibtest
nfs
nis
mcelog

## Generate Report
echo " " >> "$Report"
cat "$Report"
You can also download the script from http://www.sohailriaz.com/downloads/nodecheck.sh.
To run the script, run the following commands.
chmod +x nodecheck.sh
./nodecheck.sh
5. Enhancement
You can add, edit, or delete any function in the script to meet your needs, but please share your edits so the script can be improved to cover as many checks as possible.
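One possible enhancement, offered only as a sketch: the script as posted reports values and leaves the good/bad judgment to the reader, so a check could compare its value against a limit and print a verdict. The `check_load` function and the `MAX_LOAD` threshold below are hypothetical names and values, not part of the original script.

```bash
#!/bin/bash
# Sketch only: turn a reported value into an OK/FAIL verdict.
# MAX_LOAD and check_load are assumptions made for illustration.

MAX_LOAD=1.00

check_load() {
    # $1 = 1-minute load average, $2 = allowed maximum
    # awk does the floating point comparison; exit status 0 means "ok"
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 <= max + 0) }'
}

# Same extraction as the loadavg function in the script
LoadAvg=$(uptime | awk -F "load average:" '{print $2}' | cut -f 1 -d,)

if check_load "$LoadAvg" "$MAX_LOAD"; then
    echo "LoadAvg = $LoadAvg (OK)"
else
    echo "LoadAvg = $LoadAvg (FAIL, above $MAX_LOAD)"
fi
```

A sensible limit depends on the node; an idle compute node should sit near zero, while a busy web server will not.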