Wednesday, December 29, 2010

Useful commands for Dell servers

Here are some commands that may come in handy on Dell systems.  I don't use OpenManage for anything because we had trouble with it a few years ago: installation/configuration problems and the occasional daemon chewing up CPU.  After figuring out how to do the same things with lower-level commands, we never bothered with it again.  Looking back, I would now guess the CPU problem was kipmi0 going out of control, which can sometimes be fixed by resetting the iDRAC/BMC or by a virtual reseat.

If you know of any other useful or obscure commands, please add a comment at the bottom.  At some point I'm going to write about cross-platform IPMI commands.

Diagnostic log-gathering commands for an M1000e enclosure
  • dumplogs - Hidden command. Useful but VERY verbose logs
  • racdump - Less verbose than dumplogs
  • getsel - Grab the SEL (System Event Log) from the chassis
  • getraclog - Similar to getting syslog output
  • clrsel / clrraclog - Clear the SEL or the RAC log. They do fill up

To capture the output of these commands, do something like:
ssh mym1000ehostname dumplogs > /tmp/thelogfile
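
To grab several of these logs in one pass, the one-liner above can be wrapped in a small function. This is only a sketch: the function name, the output directory, and the file-naming scheme are my own placeholders, and it assumes key-based ssh access to the CMC. Add dumplogs to the list if you want the very verbose output too.

```shell
#!/bin/sh
# Sketch: save the output of several CMC log commands to timestamped files.
# gather_cmc_logs and its arguments are illustrative names, not Dell tooling.

gather_cmc_logs() {
    cmc_host=$1
    out_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$out_dir" || return 1

    for cmd in getsel getraclog racdump; do
        # -n stops ssh from reading stdin; each command gets its own file
        if ! ssh -n "$cmc_host" "$cmd" > "$out_dir/$cmc_host-$cmd-$stamp.log"; then
            echo "warning: $cmd failed on $cmc_host" >&2
        fi
    done
}

# Example: gather_cmc_logs mym1000ehostname /tmp/cmc-logs
```

Taking the snapshot before running clrsel or clrraclog means the old entries are still around if you need them later.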

Current M1000e status
  • getmodinfo - Health, presence, service tags of chassis, blades, switches, etc.
  • getmacaddress - Grab MAC addresses for every ethernet interface on the blades and chassis
  • getsensorinfo - Fan speeds, chassis ambient temperature, power supply status
  • getpbinfo / getpminfo - Power status
  • getversion - Get blade iDRAC versions and blade types
  • getversion -c - Get blade CPLD versions
  • getversion -b - Get blade BIOS versions

(Type "?" for more information and look through the get commands.  The ones listed here are those I use frequently, or that are very useful but may not be well known.)

Powerful M1000e commands
  • racreset -m server-12 - Reset the iDRAC on blade 12 from the CMC (safe for the OS).  If the iDRAC is still alive, it can also be done with ipmitool -H remote-idrac.example.com -I lan mc reset cold
  • serveraction -m server-3 powerdown - Hard power off server 3
  • ? serveraction - Get list of other actions to control power to blades
  • serveraction -m server-4 -f reseat - Virtual reseat a blade. Not safe to do with the OS up since it's the same as suddenly pulling the power plug.  This fixes many problems.
  • serveraction -a powerup - Power up all blades. (Power up is staggered by the CMC)
  • racreset - Reboot the CMC. Safe operation for the blades. Wear ear plugs.
  • chassisaction -m switch-2 reset - Reset switch 2 from the CMC command line.  Expect networking through that switch to go down temporarily, as with any switch reboot.
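
For the ipmitool route mentioned in the racreset bullet, here is a rough sketch that first checks whether the iDRAC still answers IPMI and only then sends the cold reset. The function name is made up, and it assumes IPMI over LAN is enabled on the iDRAC and that the password is exported in IPMI_PASSWORD (which is what ipmitool's -E option reads).

```shell
#!/bin/sh
# Sketch: cold-reset an iDRAC over IPMI, but only if it still responds.
# reset_idrac is an illustrative name; host and user are supplied by the caller.

reset_idrac() {
    idrac_host=$1
    idrac_user=$2

    # "mc info" is a harmless query, used here as a liveness check
    if ipmitool -I lan -H "$idrac_host" -U "$idrac_user" -E mc info >/dev/null 2>&1; then
        ipmitool -I lan -H "$idrac_host" -U "$idrac_user" -E mc reset cold
    else
        echo "$idrac_host is not answering IPMI; try racreset -m from the CMC instead" >&2
        return 1
    fi
}

# Example: IPMI_PASSWORD=secret reset_idrac remote-idrac.example.com root
```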

Change BIOS settings on Dell servers
The Dell Deployment Toolkit can do all sorts of fun things.  The syscfg command queries and changes BIOS settings such as C-states, C1E, turbo mode, virtualization, hyperthreading/SMT, power loss/recovery boot action, boot order, USB port accessibility, memory node interleaving, and much more.

For desktops and laptops, try Dell's Client Configuration Toolkit (CCTK) which works in Windows and Linux.

There is an unfortunate bug in Dell's software depending on how you install it.  For us, running dset changes /etc/omreg.cfg to something that then breaks syscfg.  Keep a backup copy and change it back after running dset.  I'm sure we can change the installation paths to make it work... Maybe one of these days.
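
Until the install paths get sorted out, the backup-and-restore step can be automated with a small wrapper. This is a sketch under my own assumptions: run_preserving_omreg is a made-up name, and OMREG_CFG is a hypothetical override that exists only so the function can be tried against a scratch file instead of the real /etc/omreg.cfg.

```shell
#!/bin/sh
# Sketch: run a command (e.g. dset) and put /etc/omreg.cfg back afterwards
# if the command changed it. Names here are illustrative, not Dell's.

run_preserving_omreg() {
    cfg=${OMREG_CFG:-/etc/omreg.cfg}
    backup=$(mktemp) || return 1

    # Keep a copy of the known-good file before running anything
    cp -p "$cfg" "$backup" || { rm -f "$backup"; return 1; }

    "$@"
    cmd_rc=$?

    # Restore only if the file actually changed
    if ! cmp -s "$backup" "$cfg"; then
        cp -p "$backup" "$cfg"
        echo "restored $cfg after running: $*" >&2
    fi

    rm -f "$backup"
    return "$cmd_rc"
}

# Example (as root): run_preserving_omreg dset
```

The wrapper returns the exit status of the wrapped command, so it can stand in for the original call in scripts.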

A few examples (changes usually require a reboot):
  • syscfg -h - List all the options for syscfg
  • syscfg -h --somecommandname - Description of a command
  • syscfg --cstates=enable - Enable C-states
  • syscfg --virtualization - Query current virtualization setting
  • syscfg --bootseq=usbfloppy.slot.1,usbcdrom.slot.1,nic.emb.1,hdd.emb.0 - Change bootorder
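
Since most changes only take effect after a reboot, it can help to record the current values first. The sketch below assumes that each option, like --virtualization above, reports its current setting when given no value; I have not verified that for every option, and the function name and output file are placeholders.

```shell
#!/bin/sh
# Sketch: save the current value of a few BIOS settings before changing them.
# Assumes "syscfg --option" with no value prints the current setting.

snapshot_bios_settings() {
    out_file=$1
    : > "$out_file" || return 1

    for opt in virtualization cstates bootseq; do
        echo "## --$opt" >> "$out_file"
        # Capture errors as well, in case an option is unsupported on this model
        syscfg "--$opt" >> "$out_file" 2>&1
    done
}

# Example: snapshot_bios_settings /root/bios-before.txt
```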

Any other useful commands (besides OMSA, etc.)?

Disclaimer:  I disclaim all responsibilities for you acting upon anything you read on some random guy's website (my website, to be exact).  Everything is correct according to my experience and understanding of Dell's documentation but don't blame me if a known-safe command isn't so safe because of a firmware bug or something.

3 comments:

  1. Ryan - Thanks for this, but you dropped off the explanation of the last command, "chassisaction -m switch-2 reset - reset switch 2 from the CMC command line. This fixes most transient problems but is...." What comes after the "but is?"

  2. Brian,
    I really can't remember what I was going to say there. It was probably something simple like mentioning that networking going through that switch will go down temporarily, as you would expect when rebooting a switch. I can't think of any other side effects so I will just delete that sentence. Thanks for the question.


Please leave any comments, questions, or suggestions below. If you find a better approach than what I have documented in my posts, please list that as well. I also enjoy hearing when my posts are beneficial to others.