Wednesday, August 21, 2013

IPMI over LAN vulnerability and some BMC "features"

I don't want to pull away credit or page views from Dan Farmer's great work, but this needs more exposure...
For those of you who manage servers with IPMI over LAN enabled, there is a very severe vulnerability that may allow anyone full root access to your iLO/iDRAC/IMM/ILOM/whatever (aka BMC).  This is independent of the OS, though once the BMC is rooted, the attacker can take over the OS just as they could with physical access.  They can control power, boot settings, serial over LAN, BIOS settings (via serial), and KVM, and can even read/write arbitrary system memory.

For those of you who do not have IPMI over LAN enabled, there may be some stuff that affects you too...

Wednesday, July 24, 2013

Server Room and Three Phase Power for Systems Administrators

There doesn't seem to be much educational material about server room power that is comprehensible to systems administrators.  I don't think there is a "typical" sysadmin out there, but I'm guessing that most have had little to no formal training in server room power.  Three phase power may seem like black magic, and lots of incorrect assumptions get made, so I decided to write this post.  Hopefully it will be useful to some sysadmins out there.
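The arithmetic behind most three phase confusion fits in a few lines. Below is a minimal Python sketch, assuming a balanced load on a 208Y/120 V wye service; the 30 A circuit is a hypothetical example, not one from this post. It shows that a three phase circuit's capacity is √3 × line-to-line voltage × current, which equals the sum of the three legs at line-to-neutral voltage, and not three times a 208 V single-phase circuit.

```python
import math

SQRT3 = math.sqrt(3)


def three_phase_va(volts_line_to_line: float, amps_per_phase: float) -> float:
    """Apparent power of a balanced three-phase circuit: sqrt(3) * V_LL * I."""
    return SQRT3 * volts_line_to_line * amps_per_phase


def line_to_neutral(volts_line_to_line: float) -> float:
    """Line-to-neutral (phase) voltage of a wye service: V_LL / sqrt(3)."""
    return volts_line_to_line / SQRT3


if __name__ == "__main__":
    # Hypothetical circuit: 208Y/120 V service, 30 A three-pole breaker.
    v_ll, amps = 208, 30
    total = three_phase_va(v_ll, amps)
    # Same answer computed per leg: three legs, each at ~120 V and 30 A.
    per_phase_sum = 3 * line_to_neutral(v_ll) * amps
    # A common (wrong) guess: three times a 208 V single-phase circuit.
    naive = 3 * v_ll * amps
    print(f"capacity:       {total:.0f} VA")
    print(f"sum of legs:    {per_phase_sum:.0f} VA")
    print(f"naive 3x guess: {naive:.0f} VA (overstates by {naive / total:.2f}x)")
    print(f"80% continuous: {0.8 * total:.0f} VA")
```

For the example circuit this works out to roughly 10.8 kVA, or about 8.6 kVA if you apply the common 80% rule for continuous loads; the naive 3 × 208 V × 30 A guess overstates it by a factor of √3.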

Tuesday, July 16, 2013

Per-user /tmp and /dev/shm directories

Updated Oct 7, 2013: Tons of updates
Updated March 19, 2014: The recommended configuration has been in production for months now and works great

I recently discovered a great feature in Linux that allows for per-process namespaces (aka polyinstantiation).  Different processes on the same machine can have different views of a filesystem, such as where /tmp and /dev/shm are.  You can easily make it so that each user on a shared system has a different /tmp that, to each of them, really looks like (and is) /tmp.  This isn't done by setting an environment variable; it redefines mount points on a per-process basis so that each user's processes use their own directory as /tmp.
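Polyinstantiation on Linux is typically set up through PAM's pam_namespace module on top of mount namespaces. The Python sketch below is an illustration for observing the effect, not part of the recommended configuration, and its helper names are hypothetical. Two processes share a view of /tmp only if their /proc/<pid>/ns/mnt links match, and /proc/<pid>/mountinfo shows which directory is actually mounted at /tmp.

```python
import os


def mount_namespace(pid="self"):
    """Return a process's mount-namespace ID, e.g. 'mnt:[4026531840]'."""
    return os.readlink(f"/proc/{pid}/ns/mnt")


def same_mount_view(pid_a, pid_b):
    """True if two processes share a mount namespace (and thus one view of /tmp)."""
    return mount_namespace(pid_a) == mount_namespace(pid_b)


def mount_root_for(target="/tmp", pid="self"):
    """Return the 'root' field of the mount visible at `target` for `pid`.

    In /proc/<pid>/mountinfo the 4th field is the directory of the source
    filesystem that is mounted and the 5th is the mount point. The last
    matching line is kept, since a later mount usually stacks on top.
    Returns None if `target` is not a separate mount.
    """
    root = None
    with open(f"/proc/{pid}/mountinfo") as f:
        for line in f:
            fields = line.split()
            if fields[4] == target:
                root = fields[3]
    return root


if __name__ == "__main__":
    print("my mount namespace:", mount_namespace())
    print("/tmp comes from:", mount_root_for("/tmp") or "(not a separate mount)")
```

Inspecting another user's processes generally requires root. With a bind-mount based setup, a process that received a per-user /tmp should report a different mount namespace than one that didn't, and mount_root_for("/tmp") typically returns the instance directory rather than "/".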

Wednesday, July 10, 2013

Installing a Xeon Phi (MIC) Card in a Dell PowerEdge R720

We got an early release of Dell's Phi installation kit with installation instructions that weren't all that great (to say the least). Dell told me that they are working on better instructions.  In case you're confused, here you go.

A few things to note:
  • We have dual 95W CPUs.  Things may be different (and Dell's instructions may be correct) for higher-wattage CPUs (larger heat sinks, different plastic baffles?).
  • The extra heat sinks are for the CPUs, not the Phi.  Our 95W CPUs did not need them.
  • The 2.5" and 3.5" mounting brackets are not necessary in our configuration.
  • We used a different bracket that was included in the kit.
Here are some pictures of what it should look like:

Tuesday, May 14, 2013

RHEL 6.2 - Linux Kernel Problem?

We experienced several problems when we upgraded to Red Hat Enterprise Linux 6.2 from CentOS 5.4.  A user of ours started reporting slowness on some of his larger HPC jobs.  We looked at tons of things, then noticed that one or more nodes would start swapping for no apparent reason.  His job would only use about 60-70% of the memory on each node, but some nodes would inexplicably swap (diagnosed with vmstat).  I talked to people at other universities and HPC sites and verified that a similar problem was occurring on their RHEL 6.2 installations.
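Swapping like this can also be watched from a script. vmstat's si/so columns should be computed from the kernel's cumulative pswpin and pswpout counters in /proc/vmstat, so a minimal Python sketch that samples those counters twice gives a per-node swap rate. The function names and the 5-second interval are arbitrary choices for illustration, not anything from the original investigation.

```python
import time


def parse_swap_counters(vmstat_text):
    """Pull the cumulative swap-in/swap-out page counts out of /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters


def read_swap_counters(path="/proc/vmstat"):
    with open(path) as f:
        return parse_swap_counters(f.read())


def swap_pages_per_second(interval=5.0):
    """Sample twice and return pages swapped in/out per second over `interval`."""
    before = read_swap_counters()
    time.sleep(interval)
    after = read_swap_counters()
    return {name: (after[name] - before[name]) / interval for name in before}
```

The counts are in pages; multiply by os.sysconf("SC_PAGE_SIZE") to get bytes. Calling swap_pages_per_second() on every node of a job makes a node that swaps with memory to spare stand out.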

Wednesday, May 30, 2012

BMC: Enable SNMP Traps for Hardware Failures


You can enable SNMP traps to be sent from a BMC (iDRAC or whatever) using a few IPMI commands.  This works on every Dell server I have ever tested but did not work on the HP systems I have tried.  I imagine it would work on servers from most other vendors.  I know there are vendor-specific tools to do this, but I prefer using industry-standard protocols to administer systems and really don't like having to install a lot of extra vendor tools and daemons.
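Before pointing the BMC at a real trap receiver, it can help to confirm that traps arrive at all. Below is a minimal Python sketch of a listener on UDP port 162, the standard SNMP trap port. It only logs the sender and datagram size and does not decode SNMP. The function names are hypothetical, and binding port 162 normally requires root (and stopping anything, such as snmptrapd, that already holds it).

```python
import socket

SNMP_TRAP_PORT = 162  # standard SNMP trap port; binding it usually needs root


def make_trap_socket(port=SNMP_TRAP_PORT, host="0.0.0.0"):
    """Open a UDP socket bound to the trap port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_traps(sock, max_packets=None, timeout=None):
    """Log the sender and size of each datagram; this does NOT decode SNMP.

    Returns a list of (sender_ip, size) tuples. Stops after `max_packets`
    datagrams, or raises a timeout error if nothing arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    seen = []
    while max_packets is None or len(seen) < max_packets:
        data, (sender_ip, sender_port) = sock.recvfrom(65535)
        print(f"datagram from {sender_ip}:{sender_port}, {len(data)} bytes")
        seen.append((sender_ip, len(data)))
    return seen
```

Running receive_traps(make_trap_socket()) as root and then waiting for the next hardware event (or a test event, if your tools can generate one) should print a line from the BMC's IP address once traps are configured.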

Thursday, May 10, 2012

Enable IPMI Over LAN from the OS using FreeIPMI

Maybe you just heard about how wonderful it is to control your hardware remotely.  Maybe you forgot to configure IPMI Over LAN on a production system's BMC and you don't want to reboot.  Fear not!  Enabling IPMI Over LAN can (usually) be done from the OS using freeipmi.
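The general shape of doing this from the OS is to run bmc-config with no hostname, so it talks to the local BMC in-band (as root), and commit the settings that turn on LAN access. The Python sketch below only builds and prints the command unless dry_run=False. The section and key names (Lan_Conf, Lan_Channel, and so on), the documentation address 192.0.2.10, and the helper names are assumptions for illustration; check them against your own bmc-config --checkout output before committing anything. A LAN-enabled user with a password is also needed and is not shown.

```python
import shlex
import subprocess


def bmc_config_commit_cmd(key_pairs):
    """Build an in-band `bmc-config --commit` command from (section, key, value) tuples."""
    cmd = ["bmc-config", "--commit"]
    for section, key, value in key_pairs:
        cmd.append(f"--key-pair={section}:{key}={value}")
    return cmd


# Hypothetical settings; the section/key names are assumptions to verify
# against your own `bmc-config --checkout` output.
ENABLE_LAN = [
    ("Lan_Conf", "IP_Address_Source", "Static"),
    ("Lan_Conf", "IP_Address", "192.0.2.10"),
    ("Lan_Conf", "Subnet_Mask", "255.255.255.0"),
    ("Lan_Channel", "Volatile_Access_Mode", "Always_Available"),
    ("Lan_Channel", "Non_Volatile_Access_Mode", "Always_Available"),
]


def apply(cmd, dry_run=True):
    """Print the command; only run it (as root, on the server itself) when dry_run=False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(bmc_config_commit_cmd(ENABLE_LAN))
```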

Wednesday, May 9, 2012

freeipmi *-config tools primer

Basic usage of bmc-config, pef-config, and other freeipmi *-config tools

The documentation below should work for all freeipmi *-config commands.  It only covers how to connect to a BMC (including iDRAC, iLO, etc.), then read and modify its configuration.  I picked bmc-config to demonstrate program usage, though the same parameters work for all the related commands.

In some places I reference the "-f" parameter, which lets you specify a filename.  In newer versions it still appears to work but has been deprecated in favor of "-n".
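Because --checkout prints plain text in Section/EndSection blocks, with one key and value per line plus # comments, its output is easy to consume from a script. The Python sketch below builds a checkout command, optionally against a remote BMC using freeipmi's common -h/-u/-p options, and parses the output into nested dictionaries. The helper names are hypothetical, "-n" follows the note above about "-f", and a password passed on the command line is visible to other local users via ps.

```python
def checkout_cmd(hostname=None, username=None, password=None, filename=None):
    """Build a `bmc-config --checkout` command; in-band if no hostname is given."""
    cmd = ["bmc-config", "--checkout"]
    if hostname:
        cmd += ["-h", hostname]
    if username:
        cmd += ["-u", username]
    if password:
        cmd += ["-p", password]
    if filename:
        cmd += ["-n", filename]  # older releases used -f
    return cmd


def parse_checkout(text):
    """Parse checkout output (Section/EndSection blocks of 'Key  Value' lines) into nested dicts."""
    config, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments, including commented-out keys
        if line.startswith("Section "):
            section = line.split(None, 1)[1]
            config.setdefault(section, {})
        elif line == "EndSection":
            section = None
        elif section is not None:
            parts = line.split(None, 1)
            config[section][parts[0]] = parts[1] if len(parts) > 1 else ""
    return config
```

For example, feeding subprocess.run(checkout_cmd(), capture_output=True, text=True, check=True).stdout into parse_checkout() gives a dictionary you can index by section and key.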

Wednesday, November 2, 2011

BMC: Change Temperature Thresholds

This post shows how to configure a server's Baseboard Management Controller (or iDRAC, or maybe an iLO, or something else) to power the server off at a different temperature threshold than the manufacturer default.  This is done using ipmitool and freeipmi commands.  We use it to lower the set points for some of our servers in a less-capable room that we have.  The servers will then do a hard shutdown if the thermal threshold we set is reached.
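With ipmitool, a sensor's upper thresholds can be set in one command of the form ipmitool sensor thresh <id> upper <unc> <ucr> <unr> (upper non-critical, critical, and non-recoverable). The Python sketch below builds that command and refuses values that are not in increasing order; that ordering check is an added sanity check, and your BMC may apply its own rules. The sensor name "Ambient Temp" and the numbers are hypothetical, so list your real sensor names and current thresholds with ipmitool sensor first.

```python
import shlex
import subprocess


def upper_threshold_cmd(sensor, unc, ucr, unr):
    """Build `ipmitool sensor thresh <id> upper <unc> <ucr> <unr>`.

    unc = upper non-critical, ucr = upper critical, unr = upper non-recoverable.
    """
    if not (unc < ucr < unr):
        raise ValueError("expected upper non-critical < critical < non-recoverable")
    return ["ipmitool", "sensor", "thresh", sensor, "upper", str(unc), str(ucr), str(unr)]


def apply(cmd, dry_run=True):
    """Print the command; only run it when dry_run=False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sensor name and values in degrees C.
    apply(upper_threshold_cmd("Ambient Temp", 35, 40, 45))
```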

Friday, March 4, 2011

Performance Problems Resolved

The performance problems that we were having have been resolved.  Without making you read this whole thing to get to the conclusion:
  • The problem was solved with an iDRAC firmware update provided by Dell (contact Dell to get the right version)
  • The odds that you, the reader, are affected are extremely low (unless you have a PowerEdge M610 with dual Intel Xeon 5650 Westmere processors, six 4GB quad-ranked 1066 MHz DDR3 RDIMMs, no mezzanine cards, and only one 10K SAS hard disk. Even then, there may be another mitigating factor that ensures you aren't affected.)
  • If you think you are affected, Dell support should be able to quickly tell you if that is the case or not
  • We consider the issue to be resolved