Sometimes one of the most useful resources at your disposal when troubleshooting a hang or other issue is the memory dump file Windows writes out during a blue screen. If a system is hung and you can't get to it locally, the keyboard crash shortcut (hold the right Ctrl key and press ScrollLock twice) isn't a feasible solution. If the server is an HP server with an iLO (Integrated Lights-Out) card, and you've set a registry key in Windows ahead of time, you can force the system to bluescreen, write the memory dump, and restart.

The key to doing this is generating what's called a nonmaskable interrupt, or NMI. Windows has a concept of interrupt request levels, or IRQLs. An interrupt at a higher IRQL is always serviced first, preempting any lower-level interrupt which is currently being serviced. While the higher-level work runs, interrupts at or below that level are held off until the level drops again; holding them off is called masking the interrupt. An NMI, as the name says, cannot be masked, so it must be serviced immediately. Generally you get an NMI when there's a major hardware fault that prevents the operating system from continuing. Triggering one manually from the iLO makes Windows react as if such a fault had just occurred.
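To make the masking idea concrete, here is a minimal toy model in Python. It is only a sketch of level-based masking, not how the Windows kernel is implemented: the `ToyCpu` class, its method names, and the interrupt names and levels are all made up for illustration, and servicing an interrupt is treated as instantaneous.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """Toy model of level-based interrupt masking (hypothetical, not Windows internals)."""
    current_level: int = 0                        # level of the code running right now
    pending: list = field(default_factory=list)   # masked interrupts waiting for the level to drop
    serviced: list = field(default_factory=list)  # names of interrupts in the order they ran

    def raise_interrupt(self, name, level, maskable=True):
        # A nonmaskable interrupt ignores the current level entirely.
        # A maskable one is serviced only if it outranks the current level.
        if not maskable or level > self.current_level:
            self.serviced.append(name)
        else:
            self.pending.append((name, level))

    def lower_level(self, new_level):
        # Dropping the level unmasks pending interrupts above the new level,
        # which are then serviced highest level first.
        self.current_level = new_level
        ready = sorted((p for p in self.pending if p[1] > new_level),
                       key=lambda p: p[1], reverse=True)
        self.pending = [p for p in self.pending if p[1] <= new_level]
        self.serviced.extend(name for name, _ in ready)


cpu = ToyCpu(current_level=5)
cpu.raise_interrupt("disk", 3)                 # below the current level: masked
cpu.raise_interrupt("clock", 7)                # above the current level: serviced
cpu.raise_interrupt("nmi", 0, maskable=False)  # nonmaskable: serviced regardless of level
cpu.lower_level(0)                             # "disk" is finally serviced
print(cpu.serviced)
```

The point of the model is the third call: no matter what level the CPU is running at, the nonmaskable interrupt gets through, which is why an NMI still works on a box that is otherwise hung.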

This post covers working with an iLO1. If you’re using an iLO2, visit http://briandesmond.com/blog/forcing-a-blue-screen-via-ilo-ilo2-version/.

The first step to getting this functionality working is setting a registry key outlined in KB 927069. Don't mind the part about this only applying to Windows 2000 Server. This works on 2000 and newer. Here's the registry key info:

Path: HKLM\System\CurrentControlSet\Control\CrashControl
Value: NMICrashDump
Data: 1
Type: REG_DWORD

You’ll need to reboot for the change to take effect. If you don’t reboot after making this change, you’ll see the effect described in the note following step 6.
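If you need to push this setting to a number of servers, one option is to generate a .reg file for it. The Python sketch below only builds the file's text from the path, value, and data listed above; the `build_reg_file` helper is a hypothetical name of my own, and saving the output and importing it with regedit is left out.

```python
def build_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file that sets a single REG_DWORD value."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",     # header used by Windows 2000 and newer
        "",
        f"[{key_path}]",                            # full key path, hive name spelled out
        f'"{value_name}"=dword:{dword_value:08x}',  # DWORD data is 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_reg_file(
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl",
    "NMICrashDump",
    1,
)
print(reg_text)
```

Importing the file does not change the reboot requirement: the setting still only takes effect after a restart.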

Note: If you have the Automatic Server Recovery (ASR) functionality enabled on the server and you need to get a full memory dump, you will need to turn it off as it can interfere with this process. This is a BIOS setting, and I don't have the steps to change it readily available. If there's demand (leave a comment), I can track them down.

These are the steps to crash the box. I shot these screens on a DL360 G4, which is fairly recent hardware. I suspect the screens and locations of options may vary a bit by age (and especially on older legacy Compaq stuff), but the basic process is the same.

1. Log in to the iLO and then proceed to the "Server and iLO Diagnostics" link on the left-hand navigation:

    

2. Select the Virtual NMI Button option on the toolbar:

    

Warning! I can’t guarantee that this button generates a warning when you click it on all versions of the iLO firmware. Generating an NMI will HALT your system. Don’t click this button just to see what happens.

3. Generate the NMI. This button is towards the bottom of the page, so if your browser doesn't automatically scroll down to it, you'll have to scroll down yourself:

    

4. You will get a warning dialog to make sure you're really certain this is what you want to happen. Remember, doing this will HALT your system!

    

5. The iLO will write a status message to the status bar in IE:

    

6. At this point Windows will crash with a 0x80 bugcheck (NMI_HARDWARE_FAILURE) and reboot, assuming your machine is configured to automatically reboot after a bluescreen. With any luck, the memory dump will help you troubleshoot the problem at hand.

Note: If you don’t get a traditional blue screen and instead get the nontraditional blue screen pictured here, you’ve made an error entering the registry setting described earlier or you did not reboot.

 

Note: This capability is present in the Dell DRAC3 cards. I spoke with Dell and they advised me that this functionality is not available in newer generation DRAC cards.