So, a funny thing happened at one of my customers. Funny in the sense that you would expect this behaviour if you sat down and thought about it. But really, who has time to think, right?
You see, we use ConfigMgr (SCCM) as a standard to deploy OpsMgr agents automatically. This typically ensures that every server in the environment is covered.
One of our new SCCM admins configured a package for a customer that had recently migrated from MOM to SCOM and somehow included the OpsMgr servers in it. That meant the OpsMgr agent was pushed out to both the RMS and the MS in that environment. Needless to say, OpsMgr became completely non-functional, as the agent configuration overwrote the management servers’ healthservice config.
Luckily, the customer had both an RMS and an MS. Unfortunately, the agent was deployed to both, which made recovery a little tricky.
Now, to recover, I did the following:
- Promoted the MS to RMS. Easy peasy. Bugger, the MS reports the same error. Oh, right, the agent was also deployed to the MS.
- Uninstalled OpsMgr from the RMS. Ok so far. No, not really, because it failed.
- Stress, turn up the music. Think.
- Oh, look, uninstallation succeeded after a reboot.
- Rebooted again, just for good measure.
- Attempted to install OpsMgr, but UAC was turned on, which made setup return an error. Ok, disable UAC, reboot again.
- Installed OpsMgr, reconnecting to the database. OMG, it worked.
- Promoted the newly installed server back to the RMS role. Ok, right. No errors this time, and it reports healthy in the console.
- Ran ManagementServerConfigTool.exe UpdateDemotedRMS on the MS that had been promoted to RMS. Still ok. Just checked the console to make sure, and everything is still fine.
- Vodka! (or it would be, if I could consume alcohol)
- Uninstalled OpsMgr from the MS, which failed. Oh, wait. Uninstall the agent first, then uninstall SCOM. Rebooted for good measure.
- Switched off UAC (being pro-active here), reinstalled OpsMgr, and voilà! The environment is back to how it should be.
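To avoid a repeat, the root cause (management servers ending up in the agent deployment targets) can be caught with a simple check before a package goes out. Here is a minimal sketch in Python, assuming hypothetical host names and a hand-maintained list of management servers. It does not call any ConfigMgr or OpsMgr API; the function name `split_agent_targets` is made up for illustration.

```python
def split_agent_targets(candidates, management_servers):
    """Separate deployment candidates into safe targets and excluded servers.

    Comparison is case-insensitive and ignores any DNS suffix, so
    'RMS01.corp.example' and 'rms01' are treated as the same host.
    """
    def short_name(host):
        return host.strip().split(".")[0].lower()

    protected = {short_name(h) for h in management_servers}
    targets, excluded = [], []
    for host in candidates:
        # Anything that matches a management server is held back.
        (excluded if short_name(host) in protected else targets).append(host)
    return targets, excluded


# Hypothetical host names, for illustration only.
candidates = ["web01", "SQL01.corp.example", "RMS01.corp.example", "ms01"]
management_servers = ["rms01", "MS01.corp.example"]

targets, excluded = split_agent_targets(candidates, management_servers)
print("deploy agent to:", targets)
print("skipped (management servers):", excluded)
```

Whatever form the check takes, the point is the same: the RMS and MS should never be on the agent deployment list in the first place.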
Don’t try this at home. Please.