You know, I love a challenge, but the whole idea of a challenge is that there should be the possibility of success at the end of our efforts.
The hardware at the place where I work is *old* .. seriously old. I'm talking Windows 95 with IE3... boxes with 16 and 32 meg of RAM. Because we use Citrix and published applications, older hardware was OK - the server supplied all the oomph ... but now the desktop hardware is hitting end of life.
When desktops fall over it's an easy fix; reformat and reinstall to keep things going, or buy a new box... Easy to set up compared to a server.
The past few days have brought more serious battles.. I have a well-specced SBS2003 box which has issues, but is running, and I have two terminal servers - Windows 2000 boxes with a Gig of RAM in each, both running RAID 1 arrays (but with only two drives installed in each) and hosting Citrix Program Neighbourhood applications.
I swear, the 2000 boxes are dying. 48 hours ago I got a call from my boss - one of the servers is screaming (bloody Adaptec alarm settings).
<point of interest> One of the 2000 servers is a beeper, one is a screamer.. easy to tell the difference between the two, assuming you don't go deaf first </point of interest>
My primary contact at my new IT company is online, despite it being 11pm his time... I ping him and we get to work. Hmmm, methinks those potted irises were *not* enough of a present for him and his family. Note to self: send Cointreau - MG loves the stuff ;o)
Remember: the following occurred over roughly 48 hours.
The mirror drive has apparently failed. So we get a new drive, but it won't install. It's not detected in Bay 4, 3, 2 or 1. Then Bays 3, 2 and 1 start showing yellow drive failure lights *when the bays are empty*. Adaptec Storage Manager is showing a failed drive in ID5 .... um, that's a bit of a problem... there is no such thing as an ID5 in my RAID array (I have an ID0, 1, 2, 3 and 4).
Then we spot that our two 36 gig drives are, according to Storage Manager, of different sizes, and that the new mirror drive is apparently smaller than ID0 by about 60Meg, so the RAID won't accept it as a mirror. Shyte, OK, what do we do about this? We decide to boot up using Win98, use Ghost to image ID0, copy the image to ID4, move ID4 to ID0, put the old ID0 into ID4 and make it the mirror. No more problem: the new, smaller drive becomes the primary and the bigger old drive becomes the mirror.. problem solved... Ya think???
<note to self> Rant about hard drive sizes and inconsistencies in how they are assessed will make a very good column when I've calmed down </note to self>
I get everybody off the network and shut down all servers (hey, I'm pedantic). Stick the startup disk into the floppy and power up... damn, the A: drive is not part of the boot sequence. "No Operating System" error... it seems somebody left the Adobe install disk in the CD drive, which *is* part of the boot sequence.
OK, remove Adobe disk, power off, power on...... gets as far as "hit Ctrl A to enter Adaptec setup" and things go no further.... awww hell. Leave it for 10 minutes more, 15 minutes.... no joy... OK, Ctrl A it is.... the Adaptec utility is seeing the drive... so why the hell ain't it booting?
Power off, reboot, hold down the Del key to get into the motherboard BIOS to add the floppy to the boot sequence.... hmmm, seems our motherboard doesn't actually have a BIOS setup program - if there is one, it's hiding from me .... let's try F2... no joy... By this time I am on the phone to our new IT guy (the poor guy is trying to get his Christmas shopping done - we're doing all this by mobile phone), I've slashed a finger big time on the edge of a RAID bay, am suffering some serious blood loss and am making a mess of the floor.... I asked the Office Manager to grab me a tissue to wrap the hand because I was on the phone to my new IT guy... I got a "look".... (nothing compared to a Flamethrower Queen look, but still a look) ;o) OK, grab tissues... that's under control. I'll mop up the blood later.
Power off. Remove the new drive in ID4 and reboot ... no joy. Remove the drive from ID0 and put it back in, making damned sure it's well seated (again) (trying really hard not to get TOO much blood everywhere).. no joy.
This is a serious issue; neither terminal server, on its own, can support all employees... the demand on RAM and CPU is simply too much... so I've shared the user load between the two boxes. If I can't get this box running, with fewer than three working days until the Christmas break, I have got serious problems.
OK, I have two choices... ship the box to my new IT support, who are in another State (hey, they're the best of the best and I only use the best), or make a very humble phone call to the IT company we have just given notice to.... the old guys, without hesitation, agree to come in. Very cool :o)
I leave the server as is.. hung at "press Ctrl A to enter Adaptec setup" (it should have continued to boot after a few seconds if that key sequence isn't used) and go to find a tourniquet for my hand.
The IT guy from the old company walks in within 20 minutes ... to his credit he's a consummate professional... "so what's happened?" says he.
Fast forward an hour.... the server is now booting via ID0.... but it ain't happy. Adaptec Storage Manager shows ID0... ID4 as 'ready' and ID5 as failed.... you may remember, we don't have an ID5!!! ID4 will NOT join the array... nohow, noway.... it tells us to take a long walk off a short pier (if anybody happens to know the meaning behind error code RC:385, feel free to drop me a line). We can't find any way to remove ID5 without breaking the array.
Y'know what bugs me? (and this is where the shattered confidence comes into play)... the IT guy who came in from the old company didn't actually *change* anything to get the server to boot; he pulled out ID0, blew on the contacts, put it back in, and it booted... BUT I DID THAT, DAMN IT.... well, I didn't blow on the contacts to get the dust off...
I'll be honest, if this guy from the old IT company had been around for the past few months (he's been promoted and is only on site because they're short of staff during the Christmas break) we might have kept our old IT company on. He listens, he talks, he knows the history of our hardware, he knows his stuff, and when he doesn't know his stuff he knows how to research, and ***he has an eye for detail***. And, damn it, he is willing to sit down and talk through issues with a female (me) who, although she is not server skilled, still has a bit of an idea about what is going on and can think outside the box. The other guys... when I said to them, for example, "the MetaFrame XP toolbar is broken on Server 2 and the buttons have generic Windows icons, please fix it" (yes, I could fix it myself but, damn it, we have an authority stream here - it's *their* job to fix such things and it's *my* job to manage them), they said to me "that doesn't matter, use the Program menu instead". When I told them that I wanted the old-fashioned administrative utilities available (because I hate the slow-loading Citrix Management Console), they would say to me... "you don't need that, use the Citrix Management Console". AAARRRGGGHHH!!!! I *hate* Citrix Management Console.. for example, it takes minutes to load whereas Terminal Services Manager takes seconds. This is a big issue when we are having ADSL and network problems and I am constantly having to release orphaned terminal services sessions.
I have chatted to the gentleman from our old IT company over the past few days... asked him if he misses being on site.. he reckons he hates computers and is much happier in an office.... but, damn it, he's got a fantastic troubleshooting brain. What a waste.
Anyway, back to the server.... all I can say is that it is dying... everything has been replaced except for the drive bays themselves, and if the old IT company guy is right in his theory that the backplane is f**ked, then realistically replacing the bays is the only thing that will fix the problem. My challenge now is to keep things going for three months or so until we can either:
1) get all new desktops, dump Citrix and go local, or
2) buy all new desktops AND a new server to host Citrix.
I'm not sure it is possible to keep things going :o( Oh, and that finger slash and all that blood... nasty infection developing :o(