Stupid user tricks: eleven IT horror stories

13/06/2006 12:29:53

They're short, tall, skinny, and fat. They're smart or stupid, unique or cloned -- but no matter what, they'll abuse technology.

In deference to my years of dealing with this most dangerous species of wildlife, the editors at InfoWorld asked me to record some of my most memorable experiences along with tips on how to avoid similar incidents. Being both thorough and lazy, I decided to open the floor to our adventurous readers as well, who have been kind enough to relate their tales of sorrow and solution.

The result is a list of problem categories, each with a specific situation and solution. Broad advice applicable to all IT adventurers can be found in the moral bringing up the rear. With luck, this salutary information will help keep your rear covered.

Automatic updates

BrilliantCompany.com was growing at dotcom bubble rates. With departments popping up like daisies in spring, the IT staff was ceding desktop control to department heads because most everyone was technical anyway. Shortly after a batch of 75 new Dell desktops arrived to populate a new product division, the network suddenly died in the middle of the day. All lights were green in infrastructure land, but performance had slowed to such a crawl that the LAN was effectively paralysed.

Some diligent sniffing and log file snooping revealed the culprit. Turns out Windows XP's Automatic Update had defaulted to high noon on a weekday, and all 75 machines attempted to download several hundred megs of Service Pack 2 simultaneously and individually. Instant network clog.

Solution: Centralise IT control so one somebody can be responsible for all the details. This was done in short order after I released a sprightly memo to the appropriate folks. Then, I did what I should have done earlier and set up SUS (Software Update Services), now WSUS (Windows Server Update Services), to download updates and distribute at an appropriate time and after appropriate testing against departmental OS images.

Moral: Just because your users are technical doesn't mean they'll behave with any more attention to detail than the average Joe. If network uptime is your responsibility, then take responsibility and manage what needs managing.
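To put rough numbers on that clog, here is a minimal Python sketch comparing 75 individual downloads with a single central one over a shared Internet link. The 300MB update size and the 10Mbit/s link are assumptions chosen for illustration; the story says only "several hundred megs" and never states the link speed. The sketch also ignores the LAN traffic a central server still generates when it hands the update on to desktops.

```python
def download_hours(size_mb: float, copies: int, link_mbit_per_s: float) -> float:
    """Hours needed to pull `copies` downloads of `size_mb` megabytes
    through one shared link, ignoring protocol overhead and other traffic."""
    total_megabits = size_mb * 8 * copies
    return total_megabits / link_mbit_per_s / 3600


# Assumed figures, not taken from the story: 300MB update, 10Mbit/s link.
UPDATE_MB = 300
LINK_MBIT_PER_S = 10

individual = download_hours(UPDATE_MB, copies=75, link_mbit_per_s=LINK_MBIT_PER_S)
central = download_hours(UPDATE_MB, copies=1, link_mbit_per_s=LINK_MBIT_PER_S)

print(f"75 separate downloads: {individual:.1f} hours of a saturated link")
print(f"one central download:  {central * 60:.0f} minutes")
```

Whatever the real figures were, the ratio is the point: the shared link carries the update once instead of 75 times.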

Client protection

Reader S Enright relates a tearful tale: A mobile user called to say that his laptop was no longer functioning. After a lengthy phone conversation, during which the user initially denied anything unusual had happened, he disclosed that he had spilled an entire can of Coke on the keyboard.

"He continued by telling me that he had tried to dry it with a hair dryer, but that it still would not boot. I asked him to send it back to me, and that I would have it repaired."

But when Enright opened the laptop's shipping box the very next day, he had a bit of a shock: "The gentleman had not used a hair dryer, but must have borrowed a heat gun at one of our locations, because all that was left of the keyboard was a cooled pool of molten black plastic." Ouch.

Solution: The laptop was insured for "accidental" damage only. Since the incident, maintaining full coverage of mobile equipment has been a matter of course for Enright.

Moral: Cover your mobile warriors. That means not only insuring their hardware, but giving them training and clear policy documents on what can and can't be done with company hardware on the road. Further, make sure their data is backed up religiously, both when they're at the home office and when they're on the road.

Executive clout

Here, we're concerned with that senior executive who just has to have full administrative rights to every machine on the network, even though he's about as technical as my cat -- and my cat is dead. Senior users can be dangerous even without special access rights. John Schoonover, who worked for the Department of Defence on one of the largest network deployments in history during Operation Enduring Freedom, was "witness to a huge lack of IQ points" in a senior manager.

According to Schoonover, military infosec installations generally follow a concept termed "the separation of red and black". Red is simply data that has not been encrypted yet. (Danger, the world and sniffers can see you!) Black is the same data after it has been encrypted and is now ready to traverse the world.

"These areas [red and black] are required to be separated by a two-metre physical gap," Schoonover says. Our hero proceeds to follow these guidelines and deploys the network, but comes back from lunch one day to find the firewall down. Investigation shows that a senior manager "had taken the cabling from the inside router and connected to the Internet for connectivity, thus bypassing all firewall services, encryption, and -- oh yeah, that's right -- the entire secure network with a jump straight to the Internet".

Solution: John says they "removed the culprit's thumbs, because if you can't grip the cable, you can't unplug it". I didn't ask for any more details.

Moral: Managing rogue senior users is an art in itself that requires diplomacy and even outright deception. In several installations I've renamed the Administrator account something like "IT" and made "Administrator" a functionally limited account with a little more read/write access to data directories, while still blocking access to things like the Windows system directory or Unix root directories. Most times they never notice; and if they do, I'm pretty good at making up excuses why those directories remain closed off. ("Oh, that's something Microsoft did in the last service pack. Gosh darn that Bill Gates.")

Legal eagles hunting IT mice

Lawyers ruin everything -- including smoothly running networks. But IT managers who ignore the ever-changing legal landscape's impact on technology do so at their peril. I was once called in as referee among in-house counsel, senior management and IT staff after the company was informed that child pornography had been tracked to its servers.

The company didn't know whether to aid the investigation by figuring out which employee was responsible or to just delete all the offending files immediately and most likely incur a fine but protect the firm from getting shut down. In the end, the lawyers managed to make a deal with investigators. The company's IT network stayed active and we tracked the lowlife down and had him arrested. Quietly.

Solution: Talk to senior management and corporate counsel about legal issues, such as corporate response to third-party audits or company responsibility for data it's holding concerning third parties, before they happen. This discussion goes beyond IT-centric solutions. Management must decide whether it wants to retain all pertinent data (the best course of action for those third-party audits) or automatically delete offending data (such as whatever's found in porn filters).

IT and management must see eye to eye on how the company will respond to law enforcement inquiries, investigations, or even raids. If Homeland Security agents believe a terrorist is masquerading as an employee and storing data on corporate servers, they can come in and pretty much take anything they want. That could put a real crimp in the style of, say, an e-business.

Developing the best course of action should involve senior management, corporate counsel and law enforcement.

Moral: The higher you are on the IT food chain, the more such liability can spell serious trouble. If you make sure to discuss at least general legal eventualities with senior management, you're much more likely to do yourself and your employer some real service in specific situations. If they refuse to discuss the matter, archive everything you can.

Disasters in disaster recovery

Gary Crispens reports an incident he encountered after questioning an IT director about the company's preparedness for disaster recovery. The director responded huffily that the hot site was ready for any disaster, including the necessary space and equipment, all backed by a diesel-powered generator with "plenty of fuel".

After about a year, the company had a hurricane-related power outage that forced it to roll over to the hot site. "Sure enough, the IT Director had critical functions up and running and I could hear that generator running out back. But after about eight hours the power went out for good and all systems crashed when the generator stopped." It turned out that "plenty of fuel" was one 200-litre drum that was already half empty from the monthly testing.

Solution: A disaster recovery plan that called for fuel checks in addition to generator testing.

Moral: Disaster recovery isn't a static issue. One plan or one policy is never perfect out of the gate. Ever. Pass such concepts by as many experienced eyes as you can and then revisit them annually or even twice a year for refinement.
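The fuel check in that plan comes down to simple arithmetic, sketched here in Python. The drum size and the roughly eight hours of runtime come from Crispens' story; the burn rate is derived from them on the assumption that the drum was exactly half full and the load stayed constant. The 72-hour outage target is purely hypothetical.

```python
def runtime_hours(fuel_litres: float, burn_litres_per_hour: float) -> float:
    """How long the generator runs on the fuel actually in the tank."""
    if burn_litres_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return fuel_litres / burn_litres_per_hour


def fuel_needed(target_hours: float, burn_litres_per_hour: float) -> float:
    """Litres required to ride out an outage of `target_hours`."""
    return target_hours * burn_litres_per_hour


# From the story: a 200-litre drum, half empty, lasted about eight hours.
fuel_on_hand = 200 / 2                  # assumes the drum was exactly half full
burn_rate = fuel_on_hand / 8            # derived estimate, litres per hour

# Hypothetical target: survive a three-day outage.
target = 72
print(f"Estimated burn rate: {burn_rate} litres/hour")
print(f"Fuel needed for {target} hours: {fuel_needed(target, burn_rate)} litres")
```

Measuring the real burn rate during the monthly generator test, and topping the tank up afterwards, would replace both assumptions with facts.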

Rogue peripherals

CompUSA and the Dummies books are teaching users just enough of the tech alphabet to spell trouble. One of my favourite stories was the network that was severely hacked by someone who came in from the outside and deleted the main Exchange message store. Firewall logs had got the local IT admin nowhere, so we were called in to do a little snooping around. I wish I'd thought of it, but another guy on the team had the sense to run AirSnort. He found a wide open Linksys wireless access point in about six seconds.

The internal admin insisted there was no wireless running anywhere on the network. It took some sneaker netting, but we found the rogue AP in a senior exec's office about 20 minutes later. Seemed he saw how cheap they were at the local CompUSA and decided to plug one into the secondary network port in his office so he could use his notebook's wireless instead of the wired connection because no wires "looks better".

Another problem in this vein is USB. Being able to plug in a peripheral and achieve working status without the need to install drivers has rapidly spread the popularity of personal peripherals. You don't want to let yourself get sucked into supporting things such as printers that aren't on your official purchase list -- or external hard disks, DVD drives, sound systems and even monitors.

Nor do you want the security risk of an employee plugging a gig or two of empty space into any workstation's USB port and copying important corporate information. Source code, accounting data and historical records all can be copied quickly and then walk out in somebody's hip pocket.

Solution: Let employees know what is and isn't acceptable as corporate peripherals. Keep an accurate asset record of what belongs to the IT department so you can more easily find or ignore the stuff that doesn't. And if data theft is a problem, think about protecting yourself by disabling USB drives, uninstalling CD-RW drives, or similar measures. The work you do now can save your bacon later.

Moral: Asset management isn't just for the anal. Knowing exactly what's supposed to be on your network is a key step to solving a wide variety of IT mysteries.
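As a minimal sketch of that idea, the Python below compares hardware addresses seen on the network against the asset record and reports the strangers. The MAC addresses are made up for the example, and how the "seen" list is gathered (switch tables, DHCP leases, a wireless survey) is outside the sketch.

```python
def normalise(mac: str) -> str:
    """Lower-case a MAC address and use colons, so formats compare equal."""
    return mac.strip().lower().replace("-", ":")


def unknown_devices(observed, asset_record):
    """Return observed MAC addresses that are not in the asset record."""
    known = {normalise(mac) for mac in asset_record}
    return sorted({normalise(mac) for mac in observed} - known)


# Hypothetical data: two devices IT owns, three devices seen on the wire.
asset_record = ["00:11:22:33:44:55", "AA-BB-CC-DD-EE-FF"]
observed = ["aa:bb:cc:dd:ee:ff", "00:11:22:33:44:55", "00-14-BF-12-34-56"]

for mac in unknown_devices(observed, asset_record):
    print("Not in asset record:", mac)
```

The comparison is only as good as the record behind it, which is the argument for keeping that record accurate in the first place.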

Security silliness

Security should be everyone's job, from CTO to administrative assistant. It's surprising how few organisations recognise this. I think back to a time right after a fairly large network upgrade. All weekend, day and night, had been spent migrating a nightmare network from a hodgepodge of Windows 95/98/ME and even OS/2 clients with NetWare and Windows NT servers to a clean, homogeneous utopia of redundant Windows 2000 Servers on the back and Windows XP Professional desktops on the front.

Things hadn't gone quite as smoothly as we'd hoped, so instead of finishing up on Sunday afternoon, we were still putting final tweaks in place on Monday morning.

After we did our last test (making sure all local tape backups were working properly) it was about noon. (Most users by now had logged in, been informed that they needed to choose a new password in accordance with our medium-strong password guidelines, and had chosen a new password.)

I stumbled bleary-eyed into the lunchroom for my umpteenth caffeine fix. Chugging my Coke, I almost missed it while mincing out of the lunchroom. But it grabbed my attention from the corner of my eye and caused Coca-Cola to shoot from my schnoz like some enraged soda dragon: "Password List."

Yes, every user's new password along with IT and even some specific switch passwords had been printed out by a well-meaning secretary and posted in the lunchroom. After they pried my hands from her throat, she explained that she just figured it'd be easier to post them there than to answer all the phone calls when users inevitably forgot them. So she went around and collected them (in my name), built her list, and posted it.

Solution: User training. Passwords should not be regarded as obstacles but as keys for very important locks. Users must be made aware of such concepts, not simply dropped into new environments. If the secretary had been given a clue, she never would have done it, but the only training this company ever gave her was how to use Word.

Moral: Preaching may be a pain, but it can sure stop a lot of FUBAR stupidity before it gets very far.

Curiosity killed the kilobyte

These situations can vary, but have the common denominator of a user experimenting with something he knows is dangerous . . . and not watching what he's doing. P A Dunkin relates a situation that, surprisingly, I've encountered myself. (Dunkin declined his family's doughnut fortune in favour of becoming a sys admin for a software engineering firm.) After a recent virus outbreak, a curious engineer decided to crack open a sample of the virus to "see what made it tick".

But he did this neither on a PC disconnected from the LAN nor on one running an operating system immune to the virus, and promptly reinfected the network. Dunkin's user had the good sense to come forward immediately -- the guy I had experience with didn't even realise what he'd done, so we didn't detect the new infection until antivirus software caught it.

Solution: For me, it was multiple areas of virus detection, both server and client. Nowadays you can even get this at the infrastructure layer and I highly recommend it. Just because a virus is killed once doesn't mean it can't get resurrected.

Moral: Dunkin says his users learned from the experience -- the advantage of having geek users. For many of us, however, his subsequent strategy is applicable: "I maintain an open-door antivirus policy: No question about viruses is stupid, ever; and any time I have to send out a warning about an especially dangerous threat, I include an offer to help set up whatever measures are required, reminding them that it takes much less time to prevent an infection than to clean up after one."

Server abuse

You can clean your servers till they sparkle, but users can still find ways to abuse them -- especially on the storage front, as reader Yan Fortin relates. Fortin was having such a boring day, he was actually browsing his firewall logs simply for something to do (I hit Playboy.com in that situation, but to each his own).

Suddenly, he received a user call that network file access was being denied. Another call prompted him to put down his fascinating log reading and do a little investigating.

"Lo and behold, I had five e-mails warning me that the free space on the F: network share was getting dangerously low. Unfortunately for me, I had turned off the Windows Messenger Service on my workstation, so I couldn't receive any warning that way. Shame on me."

Indeed. Fortin searched the drive for every file bigger than 50MB and stumbled upon a marketing user who was copying approximately 30 TIFF files of 150MB each from a DVD to the network. "I called her to inform her that I would delete all her [expletive deleted] files, and did so right after." Crisis over.

Solution: Fortin purchased additional hard disk space for the server right after this incident and also had a firm talk with the user about the relatively finite nature of server disk space.

Moral: Explaining things to inexperienced or even tech-phobic users may be a pain in the posterior, but it sure can save you time, trouble, and screaming managers in the long run.
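Fortin's hunt for oversized files is easy to automate. The Python sketch below walks a directory tree and lists every file above a size threshold, largest first. The 50MB figure comes from his account; the share path in the usage line is a placeholder, and this is one way to do the search, not a description of the tool he used.

```python
import os


def large_files(root, min_bytes=50 * 1024 * 1024):
    """Return (size, path) pairs for files under `root` bigger than
    `min_bytes`, largest first. Unreadable files are skipped."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or access was denied
            if size > min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Placeholder path: point this at the share that is filling up.
    for size, path in large_files("."):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```

Run on a schedule, a report like this gives you something to discuss with the user before the share fills up rather than after.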

Telecommuting terrors

Always remember that even telecommuters eventually come to the office. One reader relates the experience of a remote user visiting the home office and immediately killing the entire network. A little laptop investigation showed that the user had decided to configure his laptop as a DHCP server for his home network, which "suddenly made his machine the default gateway for that segment".

Other examples include mums and dads who genially allow their kids to play high-end games on the corporate hardware, or (worse) to surf the Internet in all those dark and fringelike nooks that teenagers like to explore on the Web. While the adults are out having dinner, the kids are home infecting the workstation, which promptly begins to spew out viruses the next time dad either logs in or visits the office.

Solution: Perimeter defence. End-point security technologies such as Cisco's NAC or Microsoft's NAP are specifically designed to minimise this risk by scanning outside machines the moment they're connected to the network. Failure to meet specific criteria, including everything from minimal patch levels to scheduled virus scans, means the PC is dumped into a quarantine area of the network where it can be scanned, updated, and fixed without risk of harm to other nodes.

Moral: Talk to your telecommuters. Fair use policies with a little bit of disciplining oomph behind them can go a long way toward having mum buy her precious offspring their own PC to infect rather than risking her job by letting them use hers.
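The quarantine decision described in the solution above can be pictured as a simple policy check. The Python sketch below is only an illustration of that logic; it does not use or imitate the Cisco or Microsoft products, and the `Endpoint` fields, thresholds and machine names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """What a hypothetical health check learns about a connecting PC."""
    name: str
    patch_level: int             # e.g. installed service pack number
    days_since_virus_scan: int


def admit(endpoint, min_patch_level=2, max_scan_age_days=7):
    """Return 'production' if the machine meets policy, else 'quarantine'."""
    if endpoint.patch_level < min_patch_level:
        return "quarantine"
    if endpoint.days_since_virus_scan > max_scan_age_days:
        return "quarantine"
    return "production"


# Hypothetical machines: an office desktop and a laptop back from the road.
for pc in (Endpoint("office-desktop", 2, 1), Endpoint("road-laptop", 1, 45)):
    print(pc.name, "->", admit(pc))
```

A real deployment checks far more than two values, but the shape is the same: every criterion must pass before the machine reaches the production network.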

Ultimate weirdness

This one won our Deepest Chuckle Award. Dave Schultz related an incident in which he tagged a note to a network laser printer informing users that if print quality suffered enough to warrant a toner cartridge replacement, they should first "shake a few times to yield a few additional copies". Schultz was later berated because a user suffered a work-related back injury by reading the note, then picking up the entire HP LaserJet 4000 and trying to shake the printer back and forth.

Solution: Shoot the user, he's lame now, anyway.

Moral: Never let your blood pressure get too far into the dangerous numbers and keep a bottle of aspirin handy.

Top six steps toward disaster-recovery

Researching this article revealed to me how many variables folks tend to miss when running a network, as well as when planning to protect and recover that network.

I suppose some of the errors I encountered are more surprising to us consultant types because we live and breathe best practices. We live it, we breathe it, we get to install and bill for it, and then we get to walk away and do it all someplace else. Day-to-day systems administrators live and breathe a just-get-it-done philosophy, and they can't walk away.

So in that spirit, I've condensed some of the disaster-recovery best practices into a top six list. Make sure you've got these six points covered, and you're much more likely to survive not only stupid human tricks but any kind of network disaster curveball Lady Fate may decide to pitch your way.

1. Test your backups.

This is first because it was by far the most popular entry. Someone installs a tape drive setup, installs the backup software, and schedules daily, weekly, monthly backups. Something happens a year later, and it turns out nothing's actually been running. Backups are boring, I know. Not to mention mind-crushingly tedious. But if you don't have them when you need them, you're done. So do a test backup and restore after installing a new backup system. Then -- and this is critical, not optional -- do a test restore every week. That's right: every week. Not the whole tape, just a specific subset of folders. Shouldn't take more than 15 minutes, and it can save your professional career in a crisis. Just do it.
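One way to keep that weekly test down to 15 minutes is to script the comparison. The Python sketch below assumes you have already restored a subset of folders to a scratch directory with your backup software; it then checks every file in the original folder against the restored copy by SHA-256 checksum. The function names and the paths in the usage note are placeholders.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in 1MB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """List files that are missing from, or differ in, the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(source) != file_digest(restored):
            problems.append(f"differs: {relative}")
    return problems
```

Calling `verify_restore("/data/projects", "/scratch/restore-test/projects")` returns an empty list when the restore matches. Files legitimately edited since the backup ran will show up as "differs", so pick folders that change rarely or compare right after the backup finishes.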

2. Spend a little money on your backup software.

Don't just buy Bob's Basic Backup package because it's cheap or came with the tape drive. Spend some bucks here. Make sure the thing can support dynamic backups; also ensure that it can support individual folder and file restores. Take a step back and think about investing in a disk-heavy server to act as a disk-based backup between the tape drive and the network. Many of the better packages, including those from CA, IBM/Tivoli and Veritas, can manage this NAS-type device as well as the backup, which means not only safer data but much faster restore times. And the cost really isn't that huge.

3. Store a weekly copy off-site.

This was the next most popular entry even though it didn't get much play in the finished article. If you're worried about recovering data should the office building burn down, then keeping all the data in the office building isn't all that bright. Explain this to your tightfisted boss using small words. Get a safety deposit box or a secure business storage locker and bring your tapes there. One tape a week'll do you. Likely this is a quick 30 minutes out of your day door-to-door. Look at it like this: it's less desk work.

4. Block off access to servers.

If your business runs on its server applications, they shouldn't be accessible to just anyone -- including cleaning people. Put them in a room. Get ventilation. Think about things like sprinklers (bye-bye servers) versus Halon systems (servers live), UPS protection, a building-based power generator, and maybe even a Webcam-based monitoring system, such as the NetBotz system from APC.

Know that room is safe, and know what's going on inside it. Then add this new thing to the door called a lock. Make sure only you, the IT staff, and a responsible member or two on the executive team have the clearance to open this lock. If the cleaning people need to get in there, open the door for them and show them where they can plug in their gear.

5. Map out a plan for what happens if the office building burns down.

Worst-case scenario day. There are oodles of options in this department, so I'm not going to try and list them all here, but do decide if your business can shut down due to one of these occurrences or if it needs to recover somewhere else right away. And how quickly it needs to recover. Then figure out what it needs in order to recover. You should also make sure you can deliver all these requirements in time. Yes, this is a lot of work.

6. Write all of the above down and title it "Disaster Recovery Plan".

Put gold and red star stickers on the cover, then put your name, your IT staff's names, and some executive manager names on it, and make sure it's distributed to everyone who needs to see it. Then show it to everyone who doesn't need to see it. Make sure the section on what employees should do if the building blows up gets to the worker bees.

My snide tone and I are making all this sound obvious, but both the article and my wide IT travels have continually shown me throngs of people that keep putting these things off or simply ignoring them altogether. Yes, getting all this done is a month or more of real work. But having it in place when Godzilla steps on your server room: priceless.

