Normalization really does matter
It seems like years ago, but I remember one of my professors giving us an assignment. It involved taking a database chock full of redundancies and writing the SQL statements to make a few small updates. We thought it was an assignment to test our SQL skills. However, we were graded on how close to 'right' we got all of the changes. It wasn't about SQL at all; he couldn't have cared less about SQL at this point. It was about illustrating, with much painful monotony, why "Normalization matters."
After we turned the assignment in, he asked everyone how long it took. Suffice it to say, it took everyone a long time. But the interesting thing was that not one single person got all of the changes correct. Many got close, but no one got all of them.
Then he explained some more about why normalization matters. You see, not only did working with non-normalized data (as opposed to denormalized) take forever, it was very error-prone. No matter how careful you are, if you do the same thing over and over, you're likely to miss something.
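To make that point concrete, here's a minimal sketch using Python's stdlib `sqlite3` (the table and column names are my own invention, not from the assignment). An unnormalized table repeats a customer's address on every order row, so a single real-world change requires touching every copy, and a WHERE clause that doesn't match one of the copies silently leaves the data inconsistent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalized: the customer's address is repeated on every order row.
cur.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, address TEXT)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "Acme", "12 Elm St"),
        (2, "Acme", "12 Elm St"),
        (3, "Acme", "12 Elm Street"),  # same fact, entered slightly differently
    ],
)

# The customer moves; we update every copy of the address... or try to.
cur.execute("UPDATE orders SET address = '99 Oak Ave' WHERE address = '12 Elm St'")

# Row 3 was missed because its spelling differed -- the classic update anomaly.
addresses = {row[0] for row in cur.execute("SELECT address FROM orders")}
print(addresses)  # the table now disagrees with itself
```

In a normalized design the address lives in exactly one row of a `customers` table, so there is one fact to update and nothing to miss.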
Afterward, he proceeded to give us a story from the 'real world'. He told us how some things in CS have great theoretical value but not much practical value. But he said that in general, good theory makes for good practice, and nowhere is this more true than in DB design. We were told that when we get out into the real world, we'll hear all sorts of 'senior' folks trashing normalization as ivory-tower stuff that works great in college but doesn't have value in the real world. He said that *when* we hear this, to immediately ask the person, "At which level doesn't normalization make sense in the 'real world'? All of them, or just certain forms?" This was because, in his opinion, the only people who will trash normalization are the ones who don't understand it.

Granted, there were many caveats in the lectures. Sometimes it's best to denormalize (which, for the uninitiated, is something you do AFTER you've normalized). Many will design garbage and, when the redundancies are pointed out, will say "I totally do believe in normalization, but I denormalized the design." If it wasn't normalized in the first place, it couldn't have been denormalized; such statements are mere excuse-making. He also pointed out that sometimes, when time is of the essence and maintainability isn't a real issue, normalization may not justify the costs. There were definitely exceptions to his rule, but one thing that always stuck with me was that the folks who trash normalization have almost always never studied it very far.
So why am I babbling about this? It's simple. Some things can be totally screwed up and then fixed quickly, easily, and inexpensively. I could get the color scheme of a CSS layout totally upside down and fix it pretty quickly in many cases. Other things are a little more difficult, but still doable. Let's say I didn't use inheritance and coded some totally redundant classes. Making the code modification won't be trivial, but if my code is modular, it should be fairly straightforward considering the development tools in place these days. Other things, though, are deal breakers. Absolute and utter deal breakers. Database design is the epitome of a deal breaker. To those new to development, you may be asking why DB design is so vastly different from other design issues.
Just imagine the following scenario. You have a Dog and a Cat class, and they are in a running production app that's working great. You decide that you should have had a Mammal base class and decide to implement it. You guess it'll take you about 16 hours to find every instance and make the appropriate changes. So the app runs along, users are happy, you take your 16 hours, make your changes, test them (or, for many, test them first and then make them) and redeploy. No problems. Users are none the wiser and everyone's happy.
Now what if this was a production database? Well, if you have downtime on the DB, then you can create a replica of the data, make your changes, verify they worked correctly, and then implement them at some time when no one is using the system. I've only worked in two places where that was the case. If the design flaws go out the door on a live system, you're stuck.
In the Dog/Cat example, if you totally screwed something up, you can redeploy the previous version, and in many cases it won't be too big of a deal (in some it obviously will, but that's related to installation more than anything else). But if you damage a lot of data, the impact very well may (and probably will) be immediate. Moreover, you are probably dealing with a moving target because the data is constantly changing. Even if you think you coded everything correctly, new data may be introduced that you weren't counting on. And even then, even assuming you got everything right, many changes of this sort are going to cause service interruptions. Depending on what the DB is used for, this could be a huge deal.
We all know that the further downstream a bug is caught, the more it costs to correct. We've all seen the upward-trending curve. The thing is that in most cases, with DB changes in a production system, that curve is very short and very steep. And if you do any of the following, you should be flogged:
- Use reserved words as object names, column names, etc. (they are called reserved words for a reason)
- Use inconsistent naming conventions
- Use unintelligible naming conventions
- Ignore data types. NVARCHAR isn't the solution to everything. Yes, I know you can cast dates held in VARCHAR fields to DATETIME values, but that's irrelevant
- Fail to use keys. At a minimum, you're going to have a lot of trouble getting tools to update data, but that's the least of your worries
- Use multiple fields to represent similar data, and then use inconsistent rules for maintaining the data they are populated with
- Fail to enable integrity constraints. I once heard a developer, with a straight face, say he didn't believe in integrity constraints because he was a 'real developer' and preferred to write his own. He contended that he wanted to make sure things were done right, as opposed to being lazy and leaving it to Oracle to handle. You have to have the initials CC or have forgotten to take your lithium if you think you're going to do a better job writing your own integrity constraints than Oracle/Microsoft etc. are. And for those of you who are going to get all pedantic and tell me of some obscure bug or situation where MS's or Oracle's constraints fail, I'd counter with this: use theirs AND use your own. You can't guarantee that the only way the data will be accessed is through your killer app and its uber integrity algorithms. So why leave it bare? If your constraints are really that good, then it shouldn't ever be an issue anyway, right?
- Freestyling! Freestyling is cool as long as it's done in a rap/BMX/skateboard context. It has no place in the development world other than in R&D scenarios. RDBMSs are well tested and rooted in very strong theoretical foundations. They have stood the test of time and they work. They aren't perfect, but they are damn good, and they work well when used correctly. Unless you really are a super genius, chances are that you're not going to have some superior way of doing things that the rest of the world has missed. And even if you did find one, the chances of you finding another are slim to none. And why do these folks never seem to freestyle with other things, like addition, subtraction, multiplication, division, gravity, inertia, etc.? What do you think would happen if you owed the IRS 10k and used your own math to compute that they actually owed you 10k? Why would it be any different with anything else? So if you're some maverick with a ton of 'unique' ways of building your DBs, all you're doing is shoving the burden of your sloppiness onto someone else.
- Failing to at a minimum follow the first two normal forms.
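The data-type sin above is easy to demonstrate. Dates crammed into a free-form text column sort character by character, not chronologically; a small sketch with `sqlite3` (table names are illustrative, and note SQLite itself stores dates as text, which is exactly why the format matters):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dates crammed into a text column in an ambiguous M/D/YYYY format.
cur.execute("CREATE TABLE events (happened TEXT)")
cur.executemany("INSERT INTO events VALUES (?)",
                [("3/1/2005",), ("12/15/2004",), ("1/2/2005",)])

# Text sorts lexically: '1/2/2005' < '12/15/2004' < '3/1/2005'. Not date order.
text_order = [r[0] for r in cur.execute("SELECT happened FROM events ORDER BY happened")]
print(text_order)

# A proper date representation (here ISO-8601, which SQLite's date functions
# understand) sorts chronologically.
cur.execute("CREATE TABLE events2 (happened TEXT)")
cur.executemany("INSERT INTO events2 VALUES (?)",
                [("2005-03-01",), ("2004-12-15",), ("2005-01-02",)])
iso_order = [r[0] for r in cur.execute("SELECT happened FROM events2 ORDER BY happened")]
print(iso_order)
```

On a server with a real DATETIME type, the same applies: pick the type that matches the data, and the engine handles ordering, comparison, and validation for free.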
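And the point about keys and integrity constraints: when the engine enforces them, bad data is rejected no matter which application wrote it, which is exactly the guarantee hand-rolled app-level checks can't give. A sketch, again with `sqlite3` (which only enforces foreign keys when the pragma is switched on; table names are mine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
cur = conn.cursor()

cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    )
""")
cur.execute("INSERT INTO customers VALUES (1, 'Acme')")
cur.execute("INSERT INTO orders VALUES (100, 1)")  # fine: customer 1 exists

# An orphan order is rejected by the engine itself, no matter who wrote it.
try:
    cur.execute("INSERT INTO orders VALUES (101, 42)")  # there is no customer 42
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

By all means add application-level validation on top, but the declared keys and constraints are the floor that holds even when some other tool touches the data directly.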
It's a matter of "you can get with the program or you can pay a ton of money to have someone else fix it, or spend a ton of time down the road fixing it yourself." And if this happens, you can't cry foul - you're the one that decided to break the rules.