Differentiating between "Denormalization" and excuse making.

Published 27 September 6 6:47 AM | William

I've actually got a lot to discuss about database stuff but wanted to start out explaining a very basic concept.   All me to disgress for a moment.  I've worked at a few places where there were some Cobol/AS 400 developers that were quite full of themselves. They knew how to handle business with COBOL/AS400 and it went straight to their heads.  Along came round wheels, indoor plumbing, relational databases and object oreinted programming and they dismissed it as a fad.  Then as reality repeatedly *** slapped them, they just turned into eternal pity parties lamenting how all the good jobs are moving to India and how relational databases suck.  And as is often the case with people that think Relational Databases suck, they were heavy on opinion but really light on reasons to back it up.  There was a perfect correlation between the degree to which they thought RDBMS's sucked and their lack of knowledge about them. How you can know something is lame without knowing anything about it [with repsect to technology] is beyond me.  Sure, I can know that drinking out of a Port-O-Potty sucks without doing it, but I at least have to know what's in a Port-O-Potty to understand why drinking from it would suck.

So their criticisms came down to two things for the most part.  First they claimed that joins had too much overhead.  Next they claimed that Normalization was unnecessary and that in many cases, it was impossible to do.  Back when I had my Apple 2e, perhaps joins would be too expensive.  But in and of itself, it's a lame argument. [I won't even touch the fact that many of their Killer AS400 apps used to take 20 minutes to run even the simplest of queries].  And they would build in redundancies everywhere.  They must have all surfed the internet long and hard for "Lame excuses to use when you get busted creating a really crappy db structure" and somewhere stumbled across the word http://en.wikipedia.org/wiki/Denormalization because that was their excuse for everything.  One manager who fancied himself an architect was addicted to redundancies.  He actually lamented the fact that the tool we were using only allowed 250 columns and we had *many* tables that used every last one.  When the chickens came home to roost he decided he needed a good excuse. The president of the company decided to circumvent him and went to each of the developers asking why we were having all the problems we were experiencing.  And everyone that wasn't a COBOL programmer who refused to change said that same thing - redundancies.  He was in a bind.  Then he called a meeting, with the president, and had a bunch of printouts about Denormalization and how that's the approach he used. 

So people that don't know what they are talking  about often use Denormalization and redundancy synonomously.  The metaphor that comes to mind is this... If you fill your bathtub up with hot water, then find it's too hot and proceed to turn on the cold water until it becomes tolerable, that's not the same as never turning on the hot water in the first place.

This silly excuse usually shows itself with 'Historical' data, they'll claim they built in the redundancies to handle historical data.  If you had a customers table with a SalesPersonID which relates to a name, they'll point out that if you change the name of the sales person, you'll affect all the records.  This issue can usually be worked out with effective dates but that's another story.  So I'll concede, the historical data issue is a valid place to have redundancies.  However, and this is a big howerver!!!!!, it's only valid if the historical data is not to be changed in the future.  If it's data that you might have to change or update, then the redundancies are just that, redundancies.  And you can still use historical data and use relational modelling. If you have the SalesPersonID and Name in the Historical table, and you have cause to update it because of business requirements, it's not 'denormalization' and it's a bad way to do things. 

You may think this is so obvious that it goes without saying. However I was in a situation not too long ago where a database was created that had redundancies All over the place.  If you needed to change a Salesperson's name (for instance, someone gets married), you had to go into at least 12 different tables to change it. And if you missed one record, it was enough to seriously screw up the logic reporting was built on.  Another wonderful feature was computed fields stored as hard coded values.  Every week a new series of transactions would be imported.  A few processes were run and the aggregate total of sales, fees etc were stored in multiple tables.  Then the details that were used to create them were stored in another series of tables.  If you changed data in one place and didn't change it everywhere else, reporting would be totally hosed.  To say it was a nightmare to maintain would be putting it mildly. Yet the person that built this system claimed that it was done intentionally as a means to double check the values.  So let me ask you this.  If you have a  table with CustomerID, Sales, Week, and you run a process to stick the aggregate of sales in one table, how is that any different than just using SUM?  Rather, how can you claim it's 'better' or intentional?  People like this often rationalize it with "Well if everyone did their job and just made sure they updated all the tables, there wouldn't be a problem.'  True enough.  But people make mistakes.  If you can't force all the other changes to occur each time source data changes through triggers or integrity constraints, you can't guarantee changes will be made. Plain and simple.

When you see a nightmare database or program, it's often attributed to the ignorance of the person that created it.  But pure ignorance will typically only produce bad stuff, not truly terrible stuff. To be truly terrible, it takes someone who's convinced their way is best, while being totally ignorant of the consequences.  You can only make something so bad accidentally.  No, the really horrendous things out there are always the result of deliberate action by someone or some group of people. And this begs the bigger question - why, in 2006, with all the books out there, internet sites etc, so people still fight normalization when using Relational Database Systems?  And why or why do people that have no clue what they are doing and little to no desire to learn get to design critical systems?

I'm betting this is more common than it may seem.  Anyone else still run into stuff like this?

Comments

# Brian Madsen said on September 27, 2006 11:34 PM:

Hey Bill - good to see you're settling in and back at blogging again..thought for a second that you'd gone AWOL on us and we'd hear about it on CNN later this week.

Anyhow...i feel your pain, i seriously do..i have three REALLY badly designed applications here (legacy systems) that's causing us no end of trouble.

yesterday i had a word with my department about the biggest issues (such as using foreign keys but not actually setting up a relationship, not validating data from non-trusted sources, correct DB design etc) - now the senior guys on my staff are tight..i mean real tight and they know what they're doing.

but unfortunately time constaints and legacy systems means we can't police what our DB guys do..hell i've tried to get them to listen to me for the last 6 months and i'm getting to the point of either 1) sacking them or 2) leaving myself since it's a constant battle.

have mentioned it to senior management but there's no clue there...no light..only cobwebs since they're not thinking straight and it appears that option 1) isn't really a viable option (eg. sacking them can't be done...somebody's protecting them, high up in the food-chain).

# Jerry said on September 28, 2006 4:12 AM:

Oh Boy, you must have met some very poor minded AS/400 programmers - I have never met such folks, that maybe they never went to Common or the IBM ITSO Conferences. AS/400 DB is a RDBMS System - in the first Versions they just not told it. AS/400 (i5 or iSeries) still rocks!

# Andrei Ignat said on October 2, 2006 10:42 AM:

And I think that , maybe , it is also a "assurance" for the database maker... he and only he knows where to put any modifs... and any other person only make educated guesses.

Search

This Blog

Tags

Community

Archives

News

  • William G Ryan William Ryan Bill Ryan W.G. Ryan Charles Mark Carroll Charles M Carroll
    My Blog Juice Microsoft MVP
    Bill Ryan W.G. Ryan William Ryan
    Cuckooz' MySpace Page View Bill Ryan's profile on LinkedIn
    My Profile on Twitter
    Please note that this is my personal blog and the opinions expressed are my own. Also, comment moderation is about one of the least important things in my life so please keep that in mind. I can't vouch for the authenticity of any of the posters so please don't hold me accountable. And whatever you do, don't pretend to be Noted Option Strict Off expert and AspFriend Charles Mark Carroll when you post. Doing so will lead him to become apoplectic and write absurd accusatory posts about me that are as coherent and thought out as they are factually correct. He does a stellar job proving his reputation is well deserved and he doesn't need any help from you making himself look foolish. If I have to listen to him banging his spoon off of his high chair one more time, I'm going to burst into flames so please don't make that happen!

    My other sites

    Cool Stuff

    Book Stuff

    Security

    ORM

    Data Access

    Funny Stuff

    Compact Framework Stuff

    Web Casts

    My KnowledgeBase Articles

    My MVP Profile

    Design Patterns

    Performance

    Debugging

    Remoting

    My Fellow Authors

    My Books

    LINQ

    Misc

    Speech

    Syndication

    Email Notifications