Differentiating between "Denormalization" and excuse-making.
I've actually got a lot to discuss about database stuff, but I wanted to start out explaining a very basic concept. Allow me to digress for a moment. I've worked at a few places where there were some COBOL/AS400 developers who were quite full of themselves. They knew how to handle business with COBOL/AS400, and it went straight to their heads. Along came round wheels, indoor plumbing, relational databases and object oriented programming, and they dismissed it all as a fad. Then, as reality repeatedly *** slapped them, they just turned into eternal pity parties, lamenting how all the good jobs are moving to India and how relational databases suck. And as is often the case with people who think relational databases suck, they were heavy on opinion but really light on reasons to back it up. There was a perfect correlation between the degree to which they thought RDBMSs sucked and their lack of knowledge about them. How you can know something is lame without knowing anything about it [with respect to technology] is beyond me. Sure, I can know that drinking out of a Port-O-Potty sucks without doing it, but I at least have to know what's in a Port-O-Potty to understand why drinking from it would suck.
So their criticisms came down to two things for the most part. First, they claimed that joins had too much overhead. Next, they claimed that normalization was unnecessary and that in many cases it was impossible to do. Back when I had my Apple IIe, perhaps joins would have been too expensive, but in and of itself it's a lame argument. [I won't even touch the fact that many of their killer AS400 apps used to take 20 minutes to run even the simplest of queries.] And they would build in redundancies everywhere. They must have all surfed the internet long and hard for "Lame excuses to use when you get busted creating a really crappy db structure" and somewhere stumbled across the word http://en.wikipedia.org/wiki/Denormalization, because that was their excuse for everything. One manager who fancied himself an architect was addicted to redundancies. He actually lamented the fact that the tool we were using only allowed 250 columns, and we had *many* tables that used every last one. When the chickens came home to roost, he decided he needed a good excuse. The president of the company decided to circumvent him and went to each of the developers asking why we were having all the problems we were experiencing. And everyone who wasn't a COBOL programmer who refused to change said the same thing: redundancies. He was in a bind. So he called a meeting with the president and showed up with a bunch of printouts about denormalization and how that was the approach he had used.
So people who don't know what they are talking about often use denormalization and redundancy synonymously. But denormalization means starting from a normalized design and deliberately relaxing it for a measured reason; building in redundancies from day one is something else entirely. The metaphor that comes to mind is this: if you fill your bathtub up with hot water, then find it's too hot and proceed to turn on the cold water until it becomes tolerable, that's not the same as never turning on the hot water in the first place.
This silly excuse usually shows itself with 'historical' data; they'll claim they built in the redundancies to handle historical data. If you had a Customers table with a SalesPersonID which relates to a name, they'll point out that if you change the name of the salesperson, you'll affect all the records. This issue can usually be worked out with effective dates, but that's another story. So I'll concede: the historical data issue is a valid place to have redundancies. However, and this is a big however, it's only valid if the historical data is never to be changed in the future. If it's data that you might have to change or update, then the redundancies are just that: redundancies. And you can still handle historical data with relational modelling. If you have both SalesPersonID and Name in the historical table, and you have cause to update the name because of business requirements, it's not 'denormalization', and it's a bad way to do things.
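To make the effective-dates idea concrete, here's a minimal sketch using Python's stdlib sqlite3 module. All table and column names (SalesPersonName, EffectiveFrom, and so on) are illustrative, not from any system described here: the salesperson's name is versioned by date, so historical reports see the name that was in effect at the time of the sale, and there is no redundant name column in the sales table to keep in sync.

```python
import sqlite3

# Illustrative schema: names are versioned with effective dates instead
# of being copied redundantly into every sales record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesPersonName (
    SalesPersonID INTEGER,
    Name          TEXT,
    EffectiveFrom TEXT,   -- ISO date the name took effect
    PRIMARY KEY (SalesPersonID, EffectiveFrom)
);
CREATE TABLE Sales (
    SaleID        INTEGER PRIMARY KEY,
    SalesPersonID INTEGER,
    SaleDate      TEXT,
    Amount        REAL
);
INSERT INTO SalesPersonName VALUES (1, 'Jane Smith',  '2000-01-01');
INSERT INTO SalesPersonName VALUES (1, 'Jane Carter', '2005-06-01');  -- got married
INSERT INTO Sales VALUES (100, 1, '2004-03-15', 250.0);
INSERT INTO Sales VALUES (101, 1, '2006-02-01', 400.0);
""")

# For each sale, pick the name whose effective date is the latest one
# on or before the sale date -- no redundant name column needed.
rows = conn.execute("""
    SELECT s.SaleID, n.Name
    FROM Sales s
    JOIN SalesPersonName n
      ON n.SalesPersonID = s.SalesPersonID
     AND n.EffectiveFrom = (SELECT MAX(EffectiveFrom)
                            FROM SalesPersonName
                            WHERE SalesPersonID = s.SalesPersonID
                              AND EffectiveFrom <= s.SaleDate)
    ORDER BY s.SaleID
""").fetchall()
print(rows)  # -> [(100, 'Jane Smith'), (101, 'Jane Carter')]
```

The older sale reports under the old name and the newer one under the new name, and a name change is a single-row insert rather than a hunt across a dozen tables.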
You may think this is so obvious that it goes without saying. However, I was in a situation not too long ago where a database was created that had redundancies all over the place. If you needed to change a salesperson's name (for instance, someone gets married), you had to go into at least 12 different tables to change it. And if you missed one record, it was enough to seriously screw up the logic the reporting was built on. Another wonderful feature was computed fields stored as hard-coded values. Every week a new series of transactions would be imported. A few processes were run, and the aggregate totals of sales, fees, etc. were stored in multiple tables. Then the details that were used to create them were stored in another series of tables. If you changed data in one place and didn't change it everywhere else, reporting would be totally hosed. To say it was a nightmare to maintain would be putting it mildly. Yet the person who built this system claimed it was done intentionally as a means to double-check the values. So let me ask you this: if you have a table with CustomerID, Sales, and Week, and you run a process to stick the aggregate of sales in another table, how is that any different from just using SUM? Rather, how can you claim it's 'better' or intentional? People like this often rationalize it with "Well, if everyone did their job and just made sure they updated all the tables, there wouldn't be a problem." True enough. But people make mistakes. If you can't force all the other changes to occur each time the source data changes, through triggers or integrity constraints, you can't guarantee the changes will be made. Plain and simple.
When you see a nightmare database or program, it's often attributed to the ignorance of the person who created it. But pure ignorance will typically only produce bad stuff, not truly terrible stuff. To be truly terrible, it takes someone who's convinced their way is best while being totally ignorant of the consequences. You can only make something so bad accidentally. No, the really horrendous things out there are always the result of deliberate action by someone or some group of people. And this raises the bigger question: why, in 2006, with all the books, internet sites, etc. out there, do people still fight normalization when using relational database systems? And why oh why do people who have no clue what they are doing, and little to no desire to learn, get to design critical systems?
I'm betting this is more common than it may seem. Anyone else still run into stuff like this?