Data Normalization, Denormalization, and the Forces of Darkness



Battling the minions of evil is not an appropriate time for cowboy database design. Your users' lives, and perhaps those of the entire world, depend on accurate and up-to-date data. You can't take a chance on duplicated data becoming inconsistent. You need a solid data model with little or no maintenance.

At the same time, a zombie apocalypse is hardly the right situation to prioritize the purity of the data model over usability. Your users need answers fast, and their brains may already be appetizers by the time a dozen joins complete. How do we prioritize both maintainability and performance?

A good DB admin knows whether normalization is the right approach for a particular data set, how far to normalize, and when and how to denormalize to improve performance. Let's hope the warriors of the forces of light have a good DB admin. If they call on you, are _you_ up to the challenge?


“Normalization” just means making something more normal, which usually means bringing it closer to conformity with a given standard.

Database normalization doesn’t mean that you have weird data, although you might. It’s the name for an approach to reducing redundant data in databases. If the same data is stored in more than one place, keeping it synchronized is a pain. Furthermore, if you can’t trust your synchronization process absolutely, you can’t trust the data you retrieve.
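To make the redundancy problem concrete, here is a minimal sketch in Python (with invented survivor data; the names and fields are purely illustrative). The flat version repeats each survivor's details on every row, so one change means many updates; the normalized version stores those details once and references them by id.

```python
# Unnormalized: every supply-request row repeats the survivor's name and
# outpost, so relocating an outpost means editing every matching row.
requests_flat = [
    {"req_id": 1, "survivor": "Rick", "outpost": "Alexandria", "item": "ammo"},
    {"req_id": 2, "survivor": "Rick", "outpost": "Alexandria", "item": "rations"},
    {"req_id": 3, "survivor": "Michonne", "outpost": "Hilltop", "item": "whetstone"},
]

# Normalized: survivor details live in exactly one place; requests
# reference them by survivor_id instead of copying them.
survivors = {
    1: {"survivor": "Rick", "outpost": "Alexandria"},
    2: {"survivor": "Michonne", "outpost": "Hilltop"},
}
requests = [
    {"req_id": 1, "survivor_id": 1, "item": "ammo"},
    {"req_id": 2, "survivor_id": 1, "item": "rations"},
    {"req_id": 3, "survivor_id": 2, "item": "whetstone"},
]

# An update now touches one row only...
survivors[1]["outpost"] = "Hilltop"

# ...and a join reassembles the flat view on demand, guaranteed consistent.
joined = [{**req, **survivors[req["survivor_id"]]} for req in requests]
```

The trade-off the rest of this article wrestles with is visible even here: the normalized form is safe to update, but every read pays for a join.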

You may have heard that normalization is the enemy of search efficiency. This is kind of truish, but not really true. At the lower levels, imposing a bit of order on your chaotic heap of data will probably improve efficiency. At the higher levels you may see a performance slowdown, but it probably won’t be as bad as you fear, and there are ways to address any problems you do find.

Speaking from experience