Exploring Fourth Normal Form

(source: physicallyeducated.org)

(source: physicallyeducated.org)

This is just a very quick note for myself, to help me wrap my head around fourth normal form (4NF). It is really quite simple, and to be clear and concise about 4NF I think you need to keep it simple.

 
Consider the following data set, which records a tennis player, their opponents and the tournament that they met this opponent in:

Match(Player, Opponent, Tournament)

 
 
 
 
This data set is not in 4NF, because there is a multiplicative effect between players, opponents and tournaments. Consider for example, that Djokovic played Nadal in the Australian Open and then played Nadal again in the French Open and Wimbledon, we would get the following table:

Player Opponent Tournament
Djokovic Nadal Australian Open
Djokovic Nadal French Open
Djokovic Nadal Wimbledon
Djokovic Tsonga Australian Open
Djokovic Murray Australian Open

Of course, it is likely that Djokovic played more than one match, and therefore we also have entries in the table for his Australian Open matches against Tsonga and Murray as examples.

Notice how there is redundancy in this table. There is redundancy in the combination of {Player, Tournament} with 3 entries for {Djokovic, Australian Open}, as well as {Player, Opponent} with 3 entries for {Djokovic, Nadal}. This will potentially lead to update / delete anomalies, but more importantly (for our discussion on 4NF), this leads to the multiplicative effect where the number of rows necessary to describe Djokovic’s matches are a multiple of the number of tournaments and opponents that he played. Formally, we would say that there is a multivalued dependency of Tournament on Player and also of Opponent on Player:

Player ->> Tournament

Player ->> Opponent

Before we resolve this, let’s satisfy ourselves that this relation is in fact in Boyce-Codd Normal Form. Boyce-Codd Normal Form (BCNF) states that for all functional dependencies, A -> B, A is necessarily a key (or superkey) of that relation. Without getting too wrapped up in that definition, hopefully it is sufficient to say that because there are no functional dependencies in our Match relation, it is therefore in BCNF.

Similarly, our Match relation is also in 2NF (which is the one I always struggle with when I think about examples for 4NF). Why is this? Well simply, 2NF is satisfied when all non-key attributes are dependent on the key attributes. Because there are no functional dependencies which might suggest any subset of Match as a natural key, it is necessary that the natural key of our Match relation is all of the attributes (Player, Opponent, Tournament).

Back to 4NF… Resolving multivalued dependencies is quite straight forward with the decomposition into separate relations:

Match(Player, Opponent)

Tournament(Player, Tournament)

Of course there is a caveat to this approach: the above decomposition has removed the relationship between a Match and a Tournament. So while we might be able to say that Djokovic has played Nadal 3 times, we couldn’t then say which Tournaments these matches were in. Of course, this is simple to fix with a foreign key relationship:

Match(Player, Opponent, TournamentID)

Tournament(TournamentID, Player, Tournament)

And there we have it. Maybe not the best example :), but it is simple perhaps to think of an example where we do not need to introduce a foreign key to preserve information (e.g. {Player, Opponent, Physio}, where for any given tournament Player X will have many opponents and may have more than one physio, but there is no reason to record this information together either in a single relation or sub-relations with foreign keys).

Mostly, this post was about explaining why 4NF satisfies BCNF and 2NF so that I don’t get confused in the future. Hopefully I have managed that!

 

Advertisements

2 comments

  1. My rule of thumb for 4NF and 5NF is taken straight from Graham Simsion’s Data Modelling Essentials – it only applies to key-only entities with 3 or more columns and the rule is that you can’t resolve 2 (or more) m-m relationships with one table – you have to resolve each m-m separately. “In simple terms..the definition of 4NF effectively says that two many-to-many relationships cannot be resolved with one table. Satisfying 5NF requires that two or more many-to-many relationships are not resolved by a single table.” p404 3rd Edition. He does go on to say that Chris Date has pointed out that in some situations it is not only key-only tables that can violate 4NF and 5NF but he doesn’t discuss them further.

    Like

    • Golden, thank you for this Clare. If I am reading this correctly, then the critical phrase here is: “key-only entities”. This was indeed the missing link that had me going around in circles and led to this blog post to try to clear things up. Funny how just one or two words can totally shape a discussion. Thank you!

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: