Normal Form in Databases?

596 views

I’m trying to piece together from my notes, and I’ve been looking at it too long too make sense at this point. What is the difference is 1, 2, 3. 4 normal form?

In:

Anonymous 0 Comments

So: Let’s get the fundamental idea out of the way. In a database, you’d like to store data in one place, and one place only. You don’t want to have it so that a single piece of information appears in multiple places in the database, nor do you want your database to be unable to process requests that “don’t fit” in to the database: like being unable to store the information about four children, because you assumed that each parent only has three at most. The aim of the game is that each data appears once, and that the data you’re storing is connected in a somewhat logical fashion.

So, What are the normal forms?

***1NF***

This is essentially just a sanity check: does your database make any sort of sense fundamentally?

A database in 1NF will follow the following 4 rules:

1. Each column only stores 1 value. You f.i won’t have a column storing a phone number *and* address: you’d have two columns – one storing phone numbers, and one storing addresses.

2. Each column only deals with a single domain of data. If your column is named “Phone numbers” you would expect every single row in that column to be a phone number. If your “phone numbers” column has a row that has “123, Street Avenue Rd” then you have a problem. This is a pretty common sense rule: Columns store what they say they are storing.

3. Columns in a table have unique names. This is also pretty common sense, if two columns in a table are named “phone numbers” how will the database system know which one you’re asking for?

4. The order of the data entered does not matter. It doesn’t matter if entry “12” appears before entry “9”, or if “John Smith” is before or after “Samantha White”. Your database either does not care, or has a column specifically specifying what the order should be (like Date of birth or Alphabetical). You can sort afterwards as you need the data.

So, that’s all fairly understandable.

***2NF***

The second normal form essentially makes sure that data in a table is identified with the *entire* primary key.

A database in 2NF if:

1. it is in 1NF

2. It does not contain any partial dependencies.

Partial dependencies are when some columns are identified by *part* of a composite primary key, but not *all* of it.

Let’s for instance take a simple example I nicked from the internet.

You’re storing some information about student grades.

|Student Id|Subject ID|Student Name|Grade|
:–|:–|:–|:–|
|1|40|John|70|
|1|45|John|80|
|2|40|AAron|60|
|3|29|Lisa|80|

You have a composite primary key from (student id, subject id). In order to uniquely identify a grade you must specify a student, and a subject – since students can be in many subjects and subjects can be attended by many students.

However, notice “name” here, which identifies the name of the student. That doesn’t depend on the subject, only the student ID. This is partial dependency: a column (student name) is uniquely identified by part of the primary key (The student ID). This column will be the same no matter what subject this student is learning.

The solution here is simply dropping the “name” column and adding it to the “Students” table, where it presumably only depends on the student ID: as so.

Table Grades

|Student Id|Subject ID|Grade|
:–|:–|:–|:–|
|1|40|70|
|1|45|80|
|2|40|60|
|3|29|80|

Table Students

|Student Id|Name|DOB|
:–|:–|:–|:–|
|1|John|12-5-1997|
|2|AAron|30-1-1998|
|3|Lisa|20-10-1997|

***3NF***

A table is in 3NF if

1. It is in 2NF

2. It doesn’t have transitive dependency.

Transitive dependency is when a column is dependent on a different column that *isn’t* a primary key. If a column is dependent on some information in the database, it *must* be dependent on a primary key.

***BCNF***

This is a bit tricky, but is essentially a stronger version of 3.

a BCNF database follows:

1. It is in 3NF

2. for any dependency A → B, A should be a super key.

What it essentially says is that if B depends on A, then A is a super-key. A super-key is any value or set of values that can be used to uniquely identify a row. It’s fairly rare that a 3NF table is not of BCNF, but I suggest you try to look further in to it on your own.

***4NF***

A database is in 4NF if

* it is in BCNF

* There are no multi value dependencies.

A multi value dependency is essentially where two independent columns both depend on the same primary key.

Imagine a table that includes kids and pets of someone.

|Id|Name|Kid|Pet|
:–|:–|:–|:–|
|1|John|Jennifer|Sparkles|
|1|John|Billy|Sparkles|
|2|AAron|null|Minny|
|2|AAron|null|Blubbers|
|2|AAron|null|Doggo|
|3|Lisa|Sarah|Cuddles|
|3|Lisa|Sarah|Junebug|
|3|Lisa|Joey|Cuddles|
|3|Lisa|Joey|Junebug|

Notice how we’re just splurging unwanted rows, because we are trying to include each kid/pet combination for each employee. Lisa has two kids and two pets, but gets four rows because we must list each kid and each pet.

A better solution would be having a “kids” table and “Pets” table, so that kids and pets can independently depend on the person in question instead of having to try and share a single dependency.