Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

When a table is normalized, the non-key columns depend on the key used. From a relational model point of view, it is standard to have tables that are in Third Normal Form. Normalized physical design provides the greatest ease of maintenance, and databases in this form are clearly understood by developers.

Fist Normal Form(1NF):

The rules for First Normal Form are:

  • Every column must be atomic. It cannot be decomposed into two or more sub columns.
  • You cannot have multivalued columns or repeating groups.
  • Each row and column position can have only one value.

Table in below figure violates First Normal Form, since the dept_no column contains a repeating group:

Above table violates first normal form

Normalization creates two tables and moves dept_no to the second table. Correcting First Normal Form violations by creating two tables

First Normal Form (1NF)

Second Normal Form:

  • Every non-key field must depend on the entire primary key, not on part of a composite primary key.
  • If a database has only single-field primary keys, it is automatically in Second Normal Form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

In the below image, the primary key is a composite key on emp_num and dept_no. But the value of dept_name depends only on dept_no, notĀ  on the entire primary key.

Violates 2nd Normal Form

To normalize this table, move dept_name to a second table, as shown in below image.

Second Normal Form

Third Normal Form:

  • For a table to be in Third Normal Form, a non-key field cannot depend on another non-key field or in other words, remove columns that are not dependent upon the primary key.

The table in below image violates Third Normal Form because the mgr_lname field depends on the mgr_emp_num field, which is not a key field.

A table that violates Third Normal Form

The solution is to split the Dept table into two tables, as shown in the below image. In this case, the Employees table, already stores thisĀ  information, so removing the mgr_lname field from Dept brings the table into Third Normal Form.

Correcting Third Normal Form violations by creating two tables

Advantages of Normalization:

Normalization produces smaller tables with smaller rows:

  • More rows per page (less logical I/O)
  • More rows per I/O (more efficient)
  • More rows fit in cache (less physical I/O)
  • Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit on a data page.
  • You usually have more tables. You can have more clustered indexes (one per table), so you get more flexibility in tuning queries.
  • Index searching is often faster, since indexes tend to be narrower and shorter.
  • More tables allow better use of segments to control physical placement of data.
  • You usually have fewer indexes per table, so data modification commands are faster.
  • Fewer null values and less redundant data, making your database more compact.
  • Triggers execute more quickly if you are not maintaining redundant data.
  • Data modification anomalies are reduced.
  • Normalization is conceptually cleaner and easier to maintain and change as your needs change.
  • While fully normalized databases require more joins, joins are generally very fast if indexes are available on the join columns.

Adaptive Server is optimized to keep higher levels of the index in cache, so each join performs only one or two physical I/Os for each matching row. The cost of finding rows already in the data cache is extremely low.

This entry was posted in RDBMS concepts. Bookmark the permalink.

Leave a Reply