Learn about Database Normalization

JAVIER ALEJANDRO DIAZ PORTILLO
Nov 23, 2025
6 min read

Updated: Dec 4, 2025

What is Database Normalization?

Database Normalization is the process of structuring a relational database to improve its efficiency, consistency, and accuracy. It makes it easier to manage, maintain, and avoid any data-integrity failures.

Database error highlighted: "No Primary Key was defined" in red. Table shows duplicate customer IDs with different birthdates. — Basic example of Data-Integrity Failure

Data-integrity Failure occurs when the data cannot be trusted in the table due to a bad Database design, consisting of many unnormalized tables. With normalization, the process of organizing attributes is done in such a way that it reduces data redundancy, which considerably increases the size of the database, as the same data is repeated in many places.

Normalization in a relational model consists of standard methods, called Normal Forms, which split one complex and unnormalized table into multiple, much simpler tables, linked between each other to provide the information required from a query, using JOIN and conditional statements to tailor the view into one new table with consistent data.

What are Normalized Tables?

Normalized tables protect the database from contradicting information, making it easier to understand, enhance, and extend properties. This also protects against insertion, update, and deletion anomalies that may present themselves with many transactions occurring in a large database.

We can determine if redundant information can be presented with sets of criteria such as 1st Normal Form, 2nd Normal Form, 3rd, 4th, and so on. This can be presented as a safety assessment, in which the 1st Normal form would be the most basic level, and the 4th Normal form would be more advanced.

Insertion, Update, and Deletion Anomalies

Insertion Anomaly: When it is not possible to insert data into a database because the required fields are missing or because the data is incomplete.
Update Anomaly: When modifying data in a database can result in inconsistencies or errors.
Deletion Anomaly: When deleting a record from a database can result in the unintentional loss of data.

Unormalized Form (UNF)

The Unormalized Form is a table that will not have the 1st, 2nd, and 3rd levels of normalization. The table will not respect the rules considered on the 1st Normal form, which makes the data inconsistent, more failures and complex to manage at a higher scale of transactions.

Table listing car brands Ford, Honda, Chevrolet with models, countries, continents. Highlighted columns show engine types and markets. — Example of an Unormalized Form Table

1st Normal Form (1NF)

A relation is considered 1st Normal Form if every attribute in that relation is single-valued. It is a very essential step in order to reduce redundancy and anomalies when modifying the database. A relation table is considered 1st Normal Form when:

All the attributes contain independent values

A table cannot have columns referring to the same attribute; it should be shown as a record instead. Columns like [Car_Model_1], [Car_Model_2], [Car_Model_3] to refer to the same brand can be instead transformed into one column [Car_Model], then specify the Model along with the Car Brand, which makes each record value unique.

Tables show car brands and models in UNF and 1NF forms. UNF has multiple columns; 1NF has single records per row. Red, yellow, green boxes. — Comparison UNF Table vs 1NF Table

Each record is unique, as shown in the picture, so when a query is made, we can easily find and distinguish the information that we are looking for, instead of having duplicate and inaccurate information.

Each Column must hold values of the same data type

Each column must store the same type of data. You cannot mix different types of information in the same column.

Two tables compare data structuring: one with a combined year/body type column, another split. Red text notes data inconsistencies. — First NF Violation Example for inconsistent data

Each record is unique, meaning it can be identified using a primary key

Each column must store the same type of data. You cannot mix different types of information in the same column.

Top table shows duplicate car data highlighted in red; bottom shows corrected table with ID and no duplicates. Text explains changes. — First NF Violation Example for duplicate roles and lack of primary key

There are no repeating groups or arrays in any row

First NF Violation Example of Columns with arrays

1st Normal Forms Violations

In summary, the following violations are checked for this level:

When we use row order to convey information
Mixing Data types within the same column (This is at the design level, a DB itself won't allow you to do this)
A table that does not have a primary key, as there needs to be a way to differentiate the unique values presented in a Database
Storing a repeating group of data items on a single row.

2nd Normal Form (2NF)

For a table to be 2NF, it must satisfy the First Normal Form, and then each non-key attribute must depend on the entire primary key. This is done by eliminating a partial dependency, which normally occurs when a non-prime attribute depends on part of a given candidate key.

Second NF Example, splitting tables based on partial dependency

For this example, we can see the country depends on the brand column, but not on model, so by separating the tables, we can remove this partial dependency, given that for the table to be unique, we would need to consider Brand and Models to be the candidate key and country as a non-prime attribute.

The purpose of the normalization to 2NF is to make the database’s structure generally clearer and flexible by organizing it with functional dependencies.

3rd Normal Form (3NF)

Every non-key attribute must depend only on a key attribute, not a non-key attribute on another non-key attribute. For a table to be considered 3NF, it should already comply with 1NF and 2NF. This ensures a reduction of the data anomaly mentioned, given that tables could encounter issues, mostly update anomalies.

After a 3rd Normal form, usually the tables can be considered as normalized. However, there are more levels to be considered for specific situations.

Third NF Example, Non-Key Attributes being associated with Key Attributes

4th Normal Form (4NF)

The fourth Normal Form is a level where there are no non-trivial multivalued dependencies other than a candidate key. Multivalued dependencies in a table must be multivalued dependencies on the key. It builds on the first three normal forms and the BCNF. In addition, it must not contain more than one multivalued dependency, ensuring that a relation does not contain multiple independent one-to-many relationships within a single table.

A multivalued dependency occurs in a relation when one attribute determines multiple independent values of another attribute, independent of other attributes. A multivalve dependency always requires at least three attributes because it consists of at least two attributes that depend on a third.

Comparison of UNF and 4NF database tables showing brands, models, and market dependencies. Text explains table separation. Red highlights. — Fourth NF Example, Multivalued dependencies should be in seperate tables

5th Normal Form (5NF)

Fifth Normal Form is one of the highest levels of normalization, also known as the Project-Join Normal Form. The table (which must be in 4NF) cannot create smaller tables without losing data. The whole point of breaking down into tables is to avoid redundancy; however, once we need to reconstruct the original data, there should not be any repetition of data or loss of data from the original table.

Table showing Corolla sales data: USA hybrid (5000), USA gasoline (7000), Japan hybrid (4000). Black background, white text. — Example of Fourth NF Table

Imagine we have this table, it seems everything is okay, but here are some issues: Model, Market, and Engine are independent values. If we add a new Market or the model has a new engine, then we will have to add multiple rows that contain similar information.

Two tables on a black background show car model data. "Models_engines" lists Corolla with Hybrid and Gasoline. "Models_markets" lists USA and Japan. — Fifth NF Tables based on previous table

This is as small as a Database Table can get. We can add some more columns to the tables if needed, but take into consideration if they still comply with the 5NF rule. Avoid adding columns that are already mentioned in the other tables that tie directly with these tables. Once you query the database, you can use JOINs and filter by columns to make one big table with the columns you want.

Final Results

Here is a diagram for the database I have created. With this, we can say that there will be less redundant information, fewer errors, better maintenance, and overall, we can add or create more tables and associate the new tables with the current ones.

Database schema diagram showing tables for countries, brands, models, engines, markets, model colors, features, and sales with relational lines. — ER Diagram of final result

If I want to query all these tables into one single table, by using JOIN and specifying the columns we want to retrieve, we can have this table.

Table displaying car brands, models, countries, continents, engine types, and markets. Red text notes data precision post-JOIN statements. — Unified Table of all tables in one query

Advantages and Disadvantages

Advantages

Eliminates data redundancy and ensures that each piece of data is stored in only one place
Helps ensure each table stores only relevant data, improving the data integrity of the database
Simplifies the process of updating data, as it only needs to be changed in one place rather than in multiple places throughout the database
Enables users to query the database using a variety of different criteria, as the data is organized into smaller, more specific tables that can be joined together as needed
Ensures that data is consistent across different applications that use the same database, making it easier to integrate different applications and ensuring all users have access to accurate and consistent data

Disadvantages

Normalization can result in increased performance overhead due to the use of additional JOIN operations, and could have as result in slower query execution times.
The loss of data context, as data may be split across multiple tables, requires additional joins to retrieve
Proper implementation of normalization requires expert knowledge of database design and the normalization process.
The complexity of the database design can increase if the data model is not well understood or if the normalization process is not carried out correctly

Javier Diaz