Table of Contents
What is Database Normalization?
Database normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them.
Why Normalize Databases?
- Eliminate Data Redundancy: Avoid storing the same data in multiple places
- Improve Data Integrity: Reduce the risk of inconsistent data
- Save Storage Space: Reduce database size by eliminating duplicate data
- Easier Maintenance: Updates only need to be made in one place
- Prevent Anomalies: Avoid insertion, update, and deletion anomalies
Database Anomalies
Insertion Anomaly
Cannot add data without having to add unrelated data first.
Update Anomaly
Must update the same data in multiple places, risking inconsistency.
Deletion Anomaly
Deleting a record causes loss of other valuable data.
Normal Forms
Normal forms are rules that define the level of normalization. Each normal form builds upon the previous one.
First Normal Form (1NF)
A table is in 1NF if:
- Each column contains atomic (indivisible) values
- Each column contains values of the same data type
- Each row is unique
- The order of rows and columns doesn't matter
Example: Violating 1NF
-- BAD: Multiple values in single column
Students Table:
+----+----------+---------------------------+
| id | name | subjects |
+----+----------+---------------------------+
| 1 | John Doe | Math, Science, English |
| 2 | Jane | History, Math |
+----+----------+---------------------------+
Example: Following 1NF
-- GOOD: Atomic values
Students Table:
+----+----------+
| id | name |
+----+----------+
| 1 | John Doe |
| 2 | Jane |
+----+----------+
Student_Subjects Table:
+------------+----------+
| student_id | subject |
+------------+----------+
| 1 | Math |
| 1 | Science |
| 1 | English |
| 2 | History |
| 2 | Math |
+------------+----------+
Second Normal Form (2NF)
A table is in 2NF if:
- It's in 1NF
- All non-key attributes are fully functionally dependent on the primary key
- No partial dependencies exist (applies to tables with composite primary keys)
Example: Violating 2NF
-- BAD: Partial dependency
Order_Items Table:
+----------+------------+----------+---------------+-------+
| order_id | product_id | quantity | product_name | price |
+----------+------------+----------+---------------+-------+
| 1 | 101 | 2 | Laptop | 999 |
| 1 | 102 | 1 | Mouse | 25 |
| 2 | 101 | 1 | Laptop | 999 |
+----------+------------+----------+---------------+-------+
-- Problem: product_name and price depend only on product_id, not the full key
Example: Following 2NF
-- GOOD: Separate tables
Products Table:
+------------+---------------+-------+
| product_id | product_name | price |
+------------+---------------+-------+
| 101 | Laptop | 999 |
| 102 | Mouse | 25 |
+------------+---------------+-------+
Order_Items Table:
+----------+------------+----------+
| order_id | product_id | quantity |
+----------+------------+----------+
| 1 | 101 | 2 |
| 1 | 102 | 1 |
| 2 | 101 | 1 |
+----------+------------+----------+
Third Normal Form (3NF)
A table is in 3NF if:
- It's in 2NF
- No transitive dependencies exist
- All non-key attributes depend directly on the primary key
Example: Violating 3NF
-- BAD: Transitive dependency
Employees Table:
+----+----------+--------+---------------+----------------+
| id | name | dept_id| department | dept_location |
+----+----------+--------+---------------+----------------+
| 1 | John | 10 | Sales | New York |
| 2 | Jane | 20 | Marketing | Chicago |
| 3 | Bob | 10 | Sales | New York |
+----+----------+--------+---------------+----------------+
-- Problem: dept_location depends on department, not directly on employee id
Example: Following 3NF
-- GOOD: Separate department information
Departments Table:
+---------+---------------+----------------+
| dept_id | department | dept_location |
+---------+---------------+----------------+
| 10 | Sales | New York |
| 20 | Marketing | Chicago |
+---------+---------------+----------------+
Employees Table:
+----+----------+---------+
| id | name | dept_id |
+----+----------+---------+
| 1 | John | 10 |
| 2 | Jane | 20 |
| 3 | Bob | 10 |
+----+----------+---------+
Boyce-Codd Normal Form (BCNF)
A stronger version of 3NF. A table is in BCNF if:
- It's in 3NF
- For every functional dependency A → B, A must be a superkey
Fourth Normal Form (4NF)
Deals with multi-valued dependencies. A table is in 4NF if:
- It's in BCNF
- No multi-valued dependencies exist
Practical Normalization Example
Unnormalized Table
-- BAD: Lots of redundancy and potential anomalies
Orders Table:
+----------+---------------+----------------+----------+---------------+-------+----------+
| order_id | customer_name | customer_email | product | product_price | qty | order_date|
+----------+---------------+----------------+----------+---------------+-------+----------+
| 1 | John Doe | john@email.com | Laptop | 999.99 | 1 | 2024-01-15|
| 1 | John Doe | john@email.com | Mouse | 25.99 | 2 | 2024-01-15|
| 2 | Jane Smith | jane@email.com | Laptop | 999.99 | 1 | 2024-01-16|
+----------+---------------+----------------+----------+---------------+-------+----------+
Normalized Tables (3NF)
-- GOOD: Normalized structure
Customers Table:
+-------------+---------------+----------------+
| customer_id | customer_name | customer_email |
+-------------+---------------+----------------+
| 1 | John Doe | john@email.com |
| 2 | Jane Smith | jane@email.com |
+-------------+---------------+----------------+
Products Table:
+------------+----------+-------+
| product_id | product | price |
+------------+----------+-------+
| 101 | Laptop | 999.99|
| 102 | Mouse | 25.99 |
+------------+----------+-------+
Orders Table:
+----------+-------------+------------+
| order_id | customer_id | order_date |
+----------+-------------+------------+
| 1 | 1 | 2024-01-15 |
| 2 | 2 | 2024-01-16 |
+----------+-------------+------------+
Order_Items Table:
+----------+------------+----------+
| order_id | product_id | quantity |
+----------+------------+----------+
| 1 | 101 | 1 |
| 1 | 102 | 2 |
| 2 | 101 | 1 |
+----------+------------+----------+
Creating Normalized Tables
-- Create normalized e-commerce database
CREATE TABLE customers (
customer_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
phone VARCHAR(20),
address TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE categories (
category_id INT PRIMARY KEY AUTO_INCREMENT,
category_name VARCHAR(50) NOT NULL,
description TEXT
);
CREATE TABLE products (
product_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100) NOT NULL,
description TEXT,
price DECIMAL(10,2) NOT NULL,
stock_quantity INT DEFAULT 0,
category_id INT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (category_id) REFERENCES categories(category_id)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') DEFAULT 'pending',
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
When NOT to Normalize
Sometimes denormalization is acceptable or even beneficial:
- Read-heavy applications: Fewer JOINs improve query performance
- Data warehousing: Analytical queries benefit from denormalized structures
- Caching layers: Denormalized data for faster access
- Reporting tables: Pre-calculated aggregates for reports
Denormalization Example
-- Denormalized for reporting (acceptable in some cases)
CREATE TABLE monthly_sales_report (
report_id INT PRIMARY KEY AUTO_INCREMENT,
year INT,
month INT,
customer_id INT,
customer_name VARCHAR(100), -- Denormalized
customer_email VARCHAR(100), -- Denormalized
total_orders INT,
total_amount DECIMAL(10,2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Normalization Best Practices
- Start with 3NF: Most applications benefit from third normal form
- Consider performance: Balance normalization with query performance
- Use meaningful names: Clear table and column names
- Document relationships: Maintain clear documentation of table relationships
- Regular review: Periodically review and optimize database structure
Tools for Database Design
- ER Diagram tools: MySQL Workbench, dbdiagram.io, Lucidchart
- Database modeling: ERwin, PowerDesigner
- Online tools: Draw.io, Creately
Practice Exercises
- Identify the normal form violations in a given unnormalized table
- Normalize a library database (books, authors, members, loans)
- Design a normalized database for a school management system
- Create an ER diagram for an online marketplace
Next Steps
In the next lesson, we'll explore indexes and query optimization techniques to improve database performance.