Database Intermediate

Indexes and Performance Optimization

CodingerWeb
CodingerWeb
22 views 50 min read

Understanding Database Indexes

Database indexes are data structures that improve the speed of data retrieval operations. Think of them like an index in a book - they help you quickly find specific information without scanning every page.

How Indexes Work

Indexes create a separate structure that points to the actual data rows. When you query with indexed columns, the database can quickly locate the relevant rows instead of scanning the entire table.

Types of Indexes

1. Primary Index

Automatically created for primary key columns. Ensures uniqueness and fast access.

CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,  -- Automatically indexed
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100)
);

2. Unique Index

Ensures uniqueness and provides fast lookups for unique columns.

-- Create unique index on email column
CREATE UNIQUE INDEX idx_customer_email ON customers(email);

-- Or during table creation
CREATE TABLE customers (
    customer_id INT PRIMARY KEY AUTO_INCREMENT,
    email VARCHAR(100) UNIQUE  -- Automatically creates unique index
);

3. Regular (Non-Unique) Index

Improves query performance without enforcing uniqueness.

-- Create index on frequently searched columns
CREATE INDEX idx_customer_last_name ON customers(last_name);
CREATE INDEX idx_order_date ON orders(order_date);
CREATE INDEX idx_product_category ON products(category_id);

4. Composite Index

Index on multiple columns, useful for queries that filter on multiple columns.

-- Composite index for queries filtering by both columns
CREATE INDEX idx_customer_name ON customers(last_name, first_name);

-- Good for queries like:
-- SELECT * FROM customers WHERE last_name = 'Smith' AND first_name = 'John';

5. Partial Index

Index only rows that meet certain conditions (MySQL 8.0+).

-- Index only active customers
CREATE INDEX idx_active_customers ON customers(last_name) 
WHERE status = 'active';

Creating and Managing Indexes

Creating Indexes

-- Basic syntax
CREATE INDEX index_name ON table_name(column_name);

-- Multiple columns
CREATE INDEX idx_order_customer_date ON orders(customer_id, order_date);

-- Specify index type (optional)
CREATE INDEX idx_product_name ON products(product_name) USING BTREE;

Viewing Indexes

-- Show indexes for a table
SHOW INDEXES FROM customers;

-- Or using INFORMATION_SCHEMA
SELECT 
    TABLE_NAME,
    INDEX_NAME,
    COLUMN_NAME,
    NON_UNIQUE
FROM INFORMATION_SCHEMA.STATISTICS 
WHERE TABLE_SCHEMA = 'your_database_name'
ORDER BY TABLE_NAME, INDEX_NAME;

Dropping Indexes

-- Drop an index
DROP INDEX idx_customer_last_name ON customers;

-- Alternative syntax
ALTER TABLE customers DROP INDEX idx_customer_last_name;

When to Use Indexes

Good Candidates for Indexing

  • Primary keys: Automatically indexed
  • Foreign keys: Improve JOIN performance
  • Frequently searched columns: WHERE clause columns
  • ORDER BY columns: Sorting operations
  • GROUP BY columns: Grouping operations
  • Unique constraints: Email, username, etc.
-- Examples of good indexing candidates
CREATE INDEX idx_orders_customer_id ON orders(customer_id);  -- Foreign key
CREATE INDEX idx_products_price ON products(price);         -- Range queries
CREATE INDEX idx_orders_status ON orders(status);           -- Filtering
CREATE INDEX idx_customers_city ON customers(city);         -- Grouping

When NOT to Index

  • Small tables: Full table scan might be faster
  • Frequently updated columns: Indexes slow down INSERT/UPDATE/DELETE
  • Low selectivity columns: Columns with few unique values
  • Wide columns: Large text or binary columns

Query Optimization Techniques

1. Use EXPLAIN to Analyze Queries

-- Analyze query execution plan
EXPLAIN SELECT * FROM customers WHERE last_name = 'Smith';

-- More detailed analysis
EXPLAIN FORMAT=JSON SELECT 
    c.first_name, c.last_name, o.order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.city = 'New York';

2. Optimize WHERE Clauses

-- GOOD: Use indexed columns in WHERE
SELECT * FROM customers WHERE customer_id = 123;

-- BAD: Functions on indexed columns prevent index usage
SELECT * FROM customers WHERE UPPER(last_name) = 'SMITH';

-- GOOD: Store data in the format you'll search
SELECT * FROM customers WHERE last_name = 'Smith';

3. Optimize JOIN Operations

-- Ensure JOIN columns are indexed
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Efficient JOIN query
SELECT 
    c.first_name,
    c.last_name,
    COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name;

4. Use LIMIT for Large Result Sets

-- GOOD: Limit results when you don't need all rows
SELECT * FROM products ORDER BY created_at DESC LIMIT 20;

-- Add pagination for better user experience
SELECT * FROM products 
ORDER BY created_at DESC 
LIMIT 20 OFFSET 40;  -- Page 3 (20 per page)

5. Optimize Subqueries

-- SLOW: Correlated subquery
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

-- FASTER: Use JOIN instead
SELECT DISTINCT c.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;

Advanced Optimization Techniques

1. Covering Indexes

Include all columns needed by a query in the index.

-- Query that needs customer_id, last_name, and email
SELECT customer_id, last_name, email 
FROM customers 
WHERE last_name = 'Smith';

-- Create covering index
CREATE INDEX idx_covering_customer ON customers(last_name, customer_id, email);

2. Index Prefix

For long string columns, index only the first few characters.

-- Index first 10 characters of description
CREATE INDEX idx_product_desc_prefix ON products(description(10));

3. Query Rewriting

-- SLOW: OR conditions often can't use indexes efficiently
SELECT * FROM products 
WHERE category_id = 1 OR category_id = 2;

-- FASTER: Use UNION or IN
SELECT * FROM products 
WHERE category_id IN (1, 2);

-- Or for complex conditions
SELECT * FROM products WHERE category_id = 1
UNION
SELECT * FROM products WHERE category_id = 2;

Monitoring and Maintenance

1. Monitor Slow Queries

-- Enable slow query log (in MySQL configuration)
-- slow_query_log = 1
-- long_query_time = 2

-- Find slow queries
SELECT 
    query_time,
    lock_time,
    rows_sent,
    rows_examined,
    sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;

2. Analyze Table Statistics

-- Update table statistics for better query planning
ANALYZE TABLE customers;
ANALYZE TABLE orders;
ANALYZE TABLE products;

-- Check table information
SELECT 
    table_name,
    table_rows,
    avg_row_length,
    data_length,
    index_length
FROM information_schema.tables
WHERE table_schema = 'your_database_name';

3. Index Maintenance

-- Check for unused indexes
SELECT 
    s.table_schema,
    s.table_name,
    s.index_name,
    s.column_name
FROM information_schema.statistics s
LEFT JOIN information_schema.index_statistics i 
    ON s.table_schema = i.table_schema 
    AND s.table_name = i.table_name 
    AND s.index_name = i.index_name
WHERE i.index_name IS NULL
    AND s.table_schema = 'your_database_name';

Performance Best Practices

  • Index foreign keys: Always index columns used in JOINs
  • Monitor query performance: Use EXPLAIN regularly
  • Avoid over-indexing: Too many indexes slow down writes
  • Keep statistics updated: Run ANALYZE TABLE periodically
  • Use appropriate data types: Smaller data types are faster
  • Normalize appropriately: Balance normalization with performance

Common Performance Pitfalls

  • SELECT *: Only select columns you need
  • Missing indexes on WHERE/JOIN columns
  • Functions in WHERE clauses: Prevent index usage
  • Implicit data type conversions
  • Unnecessary GROUP BY/ORDER BY operations

Practical Example: E-commerce Optimization

-- Create optimized indexes for e-commerce queries
CREATE INDEX idx_products_category_price ON products(category_id, price);
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
CREATE INDEX idx_customers_email ON customers(email);
CREATE INDEX idx_customers_city_state ON customers(city, state);

-- Optimized query for product search
SELECT 
    p.product_id,
    p.product_name,
    p.price,
    c.category_name
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.category_id = 1 
    AND p.price BETWEEN 50 AND 200
    AND p.stock_quantity > 0
ORDER BY p.price
LIMIT 20;

Practice Exercises

  1. Analyze a slow query using EXPLAIN and create appropriate indexes
  2. Design indexes for a blog database (posts, comments, users, categories)
  3. Optimize a complex reporting query with multiple JOINs
  4. Create a covering index for a frequently-used query

Next Steps

In the next lesson, we'll explore stored procedures and functions to encapsulate business logic in the database.