Database Intermediate

Aggregate Functions and GROUP BY: Analyzing Data

CodingerWeb
CodingerWeb
20 views 45 min read

Introduction to Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result. They're essential for data analysis and reporting.

Common Aggregate Functions

COUNT() - Counting Records

-- Count all customers
SELECT COUNT(*) AS total_customers FROM customers;

-- Count customers with phone numbers
SELECT COUNT(phone) AS customers_with_phone FROM customers;

-- Count unique cities
SELECT COUNT(DISTINCT city) AS unique_cities FROM customers;

SUM() - Adding Values

-- Total sales amount
SELECT SUM(total_amount) AS total_sales FROM orders;

-- Sum of product prices in category 1
SELECT SUM(price) AS category_total FROM products WHERE category_id = 1;

AVG() - Average Values

-- Average order amount
SELECT AVG(total_amount) AS average_order FROM orders;

-- Average product price
SELECT AVG(price) AS average_price FROM products;

MIN() and MAX() - Minimum and Maximum

-- Cheapest and most expensive products
SELECT 
    MIN(price) AS cheapest_price,
    MAX(price) AS most_expensive_price
FROM products;

-- Earliest and latest order dates
SELECT 
    MIN(order_date) AS first_order,
    MAX(order_date) AS latest_order
FROM orders;

GROUP BY Clause

GROUP BY groups rows with the same values and allows aggregate functions to be applied to each group.

Basic GROUP BY

-- Count customers by city
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city;

Multiple Columns in GROUP BY

-- Count customers by city and state
SELECT city, state, COUNT(*) AS customer_count
FROM customers
GROUP BY city, state
ORDER BY state, city;

HAVING Clause

HAVING filters groups after GROUP BY is applied (WHERE filters before grouping).

-- Cities with more than 5 customers
SELECT city, COUNT(*) AS customer_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 5;

Practical Examples

Sales Analysis

-- Monthly sales summary
SELECT 
    YEAR(order_date) AS year,
    MONTH(order_date) AS month,
    COUNT(*) AS total_orders,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS average_order_value
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY year, month;

Customer Analysis

-- Customer order statistics
SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS average_order_value,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_spent DESC;

Product Performance

-- Best selling products
SELECT 
    p.product_name,
    SUM(oi.quantity) AS total_sold,
    SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM products p
JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_revenue DESC
LIMIT 10;

Advanced Grouping Techniques

Conditional Aggregation

-- Count orders by status
SELECT 
    COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_orders,
    COUNT(CASE WHEN status = 'processing' THEN 1 END) AS processing_orders,
    COUNT(CASE WHEN status = 'shipped' THEN 1 END) AS shipped_orders,
    COUNT(CASE WHEN status = 'delivered' THEN 1 END) AS delivered_orders
FROM orders;

Percentage Calculations

-- Customer distribution by city (with percentages)
SELECT 
    city,
    COUNT(*) AS customer_count,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM customers), 2) AS percentage
FROM customers
GROUP BY city
ORDER BY customer_count DESC;

Date-based Grouping

-- Daily sales for current month
SELECT 
    DATE(order_date) AS order_day,
    COUNT(*) AS orders_count,
    SUM(total_amount) AS daily_sales
FROM orders
WHERE MONTH(order_date) = MONTH(CURRENT_DATE())
  AND YEAR(order_date) = YEAR(CURRENT_DATE())
GROUP BY DATE(order_date)
ORDER BY order_day;

Combining WHERE and HAVING

-- High-value customers from specific cities
SELECT 
    city,
    COUNT(*) AS customer_count,
    AVG(total_spent) AS avg_spent_per_customer
FROM (
    SELECT 
        c.city,
        c.customer_id,
        SUM(o.total_amount) AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE c.city IN ('New York', 'Los Angeles', 'Chicago')
    GROUP BY c.city, c.customer_id
) customer_spending
GROUP BY city
HAVING AVG(total_spent) > 500;

NULL Handling in Aggregates

-- Aggregate functions ignore NULL values
SELECT 
    COUNT(*) AS total_customers,           -- Counts all rows
    COUNT(phone) AS customers_with_phone,  -- Counts non-NULL phone values
    COUNT(DISTINCT city) AS unique_cities  -- Counts unique non-NULL cities
FROM customers;

Window Functions Preview

For more advanced analytics, you can use window functions (available in newer MySQL/MariaDB versions):

-- Running total of sales
SELECT 
    order_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY order_date) AS running_total
FROM orders
ORDER BY order_date;

Performance Tips

  • Index GROUP BY columns: Create indexes on frequently grouped columns
  • Limit result sets: Use LIMIT when you don't need all groups
  • Use HAVING efficiently: Filter as much as possible with WHERE before grouping
  • Consider subqueries: Sometimes breaking complex queries into steps improves performance

Common Mistakes to Avoid

  • Selecting non-grouped columns: All SELECT columns must be in GROUP BY or be aggregate functions
  • Using WHERE with aggregates: Use HAVING instead of WHERE for aggregate conditions
  • Forgetting ORDER BY: GROUP BY doesn't guarantee order; use ORDER BY for sorted results

Practice Exercises

  1. Find the total number of orders and total sales amount for each customer
  2. Calculate the average price of products in each category
  3. Find cities with more than 10 customers and their average order value
  4. Create a monthly sales report showing growth compared to the previous month

Next Steps

In the next lesson, we'll explore JOIN operations to combine data from multiple tables and create more comprehensive queries.