📚 Advanced SQL Cheat Sheet: Aggregate Functions, GROUP BY, JOINs & Subqueries
Aggregate Function Best Practices: Summary Stats and NULL Handling

If you have mastered the basic SELECT, FROM, and WHERE statements, congratulations! You have taken the first step into the world of databases. However, to truly analyze data and extract meaningful business insights, you need to go beyond simple data retrieval.
In this extended guide, we will dive deep into advanced SQL concepts: Aggregate Functions, Grouping, JOINs, Subqueries, and Data Modification. Let's level up your SQL skills! 🚀
1. Aggregate Functions (The Power of Summarization)
Unlike regular functions that operate on a single row, Aggregate Functions take a collection of rows and return a single statistical value. This is the absolute foundation of data analysis.
Top 5 Essential Aggregate Functions:
COUNT(column): Returns the number of non-null values in a column. Works with all data types.COUNT(*): Calculates the total number of rows in a table. Note: This is the only function that includesNULLvalues in its count.MAX(column)/MIN(column): Finds the highest or lowest value. It works on numbers, alphabetical strings (dictionary order), and dates (latest/oldest).SUM(column)/AVG(column): Calculates the total sum or the average. Since these require arithmetic, they can only be used on numeric data types.
⚠️ Crucial Rules to Remember:
The NULL Rule: Except for
COUNT(*), all aggregate functions completely ignoreNULLvalues. For example, if out of 5 employees, one has aNULLbonus and the others have 100,AVG(bonus)will calculate as 400/4 = 100 (it does not treat the NULL as a 0).Placement Restriction: You cannot use aggregate functions in a
WHEREclause. They can only be used in theSELECTclause or theHAVINGclause (after data has been grouped).Pro Tip (Removing Duplicates): Want to count the number of unique categories? Put the
DISTINCTkeyword inside the parenthesis:
SELECT COUNT(DISTINCT department_name) FROM employees;
2. Grouping Data (GROUP BY & HAVING)
Sometimes you don't want an average for the entire company; you want the average per department. This is where grouping comes in.
GROUP BY: Splits the rows into smaller groups based on identical values in specified columns. You can group by multiple columns separated by commas. Golden Rule of GROUP BY: Any column included in the SELECT clause must either be listed in the GROUP BY clause or be wrapped in an aggregate function.HAVING: This is the filtering condition for groups. WhileWHEREfilters individual rows before grouping,HAVINGfilters the group results after grouping.
🧠 The Logical Execution Order of SQL
Understanding how the database engine processes your query will save you from countless syntax errors:
FROM: Choose the table(s).WHERE: Filter individual rows out.GROUP BY: Group the remaining rows.HAVING: Filter the groups using aggregate conditions.SELECT: Calculate final aggregates and select columns to display.ORDER BY: Sort the final result set.
💻 Practical Example:
-- Find departments where the average salary is 300 or higher,
-- but only consider employees making at least 200.
SELECT department_name, AVG(salary) AS avg_salary
FROM employees
WHERE salary >= 200 -- 1st: Filter individual employees
GROUP BY department_name -- 2nd: Create groups by department
HAVING AVG(salary) >= 300 -- 3rd: Filter the created groups
ORDER BY avg_salary DESC; -- 4th: Sort the final output
3. Mastering JOINs
In a properly normalized Relational Database (RDBMS), data is scattered across multiple tables to prevent duplication. A JOIN is the essential technique to connect these scattered pieces back together into a meaningful picture.
Join Attribute: Tables are logically connected using a common link, typically a Primary Key (PK) from one table and a matching Foreign Key (FK) in another. The column names don't have to match, but their data types (domains) must be identical.
How to Write a JOIN:
1. The Traditional Way (Implicit Join): Uses the WHERE clause to link tables.
SELECT orders.quantity, customers.name
FROM orders, customers
WHERE orders.customer_id = customers.id;
2. The Modern Standard (ANSI INNER JOIN): Highly recommended! It separates the "join logic" from the "filtering logic," making your code much easier to read.
( Note on Outer Joins: While INNER JOIN finds exact matches, LEFT JOIN is critical in real-world scenarios for finding missing or unmatched data, e.g., finding customers who haven't placed any orders. )
SELECT o.quantity, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
💡 Table Aliases: Notice orders o and customers c? When joining tables, using short aliases prevents column name collisions and saves you a lot of typing!
4. Subqueries (Query within a Query)
When a single query isn't enough (e.g., "Find the employee who makes the highest salary"), you can nest a query inside another query.
Basic Rules:
Subqueries must be enclosed in parentheses
().You cannot use
ORDER BYinside a subquery. Sorting is only for the final main query.Execution: For uncorrelated subqueries, the database runs the inner subquery first, takes its result, and plugs it into the outer main query. (Note: Correlated subqueries also exist, which execute row-by-row and evaluate once for each row processed by the outer query).
Where Can You Put Them?
Nested Subquery: Located in the
WHEREclause (used for filtering).Inline View: Located in the
FROMclause (acts as a temporary, virtual table).Scalar Subquery: Located in the
SELECTclause (must return exactly one single value).
Operators for Subqueries (in WHERE clauses):
- Single-Row Output: If the subquery returns exactly one value, use standard operators:
=,>,<,>=,<=,<>.
SELECT name FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);
Multi-Row Output: If the subquery returns a list of values, you must use special operators:
IN/NOT IN: Matches any value in the list.EXISTS/NOT EXISTS: Checks if the subquery returns at least one row (Returns True/False). Extremely fast for performance! Because the database engine stops searching (short-circuits) the moment it finds the first match, rather than scanning the entire result set.ALL: The condition must be true for every value returned.ANY(orSOME): The condition must be true for at least one value returned.
5. Modifying & Deleting Data (Proceed with Caution! ⚠️)
Data Manipulation Language (DML) commands actually change the data inside your database. In a real-world environment, a careless mistake here can cause massive damage.
1) UPDATE
Changes existing data. You can apply math operations or even use subqueries.
UPDATE products
SET price = price * 1.1, -- Increase price by 10%
status = 'Price Updated' -- Use commas for multiple columns
WHERE category = 'Beverage'; -- 🚨 DANGER: If you forget WHERE, ALL rows are updated!
2) DELETE
Removes entire rows of data (not just specific columns).
DELETE FROM order_history
WHERE order_date < '2022-01-01';
🔥 Interview Favorite: DELETE vs TRUNCATE vs DROP
Understanding the difference between these three deletion methods is critical for both practical work and certification exams:
1) DELETE FROM table; (DML)
Removes rows one by one.
Writes to the transaction log, meaning you can Rollback (undo) the deletion if you make a mistake.
Relatively slow. The empty table structure remains.
2) TRUNCATE TABLE table; (DDL)
Wipes out all data instantly.
Does not write individual row deletions to the log, meaning Rollback is impossible. (Pro-Tip: While TRUNCATE cannot be rolled back in MySQL or Oracle as it forces an auto-commit, it actually can be rolled back if executed inside an explicit transaction block in SQL Server and PostgreSQL.)
Much faster than
DELETE. The empty table structure remains, and Auto-Increment counters are reset to zero.
3) DROP TABLE table; (DDL)
Nuclear option. ☢️
Destroys the data, the table structure (schema), indexes, and constraints completely.
Rollback is impossible. The table ceases to exist in the database.
By mastering these advanced concepts, you transition from someone who can just "read" a database to an analyst who can manipulate, summarize, and extract true value from raw data. Happy querying!





