Series · Databases · Chapter 2

Databases (2): Indexing and Query Planning — How Databases Find Your Data

Deep dive into B-tree and B+tree indexes, hash indexes, composite indexes, covering indexes, and how to read EXPLAIN output to diagnose slow queries.

A query that returns in 2 milliseconds on your laptop with 1,000 rows will take 45 seconds on a production database with 50 million rows — unless you have the right indexes. Indexes are the single most impactful performance tool in your database toolkit, and understanding how they work changes the way you think about every schema and every query you write.


The Fundamental Problem: Finding a Row#

Imagine a table with 10 million rows, stored on disk as a heap file. Each row sits somewhere in a sequence of 8 KB pages. When you run:

Index selectivity impact

1
SELECT * FROM users WHERE email = 'alice@example.com';

Without an index, the database must perform a sequential scan (also called a full table scan): read every single page, examine every single row, check if email matches. If the table occupies 2 GB on disk, the database reads 2 GB. For one row.

An index is a separate data structure that maps column values to row locations. With a B-tree index on email, the same lookup touches perhaps 3-4 pages instead of 250,000. That is the difference between milliseconds and minutes.

Sequential Scan vs Index Scan#

AspectSequential ScanIndex Scan
How it worksReads every page in table orderTraverses index tree, then fetches matching rows
Best forSmall tables, queries returning >10-15% of rowsSelective queries returning few rows
I/O patternSequential (fast on HDDs)Random (each row may be on a different page)
CPU costLow per-row (just filter)Higher per-row (tree traversal + heap fetch)
When chosenNo suitable index, or optimizer estimates scan is cheaperSuitable index exists and query is selective

Index scan vs sequential scan

The database’s query optimizer makes this decision automatically. Sometimes a sequential scan is faster — for example, when your WHERE clause matches 80% of the table, random I/O from an index would be slower than just reading everything sequentially.

B-Tree Index: The Workhorse#

The B-tree (balanced tree) is the default index type in virtually every relational database. Here is how it works.

B-tree index structure

Structure#

A B-tree is a self-balancing tree where:

  • Each node contains multiple keys in sorted order
  • Each internal node has pointers to child nodes between and around keys
  • All leaf nodes are at the same depth (balanced)
  • The branching factor (number of children per node) is typically hundreds or thousands

For a table with 10 million rows and a branching factor of 500:

  • Level 0 (root): 1 node
  • Level 1: up to 500 nodes
  • Level 2: up to 250,000 nodes
  • Level 3 (leaves): up to 125 million entries

Three levels of tree traversal — three page reads — to find any row among 10 million. That is the O(log N) guarantee, but with a very large base logarithm.

How a Lookup Works#

To find email = 'alice@example.com':

  1. Start at the root node. Binary search through keys to find which child pointer to follow.
  2. Load the child node. Binary search again.
  3. Repeat until you reach a leaf node.
  4. The leaf contains a pointer to the actual row on disk (a tuple ID or row ID).
  5. Fetch the row from the heap (the main table data).
1
2
3
4
5
6
7
Root Node: [charlie@... | mike@... | zara@...]
                |              |           |
          Child < charlie  charlie-mike  mike-zara   > zara
                |
    [alice@... | bob@...]  <-- leaf node
         |
    Pointer to heap page 4721, offset 23

Creating B-Tree Indexes#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
-- Single-column index
CREATE INDEX idx_users_email ON users (email);

-- Unique index (also enforces uniqueness constraint)
CREATE UNIQUE INDEX idx_users_email_unique ON users (email);

-- Check existing indexes (PostgreSQL)
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Check existing indexes (MySQL)
SHOW INDEX FROM users;

B+Tree: Why Databases Prefer It#

Btree index structure as a futuristic city skyline branching

Most database implementations actually use a B+tree, a variation of the B-tree:

FeatureB-treeB+tree
Data pointersIn both internal and leaf nodesOnly in leaf nodes
Leaf nodes linkedNoYes, via sibling pointers
Internal node sizeLarger (stores data pointers)Smaller (keys + child pointers only)
Branching factorLowerHigher (more keys fit per node)
Range queriesRequires traversing back up the treeFollow leaf pointers
Point lookupsCan terminate early at internal nodesAlways goes to leaf level

The key advantage: because internal nodes only store keys (not data pointers), more keys fit per page, increasing the branching factor. A higher branching factor means a shallower tree, which means fewer disk reads.

The linked leaf nodes are critical for range queries:

1
2
3
4
-- Range query: find orders from the last 7 days
SELECT * FROM orders
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at;

With a B+tree index on created_at, the database traverses to the first matching leaf, then follows sibling pointers to read all matching entries sequentially — no need to revisit internal nodes.

Hash Indexes#

Hash indexes use a hash function to map keys directly to row locations.

Hash index structure

1
2
-- PostgreSQL: explicitly create a hash index
CREATE INDEX idx_users_email_hash ON users USING hash (email);
AspectHash IndexB-Tree Index
Equality lookups (=)O(1) averageO(log N)
Range queries (>, <, BETWEEN)Not supportedSupported
Sorting / ORDER BYNot supportedSupported
Prefix matching (LIKE 'abc%')Not supportedSupported
WAL-logged (crash-safe)PostgreSQL 10+Always
Common usageRarely used in practiceDefault, almost always preferred

Hash indexes win on pure equality lookups but lose everywhere else. In practice, B-tree indexes are fast enough for equality lookups that the limited functionality of hash indexes is rarely worth it. PostgreSQL did not even make hash indexes crash-safe until version 10.

Composite Indexes: Column Order Matters#

A composite (multi-column) index indexes multiple columns together:

1
CREATE INDEX idx_orders_user_status ON orders (user_id, status);

This creates a B+tree sorted first by user_id, then by status within each user_id. Think of it like a phone book sorted by last name, then first name.

The Leftmost Prefix Rule#

A composite index on (a, b, c) can satisfy queries that filter on:

Query filters onUses index?Why
aYesLeftmost prefix
a, bYesLeftmost prefix
a, b, cYesFull index
bNoSkips leftmost column
b, cNoSkips leftmost column
a, cPartiallyUses a, then scans for c
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- This query uses the composite index efficiently
SELECT * FROM orders
WHERE user_id = 42 AND status = 'completed';

-- This query can use the index (leftmost prefix)
SELECT * FROM orders
WHERE user_id = 42;

-- This query CANNOT use the composite index
-- It needs a separate index on (status)
SELECT * FROM orders
WHERE status = 'pending';

Column order strategy: put the most selective column first (the one that filters out the most rows), followed by columns commonly used together.

Covering Indexes and Index-Only Scans#

Normally, an index scan involves two steps:

  1. Traverse the index to find matching entries
  2. Fetch the actual rows from the heap (table data) to get the remaining columns

Step 2 is called a “heap fetch” and involves random I/O. A covering index includes all columns needed by the query, eliminating the heap fetch entirely:

1
2
3
4
5
6
7
8
9
-- Create a covering index
CREATE INDEX idx_orders_covering ON orders (user_id, status)
INCLUDE (created_at, order_id);

-- This query can be satisfied entirely from the index
-- No heap fetch needed = "index-only scan"
SELECT order_id, status, created_at
FROM orders
WHERE user_id = 42 AND status = 'completed';

In PostgreSQL, you use the INCLUDE clause for non-searchable but covered columns. In MySQL (InnoDB), covering indexes work because InnoDB’s secondary indexes can cover queries if all needed columns are in the index.

1
2
3
4
5
6
7
-- MySQL covering index
CREATE INDEX idx_orders_covering ON orders (user_id, status, created_at, order_id);

-- Check if index-only scan is used (MySQL)
EXPLAIN SELECT order_id, status, created_at
FROM orders WHERE user_id = 42 AND status = 'completed';
-- Look for "Using index" in Extra column

EXPLAIN: Reading the Query Plan#

EXPLAIN shows you the execution plan the optimizer chose. EXPLAIN ANALYZE actually runs the query and shows real timing.

Query cost model

PostgreSQL EXPLAIN ANALYZE#

1
2
3
4
5
6
7
8
EXPLAIN ANALYZE
SELECT u.full_name, COUNT(*) AS order_count
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.full_name
ORDER BY order_count DESC
LIMIT 10;

Output:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
 Limit  (cost=1845.23..1845.26 rows=10 width=40) (actual time=12.456..12.461 rows=10 loops=1)
   ->  Sort  (cost=1845.23..1857.45 rows=4889 width=40) (actual time=12.454..12.458 rows=10 loops=1)
         Sort Key: (count(*)) DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  HashAggregate  (cost=1723.56..1772.45 rows=4889 width=40) (actual time=11.234..11.890 rows=4889 loops=1)
               Group Key: u.full_name
               Batches: 1  Memory Usage: 913kB
               ->  Hash Join  (cost=234.67..1601.23 rows=24450 width=32) (actual time=2.345..8.901 rows=24450 loops=1)
                     Hash Cond: (o.user_id = u.user_id)
                     ->  Seq Scan on orders o  (cost=0.00..1156.00 rows=24450 width=4) (actual time=0.012..4.567 rows=24450 loops=1)
                           Filter: ((status)::text = 'completed'::text)
                           Rows Removed by Filter: 25550
                     ->  Hash  (cost=159.00..159.00 rows=10000 width=36) (actual time=2.123..2.123 rows=10000 loops=1)
                           Buckets: 16384  Batches: 1  Memory Usage: 641kB
                           ->  Seq Scan on users u  (cost=0.00..159.00 rows=10000 width=36) (actual time=0.008..0.987 rows=10000 loops=1)
 Planning Time: 0.234 ms
 Execution Time: 12.567 ms

Key things to look for:

FieldWhat it tells you
Seq ScanFull table scan — possibly needs an index
Index ScanUsing an index — good for selective queries
Index Only ScanCovering index — best case
Bitmap Index ScanCombines multiple index results
Hash Join / Nested Loop / Merge JoinJoin strategy
actual timeReal execution time (first row..last row) in ms
rowsActual number of rows processed
Rows Removed by FilterRows read but discarded — high numbers suggest missing index
loopsHow many times this step ran

MySQL EXPLAIN#

1
2
3
4
5
6
7
8
EXPLAIN
SELECT u.full_name, COUNT(*) AS order_count
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.full_name
ORDER BY order_count DESC
LIMIT 10;
1
2
3
4
5
6
+----+-------------+-------+------+------------------+---------+---------+------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys    | key     | key_len | ref              | rows  | Extra                                        |
+----+-------------+-------+------+------------------+---------+---------+------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | o     | ref  | idx_order_status | idx_... | 82      | const            | 24450 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | u     | ref  | PRIMARY          | PRIMARY | 4       | mydb.o.user_id   |     1 | NULL                                         |
+----+-------------+-------+------+------------------+---------+---------+------------------+-------+----------------------------------------------+

The type column is the most important in MySQL EXPLAIN:

typeMeaningPerformance
system / constAt most one matching rowBest
eq_refOne row per join (primary key / unique)Excellent
refMultiple rows via non-unique indexGood
rangeIndex range scanGood
indexFull index scan (reads all index entries)Moderate
ALLFull table scanWorst — usually needs an index

Spotting Problems#

Here is a bad query plan (PostgreSQL):

1
2
3
4
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'pending'
  AND created_at > '2023-01-01';
1
2
3
4
5
 Seq Scan on orders  (cost=0.00..2456.00 rows=245 width=48) (actual time=0.034..18.567 rows=234 loops=1)
   Filter: (((status)::text = 'pending'::text) AND (created_at > '2023-01-01'))
   Rows Removed by Filter: 49766
 Planning Time: 0.089 ms
 Execution Time: 18.623 ms

Red flags:

  • Seq Scan on a table with 50,000 rows
  • Rows Removed by Filter: 49766 — read 50,000 rows to find 234

Fix:

1
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

After creating the index:

1
2
3
4
 Index Scan using idx_orders_status_created on orders  (cost=0.29..12.45 rows=245 width=48) (actual time=0.023..0.189 rows=234 loops=1)
   Index Cond: (((status)::text = 'pending'::text) AND (created_at > '2023-01-01'))
 Planning Time: 0.102 ms
 Execution Time: 0.234 ms

From 18.6 ms to 0.23 ms — an 80x improvement — from one index.

Index Selection Strategies#

What to Index#

  1. Primary keys: automatically indexed in all databases
  2. Foreign keys: always index these — JOINs use them constantly
  3. Columns in WHERE clauses: especially in high-frequency queries
  4. Columns in ORDER BY: avoids expensive sorts
  5. Columns in GROUP BY: helps with aggregation
  6. Columns used in JOINs: besides foreign keys, any join condition

Cardinality Matters#

Cardinality = the number of distinct values in a column.

ColumnCardinalityGood index candidate?
email (unique)10,000,000Yes — highly selective
country195Maybe — depends on query patterns
status (active/inactive)2Rarely — not selective enough
is_deleted (true/false)2No — use partial index instead

Low-cardinality columns return too many rows per value. The optimizer will often choose a sequential scan over an index scan for low-cardinality lookups.

Exception: if a low-cardinality value is rare (e.g., status = 'fraud' matches 0.01% of rows), it is selective and an index helps. A partial index is even better.

Over-Indexing: The Hidden Cost#

Magnifying glass over database index revealing optimized que

Every index you create has costs:

CostImpact
Write amplificationEvery INSERT/UPDATE/DELETE must update all affected indexes
StorageEach index can be 10-30% the size of the table
Memory pressureIndexes compete for buffer pool space
Planning timeMore indexes = more options for the optimizer to evaluate
MaintenanceVACUUM, REINDEX, statistics updates

A table with 10 indexes means every INSERT writes to 11 data structures (table + 10 indexes). For write-heavy workloads, this is devastating.

Rule of thumb: most OLTP tables should have 3-5 indexes. If you have more than 8, audit them.

Check for unused indexes in PostgreSQL:

1
2
3
4
5
6
7
8
9
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Partial Indexes#

A partial index only indexes rows that match a condition:

1
2
3
4
5
6
7
-- Only index non-deleted users (if 99% of users are not deleted)
CREATE INDEX idx_users_active ON users (email)
WHERE is_deleted = FALSE;

-- Only index pending orders (if most orders are completed)
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';

Benefits:

  • Much smaller than a full index
  • Faster to maintain (fewer entries to update)
  • Higher hit rate in the buffer pool

The query must include the partial index’s WHERE condition for the optimizer to use it:

1
2
3
4
5
6
7
-- This uses idx_orders_pending
SELECT * FROM orders
WHERE status = 'pending' AND created_at > '2023-12-01';

-- This does NOT use idx_orders_pending
SELECT * FROM orders
WHERE status = 'completed' AND created_at > '2023-12-01';

Expression Indexes#

You can index the result of an expression or function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Index on lowercase email for case-insensitive lookups
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Query must use the same expression
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- Index on extracted year from timestamp
CREATE INDEX idx_orders_year ON orders (EXTRACT(YEAR FROM created_at));

-- Index on JSONB field (PostgreSQL)
CREATE INDEX idx_users_metadata_country
ON users ((metadata->>'country'));

Without an expression index, using a function in WHERE prevents the optimizer from using a regular index on that column:

1
2
3
4
5
6
-- This CANNOT use a regular index on email
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- The database sees LOWER(email), not email — different thing

-- This CAN use a regular index on email
SELECT * FROM users WHERE email = 'alice@example.com';

GIN and GiST Indexes (PostgreSQL)#

Beyond B-tree, PostgreSQL offers specialized index types:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
-- GIN index for full-text search
CREATE INDEX idx_products_search ON products
USING gin (to_tsvector('english', name || ' ' || description));

-- Query using full-text search
SELECT name, ts_rank(to_tsvector('english', name || ' ' || description),
                     plainto_tsquery('english', 'wireless keyboard')) AS rank
FROM products
WHERE to_tsvector('english', name || ' ' || description)
      @@ plainto_tsquery('english', 'wireless keyboard')
ORDER BY rank DESC;

-- GIN index for JSONB containment queries
CREATE INDEX idx_users_metadata ON users USING gin (metadata jsonb_path_ops);

SELECT * FROM users WHERE metadata @> '{"country": "US"}';

-- GiST index for geometric/range data
CREATE INDEX idx_events_timerange ON events USING gist (time_range);
Index TypeBest ForSupported Operations
B-treeEquality, range, sorting=, <, >, BETWEEN, ORDER BY, LIKE 'prefix%'
HashEquality only=
GINArrays, JSONB, full-text@>, &&, @@, ?, ?&
GiSTGeometric, range, nearest-neighbor<<, >>, &&, @>, <->
BRINLarge, naturally ordered tables<, >, = (with reduced precision)

BRIN Indexes: Block Range Indexes#

BRIN indexes are tiny indexes for naturally ordered data (timestamps, auto-incrementing IDs). Instead of indexing every row, they store min/max values per block range (typically 128 pages).

How BRIN Works#

1
2
3
4
Block range 1 (pages 0-127):   min_date = 2024-01-01, max_date = 2024-01-15
Block range 2 (pages 128-255): min_date = 2024-01-15, max_date = 2024-01-31
Block range 3 (pages 256-383): min_date = 2024-02-01, max_date = 2024-02-14
...

When you query WHERE date > '2024-02-01', PostgreSQL skips entire block ranges whose max_date is before the threshold.

1
2
3
4
5
6
7
-- Create BRIN index (orders inserted roughly in time order)
CREATE INDEX idx_orders_created_brin ON orders
USING BRIN (created_at) WITH (pages_per_range = 64);

-- Compare sizes
SELECT pg_size_pretty(pg_relation_size('idx_orders_created_brin')) AS brin_size;
-- Result: 48 kB (vs 2.1 GB for equivalent B-tree on 100M rows)

When BRIN Excels vs Fails#

ConditionBRIN performanceB-tree performance
Data physically sorted by index columnExcellent (skip entire ranges)Good
Data randomly orderedTerrible (every range matches)Good
Table size 100M+ rows~1000x smaller indexWorks but huge index
Point lookups (WHERE id = X)Slow (scan matching ranges)Fast (single path)
Range scans on sorted dataFastFast

Rule: BRIN works when correlation between row position and column value is high. Check with:

1
2
3
SELECT correlation FROM pg_stats
WHERE tablename = 'orders' AND attname = 'created_at';
-- Result: 0.98 → excellent for BRIN. Below 0.5 → don't bother

Bloom Filters in Databases#

Bloom filters answer “is this element possibly in the set?” with zero false negatives and tunable false positive rate. Databases use them to avoid unnecessary disk reads.

Where Bloom Filters Appear#

DatabaseUsage
PostgreSQL (bloom extension)Multi-column equality filters
RocksDB / LevelDBSkip SSTable files that can’t contain the key
CassandraSkip SSTables during reads
HBaseSkip HFiles
Parquet filesSkip row groups

PostgreSQL Bloom Index#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
CREATE EXTENSION bloom;

-- Bloom index: efficient for multi-column equality queries
-- where you don't know which columns will be queried
CREATE INDEX idx_logs_bloom ON logs USING bloom (
    user_id, session_id, action, status_code
) WITH (length=80, col1=2, col2=2, col3=4, col4=2);

-- This single index accelerates ANY combination:
SELECT * FROM logs WHERE user_id = 42 AND status_code = 500;
SELECT * FROM logs WHERE action = 'login' AND session_id = 'abc';
SELECT * FROM logs WHERE user_id = 42;

A B-tree composite index only helps queries that use a prefix of the index columns. A bloom index helps with any combination, at the cost of higher false positive rate.

Index Maintenance and Bloat#

Indexes degrade over time. Dead tuples from updates/deletes leave holes. Understanding and managing bloat is essential for production databases.

Detecting Index Bloat (PostgreSQL)#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
-- Check index bloat using pgstattuple extension
CREATE EXTENSION pgstattuple;

SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans,
    idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

-- Detailed bloat estimate
SELECT * FROM pgstatindex('idx_orders_user_id');
-- avg_leaf_density < 50% → significant bloat

REINDEX#

1
2
3
4
5
6
7
8
-- Rebuild a bloated index (locks the table briefly)
REINDEX INDEX idx_orders_user_id;

-- CONCURRENTLY: rebuild without locking (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;

-- Rebuild ALL indexes on a table
REINDEX TABLE CONCURRENTLY orders;

Unused Index Detection#

1
2
3
4
5
6
7
8
9
-- Find indexes that haven't been scanned since last stats reset
SELECT
    schemaname, tablename, indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS scans_since_reset
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND pg_relation_size(indexrelid) > 1048576  -- > 1MB
ORDER BY pg_relation_size(indexrelid) DESC;

Every unused index consumes disk space and slows down every INSERT/UPDATE/DELETE. Drop them aggressively (after verifying with production traffic).

Automated Index Recommendations#

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- PostgreSQL: pg_qualstats tracks predicate usage
CREATE EXTENSION pg_qualstats;

-- After collecting data for a week, find missing indexes:
SELECT
    v.relname AS table,
    v.attnames AS columns,
    v.execution_count,
    v.queryid
FROM pg_qualstats_indexes_recommandations() v
ORDER BY v.execution_count DESC
LIMIT 20;

Practical Index Design Workflow#

When you have a slow query, follow this process:

1
2
3
4
5
6
7
8
1. Run EXPLAIN ANALYZE on the slow query
2. Look for Seq Scans with high "Rows Removed by Filter"
3. Identify which WHERE/JOIN/ORDER BY columns lack indexes
4. Check cardinality of those columns
5. Create the most selective composite index
6. Re-run EXPLAIN ANALYZE to verify improvement
7. Monitor pg_stat_user_indexes for actual usage
8. Drop unused indexes after 30 days

A complete example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Step 1: The slow query
EXPLAIN ANALYZE
SELECT p.name, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.created_at >= '2023-11-01'
  AND o.status = 'completed'
GROUP BY p.name
ORDER BY total_sold DESC
LIMIT 10;
-- Execution Time: 234.567 ms

-- Step 2-4: Seq Scan on orders, filtering 80% of rows
-- Need index on (status, created_at)

-- Step 5: Create index
CREATE INDEX idx_orders_status_date ON orders (status, created_at);

-- Step 6: Verify
EXPLAIN ANALYZE
-- ... same query ...
-- Execution Time: 3.456 ms  (68x faster)

What’s Next#

Indexes tell the database where to find data. But what happens when two transactions try to modify the same data at the same time? In the next article, we will explore transactions and concurrency — ACID guarantees, isolation levels, locking, and the dark art of preventing deadlocks.

In this series

Databases 8 parts

  1. 01 Databases (1): Data Models and SQL — Why Tables Won (For Now)
  2. 02 Databases (2): Indexing and Query Planning — How Databases Find Your Data you are here
  3. 03 Databases (3): Transactions and Concurrency — ACID, Isolation Levels, and Locking
  4. 04 Databases (4): Storage Engines — How Data Hits Disk
  5. 05 Databases (5): NoSQL — Document, Key-Value, Column, and Graph
  6. 06 Databases (6): Replication and Partitioning — Scaling Beyond One Machine
  7. 07 Databases (7): Distributed Transactions — 2PC, Saga, and Why Consensus Is Hard
  8. 08 Databases (8): Databases in Practice — Migration, Monitoring, and War Stories

Liked this piece?

Follow on GitHub for the next one — usually one a week.

GitHub