Databases on Chen Kai Blog

Databases (8): Databases in Practice — Migration, Monitoring, and War Stories

Tue, 30 Apr 2024 09:00:00 +0000

Knowing how databases work internally is half the battle. The other half is keeping them running in production without losing data, dropping availability, or waking up at 3 AM. This article covers the operational knowledge that comes from experience — the things nobody teaches you until something breaks.

Schema Migrations: Changing the Engine While Flying

Your schema will change. New features require new columns, new tables, new indexes. The question is how to evolve the schema without downtime.

Databases (7): Distributed Transactions — 2PC, Saga, and Why Consensus Is Hard

Sun, 28 Apr 2024 09:00:00 +0000

Everything we covered about transactions in Article 3 assumed a single database server: one machine, one transaction log, one lock manager. When your data spans multiple machines (through sharding, microservices with separate databases, or strongly consistent replication), you face one of the hardest problems in distributed systems: how do you get multiple machines to agree?

The Distributed Transaction Problem

Consider an e-commerce system with separate services for orders and inventory, each with its own database:

Databases (6): Replication and Partitioning — Scaling Beyond One Machine

Fri, 26 Apr 2024 09:00:00 +0000

A single database server can handle a remarkable amount of load: a well-tuned PostgreSQL instance can serve tens of thousands of queries per second. But eventually you hit a wall. Maybe you need more read throughput than one machine can provide. Maybe you need your data to survive a data center fire. Maybe your dataset exceeds what fits on a single disk. That is when you need replication and partitioning.

Databases (5): NoSQL — Document, Key-Value, Column, and Graph

Wed, 24 Apr 2024 09:00:00 +0000

Not everything fits neatly into rows and columns. A social network’s friend graph, a product catalog with wildly varying attributes, a real-time leaderboard, a recommendation engine’s relationship web — these workloads push relational databases into awkward territory. NoSQL databases exist because different data models solve different problems better. The trick is knowing which one to reach for.

Why NoSQL?

The term “NoSQL” is misleading. It does not mean “no SQL” — some NoSQL databases support SQL-like query languages. It means “not only SQL” or, more accurately, “non-relational.” The motivations for NoSQL fall into three categories:

Databases (4): Storage Engines — How Data Hits Disk

Mon, 22 Apr 2024 09:00:00 +0000

Every SQL statement you write eventually becomes bytes read from or written to a disk. The component responsible for this translation, the storage engine, determines your database’s performance characteristics more than almost any other factor. Two tables with identical schemas and identical data can perform wildly differently depending on the storage engine underneath. Understanding this layer explains why databases behave the way they do.

The Basics: Pages, Extents, and Tablespaces

Databases do not read or write individual rows from disk. Disk I/O operates on pages (also called blocks), typically 4 KB, 8 KB, or 16 KB.
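To make the page idea concrete, here is a minimal sketch in Python. It assumes a deliberately simplified layout: fixed-size rows packed back to back into 8 KB pages, with no page header or slot directory (real engines have both). The helper names `build_file` and `read_row` are made up for this illustration.

```python
import io

PAGE_SIZE = 8 * 1024  # 8 KB, one of the typical sizes mentioned above


def build_file(num_rows, row_size, page_size=PAGE_SIZE):
    """Build an in-memory 'data file' of fixed-size rows, padded to whole pages."""
    rows_per_page = page_size // row_size
    buf = io.BytesIO()
    for first in range(0, num_rows, rows_per_page):
        last = min(first + rows_per_page, num_rows)
        # Each row stores its own row number, padded with zero bytes.
        page = b"".join(str(n).encode().ljust(row_size, b"\0")
                        for n in range(first, last))
        buf.write(page.ljust(page_size, b"\0"))  # pad the page to full size
    return buf


def read_row(f, row_number, row_size, page_size=PAGE_SIZE):
    """Fetch one row by reading the entire page that contains it."""
    rows_per_page = page_size // row_size
    page_no, slot = divmod(row_number, rows_per_page)
    f.seek(page_no * page_size)
    page = f.read(page_size)  # the unit of I/O is the page, not the row
    start = slot * row_size
    return page[start:start + row_size], len(page)


f = build_file(num_rows=243, row_size=100)
row, bytes_read = read_row(f, row_number=200, row_size=100)
print(row.rstrip(b"\0"), bytes_read)
```

Even though the row is only 100 bytes, the read pulls in a full 8,192-byte page. That is the reason row width and page fill matter so much for performance.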

Databases (3): Transactions and Concurrency — ACID, Isolation Levels, and Locking

Sun, 21 Apr 2024 09:00:00 +0000

Every application that handles money, inventory, or any state that matters eventually hits a concurrency bug. Two users buy the last item in stock. A bank transfer debits one account but crashes before crediting the other. A report reads half-updated data and produces nonsense numbers. Transactions exist to prevent these failures, and understanding how they work is non-negotiable for anyone building production systems.

What Is a Transaction?

A transaction is a group of operations that the database treats as a single unit. Either all operations succeed, or none of them do.
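The all-or-nothing behavior can be sketched with Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are assumptions made up for this illustration, and SQLite stands in for whatever database you actually run.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                # Both updates above are undone by the rollback.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok_small = transfer(conn, 1, 2, 30)    # succeeds, both updates are kept
ok_large = transfer(conn, 1, 2, 500)   # fails, neither update is kept
print(ok_small, ok_large, balances(conn))
```

The failed transfer leaves no trace: the debit that had already been applied inside the transaction is rolled back together with the credit.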

Databases (2): Indexing and Query Planning — How Databases Find Your Data

Fri, 19 Apr 2024 09:00:00 +0000

A query that returns in 2 milliseconds on your laptop with 1,000 rows can take 45 seconds on a production database with 50 million rows unless you have the right indexes. Indexes are the single most impactful performance tool in your database toolkit, and understanding how they work changes the way you think about every schema and every query you write.
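A rough way to see the difference an index makes is to ask the database for its query plan before and after creating one. The sketch below uses SQLite through Python's `sqlite3` module. The `users` table, the `idx_users_email` index, and the `plan` helper are hypothetical names for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)])


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


query = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

before = plan(conn, query, args)  # no index on email: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(conn, query, args)   # planner can now search via the index

print("before:", before)
print("after: ", after)
```

Without the index the plan reports a scan of the whole table. With it, the plan reports a search that uses `idx_users_email`, so the work no longer grows linearly with the row count.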

The Fundamental Problem: Finding a Row

Imagine a table with 10 million rows, stored on disk as a heap file. Each row sits somewhere in a sequence of 8 KB pages. When you run:

Databases (1): Data Models and SQL — Why Tables Won (For Now)

Wed, 17 Apr 2024 09:00:00 +0000

Every application you have ever used sits on top of a data model. Pick the wrong one and you spend the next three years fighting your own database instead of shipping features.

For the past four decades, one model has dominated: the relational model. Flat tables, foreign keys, SQL. It is not glamorous. It is not trendy. But there is a reason almost every bank, airline, hospital, and e-commerce platform still runs on it — and understanding why is the first step to understanding databases at all.