<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Databases on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/databases/</link><description>Recent content in Databases on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 30 Apr 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/databases/index.xml" rel="self" type="application/rss+xml"/><item><title>Databases (8): Databases in Practice — Migration, Monitoring, and War Stories</title><link>https://www.chenk.top/en/databases/08-database-in-practice/</link><pubDate>Tue, 30 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/08-database-in-practice/</guid><description>&lt;p>Knowing how databases work internally is half the battle. The other half is keeping them running in production without losing data, dropping availability, or waking up at 3 AM. This article covers the operational knowledge that comes from experience — the things nobody teaches you until something breaks.&lt;/p>
&lt;hr>
&lt;h2 id="schema-migrations-changing-the-engine-while-flying" class="heading-anchor">Schema Migrations: Changing the Engine While Flying&lt;a href="#schema-migrations-changing-the-engine-while-flying" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Your schema will change. New features require new columns, new tables, new indexes. The question is how to evolve the schema without downtime.&lt;/p></description></item><item><title>Databases (7): Distributed Transactions — 2PC, Saga, and Why Consensus Is Hard</title><link>https://www.chenk.top/en/databases/07-distributed-transactions/</link><pubDate>Sun, 28 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/07-distributed-transactions/</guid><description>&lt;p>Everything we covered about transactions in Article 3 assumed a single database server: one machine, one transaction log, one lock manager. When your data spans multiple machines (through sharding, microservices with separate databases, or strongly consistent replication), you face one of the hardest problems in distributed systems: how do you get multiple machines to agree?&lt;/p>
&lt;hr>
&lt;h2 id="the-distributed-transaction-problem" class="heading-anchor">The Distributed Transaction Problem&lt;a href="#the-distributed-transaction-problem" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Consider an e-commerce system with separate services for orders and inventory, each with its own database:&lt;/p></description></item><item><title>Databases (6): Replication and Partitioning — Scaling Beyond One Machine</title><link>https://www.chenk.top/en/databases/06-replication-and-partitioning/</link><pubDate>Fri, 26 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/06-replication-and-partitioning/</guid><description>&lt;p>A single database server can handle a remarkable amount of load — a well-tuned PostgreSQL instance can serve tens of thousands of queries per second. But eventually you hit a wall. Maybe you need more read throughput than one CPU can provide. Maybe you need your data to survive a data center fire. Maybe your dataset exceeds what fits on a single disk. That is when you need replication and partitioning.&lt;/p></description></item><item><title>Databases (5): NoSQL — Document, Key-Value, Column, and Graph</title><link>https://www.chenk.top/en/databases/05-nosql-landscape/</link><pubDate>Wed, 24 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/05-nosql-landscape/</guid><description>&lt;p>Not everything fits neatly into rows and columns. A social network&amp;rsquo;s friend graph, a product catalog with wildly varying attributes, a real-time leaderboard, a recommendation engine&amp;rsquo;s relationship web — these workloads push relational databases into awkward territory. NoSQL databases exist because different data models solve different problems better. The trick is knowing which one to reach for.&lt;/p>
&lt;hr>
&lt;h2 id="why-nosql" class="heading-anchor">Why NoSQL?&lt;a href="#why-nosql" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>The term &amp;ldquo;NoSQL&amp;rdquo; is misleading. It does not mean &amp;ldquo;no SQL&amp;rdquo; — some NoSQL databases support SQL-like query languages. It means &amp;ldquo;not only SQL&amp;rdquo; or, more accurately, &amp;ldquo;non-relational.&amp;rdquo; The motivations for NoSQL fall into three categories:&lt;/p></description></item><item><title>Databases (4): Storage Engines — How Data Hits Disk</title><link>https://www.chenk.top/en/databases/04-storage-engines/</link><pubDate>Mon, 22 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/04-storage-engines/</guid><description>&lt;p>Every SQL statement you write eventually becomes bytes read from or written to disk. The component responsible for this translation — the storage engine — determines your database&amp;rsquo;s performance characteristics more than almost any other factor. Two tables with identical schemas and identical data can perform wildly differently depending on the storage engine underneath. Understanding this layer explains &lt;em>why&lt;/em> databases behave the way they do.&lt;/p>
&lt;hr>
&lt;h2 id="the-basics-pages-extents-and-tablespaces" class="heading-anchor">The Basics: Pages, Extents, and Tablespaces&lt;a href="#the-basics-pages-extents-and-tablespaces" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Databases do not read or write individual rows from disk. Disk I/O operates on &lt;strong>pages&lt;/strong> (also called blocks), typically 4 KB, 8 KB, or 16 KB.&lt;/p></description></item><item><title>Databases (3): Transactions and Concurrency — ACID, Isolation Levels, and Locking</title><link>https://www.chenk.top/en/databases/03-transactions-and-concurrency/</link><pubDate>Sun, 21 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/03-transactions-and-concurrency/</guid><description>&lt;p>Every application that handles money, inventory, or any state that matters eventually hits a concurrency bug. Two users buy the last item in stock. A bank transfer debits one account but crashes before crediting the other. A report reads half-updated data and produces nonsense numbers. Transactions exist to prevent these failures, and understanding how they work is non-negotiable for anyone building production systems.&lt;/p>
&lt;hr>
&lt;h2 id="what-is-a-transaction" class="heading-anchor">What Is a Transaction?&lt;a href="#what-is-a-transaction" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>A transaction is a group of operations that the database treats as a single unit. Either &lt;strong>all&lt;/strong> operations succeed, or &lt;strong>none&lt;/strong> of them do.&lt;/p></description></item><item><title>Databases (2): Indexing and Query Planning — How Databases Find Your Data</title><link>https://www.chenk.top/en/databases/02-indexing-and-query-planning/</link><pubDate>Fri, 19 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/02-indexing-and-query-planning/</guid><description>&lt;p>A query that returns in 2 milliseconds on your laptop with 1,000 rows can take 45 seconds on a production database with 50 million rows — unless you have the right indexes. Indexes are the single most impactful performance tool in your database toolkit, and understanding how they work changes the way you think about every schema and every query you write.&lt;/p>
&lt;hr>
&lt;h2 id="the-fundamental-problem-finding-a-row" class="heading-anchor">The Fundamental Problem: Finding a Row&lt;a href="#the-fundamental-problem-finding-a-row" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Imagine a table with 10 million rows, stored on disk as a heap file. Each row sits somewhere in a sequence of 8 KB pages. When you run:&lt;/p></description></item><item><title>Databases (1): Data Models and SQL — Why Tables Won (For Now)</title><link>https://www.chenk.top/en/databases/01-data-models-and-sql/</link><pubDate>Wed, 17 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/databases/01-data-models-and-sql/</guid><description>&lt;p>Every application you have ever used sits on top of a data model. Pick the wrong one and you spend the next three years fighting your own database instead of shipping features.&lt;/p>
&lt;p>For the past four decades, one model has dominated: the relational model. Flat tables, foreign keys, SQL. It is not glamorous. It is not trendy. But there is a reason almost every bank, airline, hospital, and e-commerce platform still runs on it — and understanding &lt;em>why&lt;/em> is the first step to understanding databases at all.&lt;/p></description></item></channel></rss>