<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Observability on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/observability/</link><description>Recent content in Observability on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 04 May 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/observability/index.xml" rel="self" type="application/rss+xml"/><item><title>Alibaba Cloud Full Stack (7): SLS, CloudMonitor, and Observability</title><link>https://www.chenk.top/en/aliyun-fullstack/07-observability/</link><pubDate>Mon, 04 May 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-fullstack/07-observability/</guid><description>&lt;p>The worst production outage I ever caused took three hours to diagnose. A Node.js service was returning 502s intermittently — maybe 5% of requests — and I had nothing. No centralized logs (each ECS instance had its own &lt;code>/var/log/&lt;/code> and I was SSH-ing into them one at a time). No metrics dashboards (I was running &lt;code>top&lt;/code> and &lt;code>df -h&lt;/code> in terminals). No tracing (I was adding &lt;code>console.log&lt;/code> timestamps to try to figure out which downstream call was hanging). Three hours later, I found the issue: a connection pool to RDS was being exhausted under load because a forgotten cron job was holding connections open. The fix was two lines of code. The diagnosis took three hours of misery because I had zero observability.&lt;/p></description></item><item><title>Docker and Containers (6): Debugging and Logging — When Things Go Wrong Inside a Box</title><link>https://www.chenk.top/en/docker-containers/06-debugging-and-logging/</link><pubDate>Wed, 21 Jun 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/docker-containers/06-debugging-and-logging/</guid><description>&lt;p>A container that works is invisible. 
A container that doesn&amp;rsquo;t work is a black box. The entire point of containerization is isolation — but that same isolation makes debugging harder. You can&amp;rsquo;t just &lt;code>ssh&lt;/code> into a container or browse its filesystem from the host. Docker provides a specific set of tools for inspecting, diagnosing, and understanding what happens inside running (and crashed) containers.&lt;/p>
&lt;hr>
&lt;h2 id="reading-container-logs" class="heading-anchor">Reading Container Logs&lt;a href="#reading-container-logs" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>Logs are your first line of investigation. Docker captures everything a container writes to stdout and stderr.&lt;/p></description></item></channel></rss>