Essence of Linear Algebra (17): Linear Algebra in Computer Vision

Wed, 23 Apr 2025 09:00:00 +0000

Computer vision is the science of teaching machines to see. What is striking is how thoroughly the whole field reduces to linear algebra: an image is a matrix, a geometric transformation is a matrix product, a camera is a $3 \times 4$ projection matrix, two-view geometry is the equation $\mathbf{x}_2^\top \mathbf{F}\, \mathbf{x}_1 = 0$, and 3D reconstruction is a sparse linear least-squares problem. Seen through that lens, what looked like a zoo of algorithms turns out to be a small set of linear-algebraic ideas applied repeatedly.
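To make the two-view equation concrete, here is a minimal NumPy sketch, not taken from the post itself. It builds two $3 \times 4$ projection matrices, forms $\mathbf{F} = \mathbf{K}^{-\top} [\mathbf{t}]_\times \mathbf{R}\, \mathbf{K}^{-1}$ (assuming both cameras share the same intrinsics), and checks that a projected 3D point satisfies $\mathbf{x}_2^\top \mathbf{F}\, \mathbf{x}_1 = 0$. The intrinsics `K`, the pose `R`, `t`, and the point `X` are made-up values chosen only for illustration.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical shared intrinsics (focal length 800 px, principal point 320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical pose of camera 2 relative to camera 1: X2 = R @ X1 + t.
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.02])

# Each camera is a 3x4 projection matrix.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t.reshape(3, 1)])

# Fundamental matrix for this pair of cameras.
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

# Project one homogeneous 3D point into both views.
X = np.array([0.3, -0.2, 4.0, 1.0])
x1 = P1 @ X
x2 = P2 @ X

# The epipolar constraint: zero up to floating-point error.
residual = float(x2 @ F @ x1)
print(f"x2^T F x1 = {residual:.2e}")
```

The residual vanishes for any 3D point, because $(\mathbf{R}\mathbf{X}_1 + \mathbf{t})^\top (\mathbf{t} \times \mathbf{R}\mathbf{X}_1) = 0$ holds identically.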

Tennis-Scene Computer Vision: From Paper Survey to Production

Sat, 31 Aug 2024 09:00:00 +0000

A tennis ball is about 6.7 cm across and can travel at over 200 km/h. Reconstructing its 3D trajectory from eight 4K cameras in real time, while also classifying each player's stroke, involves small-object detection, multi-view geometry, Kalman filtering, physics modeling, and human-pose estimation, all at once. This post follows the same order as a real deployment: state the constraints, survey the literature, choose an approach, build it, and lay out a millisecond-by-millisecond budget for production.

What You Will Learn

Why traditional detectors collapse on 10–20 px tennis balls and how the TrackNet line fixes it
Multi-camera calibration, PTP synchronisation, and DLT triangulation in code and math
A 9-state Kalman filter coupled with a drag-plus-Magnus ODE for trajectory prediction
Action recognition: rule-based templates vs. end-to-end learning, and when each wins
How to fit detection → 3D → tracking → pose → analytics into a 16.7 ms / frame budget

Prerequisites: pinhole camera model, basic Kalman filtering, and some PyTorch inference experience.
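As a small taste of the DLT triangulation item above, the following is a minimal NumPy sketch of the textbook linear method, not the post's production code. Each camera contributes two rows to a homogeneous system $\mathbf{A}\mathbf{X} = 0$, and the 3D point is the right singular vector of $\mathbf{A}$ with the smallest singular value. The function name `triangulate_dlt`, the three-camera rig, and the test point are hypothetical and noise-free.

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Linear (DLT) triangulation of one 3D point.

    projections: list of 3x4 camera matrices.
    pixels: list of (u, v) observations, one per camera.
    Returns the point as a length-3 array in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # From u = (P[0] @ X) / (P[2] @ X) and v = (P[1] @ X) / (P[2] @ X).
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Null-space direction of A: last right singular vector.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]

# Hypothetical rig: three identical cameras offset along the x axis.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
offsets = [np.array([0.0, 0.0, 0.0]),
           np.array([-1.0, 0.0, 0.0]),
           np.array([1.0, 0.2, 0.0])]
cameras = [K @ np.hstack([np.eye(3), t.reshape(3, 1)]) for t in offsets]

# Project a known point, then recover it from the pixel observations alone.
X_true = np.array([1.0, 0.5, 10.0])
observations = []
for P in cameras:
    x = P @ np.append(X_true, 1.0)
    observations.append((x[0] / x[2], x[1] / x[2]))

X_hat = triangulate_dlt(cameras, observations)
print(X_hat)
```

With noisy detections the same linear solve is usually treated as an initial estimate that a nonlinear refinement or a filter then improves.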
