Machine Learning

May 8, 2026 Alibaba Cloud Full Stack 52 min read

Alibaba Cloud Full Stack (11): PAI — The ML Platform

The complete ML platform on Alibaba Cloud: PAI-DSW for notebooks, PAI-DLC for distributed training, PAI-EAS for model serving, Designer for visual workflows, and Model Gallery. Train and deploy a custom model end-to-end.

Mar 6, 2026 Aliyun PAI 26 min read

Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights

Working with PAI-DSW for real: choosing the right GPU image, mounting OSS so you don't lose checkpoints when the instance restarts, and an MNIST notebook drawn from the official Quick Start that you can copy-paste.

Mar 5, 2026 Aliyun PAI 10 min read

Aliyun PAI (1): Platform Overview and the Product Family Map

What Aliyun PAI actually is in 2026, the four-layer architecture from the official docs, the five sub-products you'll touch, and a sane account/workspace setup so the rest of the series can skip the boilerplate.

Feb 8, 2026 ML Math Derivations 28 min read

ML Math Derivations (20): Regularization and Model Selection

The series finale: from the bias-variance decomposition to L1/L2 geometry, dropout as a sub-network sampler, k-fold CV, AIC/BIC, VC bounds, and the modern double-descent phenomenon that broke classical theory.

Feb 7, 2026 ML Math Derivations 32 min read

ML Math Derivations (19): Neural Networks and Backpropagation

How does a neural network learn? This article derives forward propagation, the chain rule mechanics of backpropagation, vanishing/exploding gradients, and initialization strategies (Xavier, He).

Feb 6, 2026 ML Math Derivations 32 min read

ML Math Derivations (18): Clustering Algorithms

How do you find groups in unlabeled data? This article derives K-means (Lloyd + K-means++), hierarchical, DBSCAN, spectral, and GMM clustering from their mathematical foundations, with seven figures that show why each …

Feb 5, 2026 ML Math Derivations 26 min read

ML Math Derivations (17): Dimensionality Reduction and PCA

High-dimensional spaces are hostile to distance-based learning. This article derives PCA from two equivalent angles (max variance and min reconstruction error), and extends to kernel PCA, LDA, t-SNE, and ICA -- with …

Feb 4, 2026 ML Math Derivations 26 min read

ML Math Derivations (16): Conditional Random Fields

Why do CRFs outperform HMMs on sequence labeling? This article derives linear-chain CRF from the ground up -- potential functions, the forward-backward algorithm, gradient computation, and Viterbi decoding.

Feb 3, 2026 ML Math Derivations 24 min read

ML Math Derivations (15): Hidden Markov Models

Derive the three classical HMM algorithms from one principle (factorising the joint, then sharing sub-computations across time): Forward-Backward for evaluation and smoothing, Viterbi for MAP decoding, and Baum-Welch …

Feb 2, 2026 ML Math Derivations 28 min read

ML Math Derivations (14): Variational Inference and Variational EM

A first-principles derivation of variational inference. From the ELBO identity and the mean-field assumption to the CAVI updates, variational EM, and the reparameterization trick that powers VAEs.

Feb 1, 2026 ML Math Derivations 24 min read

ML Math Derivations (13): EM Algorithm and GMM

Derive the EM algorithm from Jensen's inequality and the ELBO, prove its monotone-ascent guarantee, and apply it to Gaussian Mixture Models with full E-step / M-step formulas, model selection via BIC/AIC, and the K-means …

Jan 31, 2026 ML Math Derivations 28 min read

ML Math Derivations (12): XGBoost and LightGBM

Derive XGBoost's second-order Taylor expansion, regularised objective and split-gain formula, then explore LightGBM's histogram algorithm, GOSS sampling and EFB bundling for industrial-scale gradient boosting.

Jan 30, 2026 ML Math Derivations 36 min read

ML Math Derivations (11): Ensemble Learning

Derive why combining weak learners produces strong ones. Covers bias-variance decomposition, Bagging/Random Forest variance reduction, AdaBoost exponential loss, and GBDT gradient optimization in function space.

Jan 29, 2026 ML Math Derivations 28 min read

ML Math Derivations (10): Semi-Naive Bayes and Bayesian Networks

From SPODE, TAN and AODE to full Bayesian networks: how relaxing the conditional-independence assumption -- through one-dependence trees, ensembles of super-parents and graphical structure learning -- closes the gap …

Jan 28, 2026 ML Math Derivations 34 min read

ML Math Derivations (9): Naive Bayes

Rigorous derivation of Naive Bayes from Bayes theorem through conditional independence, parameter estimation, Laplace smoothing, three model variants, and why it works despite violated assumptions.

Jan 27, 2026 ML Math Derivations 28 min read

ML Math Derivations (8): Support Vector Machines

Complete SVM derivation from maximum margin to Lagrangian duality, KKT conditions, soft margin, kernel trick, and SMO algorithm with step-by-step proofs and Python code.

Jan 26, 2026 ML Math Derivations 38 min read

ML Math Derivations (7): Decision Trees

From information entropy to the Gini index, from ID3 to CART — a complete derivation of decision-tree mathematics: split criteria, continuous and missing values, pruning, and feature importance, with sklearn-verified …

Jan 25, 2026 ML Math Derivations 32 min read

ML Math Derivations (6): Logistic Regression and Classification

Complete derivation of logistic regression from sigmoid to softmax, cross-entropy loss, gradient computation, regularization, and multi-class extension with Python verification.

Jan 24, 2026 ML Math Derivations 32 min read

ML Math Derivations (5): Linear Regression

A complete derivation of linear regression from three perspectives -- algebra (the normal equation), geometry (orthogonal projection), and probability (maximum likelihood) -- followed by Ridge, Lasso, gradient methods, …

Jan 23, 2026 ML Math Derivations 44 min read

ML Math Derivations (4): Convex Optimization Theory

Nearly every ML algorithm is an optimization problem. This article derives convex sets, convex functions, gradient descent, Newton's method, KKT conditions, and ADMM -- the optimization toolkit for machine learning.

Jan 22, 2026 ML Math Derivations 28 min read

ML Math Derivations (3): Probability Theory and Statistical Inference

Machine learning is uncertainty modeling. This article derives probability spaces, common distributions, MLE, Bayesian estimation, limit theorems and information theory -- the statistical engine behind every ML model.

Jan 21, 2026 ML Math Derivations 30 min read

ML Math Derivations (2): Linear Algebra and Matrix Theory

The language of machine learning is linear algebra. This article derives vector spaces, eigendecomposition, SVD, and matrix calculus from first principles -- every tool you need for ML optimization.

Jan 20, 2026 ML Math Derivations 42 min read

ML Math Derivations (1): Introduction and Mathematical Foundations

Why can machines learn from data at all? This first chapter builds the mathematical theory of learning from first principles -- problem formalization, loss surrogates, PAC learning, VC dimension, the bias-variance …

Jul 28, 2025 Standalone 26 min read

Symplectic Geometry and Structure-Preserving Neural Networks

Learn physics-informed neural networks that preserve energy and symplectic structure. Covers HNN, LNN, SympNet, symplectic integrators, and four classical experiments.

May 1, 2025 Transfer Learning 46 min read

Transfer Learning (1): Fundamentals and Core Concepts

A beginner-friendly guide to transfer learning fundamentals: why it works, formal definitions, taxonomy, negative transfer, and a complete feature-transfer implementation with MMD domain adaptation.

Apr 9, 2025 Linear Algebra 38 min read

Essence of Linear Algebra (15): Linear Algebra in Machine Learning

Machine learning speaks linear algebra as its native language. From PCA to SVMs, from matrix factorization in recommender systems to gradient descent optimization -- see how vectors, matrices, and decompositions power …

Aug 30, 2024 Probability and Statistics 28 min read

Probability and Statistics (8): Bayesian Statistics — Priors, Posteriors, and Why Frequentists Argue

Bayesian inference from first principles: posterior distributions, conjugate priors, the Beta-Binomial and Normal-Normal models, credible intervals, predictive distributions, MCMC intuition, and deep connections to …

Aug 14, 2024 PDE and Machine Learning 44 min read

PDE and ML (8): Reaction-Diffusion Systems and Graph Neural Networks

Deep GNNs collapse because they are diffusion equations on graphs. Turing's reaction-diffusion theory tells us how to fix it -- and closes the eight-chapter PDE+ML series.

Jul 30, 2024 PDE and Machine Learning 38 min read

PDE and ML (7): Diffusion Models and Score Matching

Diffusion models are PDE solvers in disguise. We derive the heat equation, Fokker-Planck, score matching, DDPM, and DDIM from a unified PDE perspective and visualise every step.

Jul 15, 2024 PDE and Machine Learning 38 min read

PDE and ML (6): Continuous Normalizing Flows and Neural ODE

How do you turn a Gaussian into a complex data distribution? This article derives Neural ODEs, the adjoint method, continuous normalizing flows (FFJORD), and Flow Matching from the underlying ODE/PDE theory, and shows …

Jun 30, 2024 PDE and Machine Learning 48 min read

PDE and ML (5): Symplectic Geometry and Structure-Preserving Networks

Standard neural networks violate conservation laws. This article derives Hamiltonian mechanics, symplectic integrators, HNNs, LNNs, and SympNets from the geometry of phase space.

May 31, 2024 PDE and Machine Learning 54 min read

PDE and ML (3): Variational Principles and Optimization

Calculus of variations to Wasserstein gradient flow and the mean-field limit of neural networks.