AI Logbook
Live Learning Feed


Understanding intelligent systems from first principles.

Month 1 Retrospective: The Glass Box Engine

System Architecture of a Regressor | Linear Algebra & Calculus Synthesis | From-Scratch Implementation Review

šŸ—ļøArchitecture

System Architecture: The Glass Box Engine

Constructing a production-ready machine learning engine entirely from scratch establishes a transparent mathematical pipeline, bypassing black-box abstractions.

The system executes in four distinct architectural phases:

  1. The Representation Layer (Linear Algebra):
    • Data is ingested and cast into a mathematical matrix $X$.
    • To prevent data leakage, the Z-score scaler computes its statistics (per-column mean and standard deviation) strictly on the training matrix, then applies those same statistics to transform both the training and test data.
    • Non-linear complexities (cyclical time, parabolas, feature interactions) are engineered directly into the matrix columns prior to algorithm ingestion.
  2. The Forward Pass (The Hypothesis):
    • The model calculates its predictions with a matrix-vector product plus a bias: $\vec{\hat{y}} = X\vec{w} + b$.
  3. The Loss & Penalty Calculation (The Objective):
    • The system calculates the Mean Squared Error, $\text{MSE} = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$.
    • Regularization penalizes large weights using $L_1$ (Lasso), $L_2$ (Ridge), or ElasticNet penalties, explicitly dividing the penalty by the sample size $N$ so that its strength stays stable across dataset sizes.
  4. The Backward Pass (Calculus & Optimization):
    • The engine calculates the partial derivatives (gradients) of the loss function with respect to every weight and the bias.
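    • Batch Gradient Descent subtracts these gradients, scaled by the learning rate $\alpha$, from the current parameters on every pass over the full training set, iteratively descending the multidimensional error surface. Because the regularized MSE objective is convex, a sufficiently small $\alpha$ steers the weights toward its global minimum.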
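The four phases above can be sketched in plain Python with no ML libraries. This is a minimal illustration, not the engine's actual code: the helper names (`fit_zscore`, `zscore`, `engineer`, `gradients`, `fit`), the toy parabola dataset, the unpenalized bias, and the ElasticNet mixing form `(lam/N) * (r*sum|w| + (1-r)*sum(w^2))` are all hypothetical choices made for the example. The $L_1$ term uses the subgradient $\text{sign}(w)$, since $|w|$ has no derivative at zero.

```python
import math

def fit_zscore(X_train):
    """Learn per-column mean and std from the TRAINING rows only (prevents leakage)."""
    n, d = len(X_train), len(X_train[0])
    means = [sum(row[j] for row in X_train) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X_train) / n) or 1.0
            for j in range(d)]  # "or 1.0" guards against a constant column
    return means, stds

def zscore(X, means, stds):
    """Apply previously learned statistics to any matrix (train or test)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def engineer(x):
    """Hypothetical feature map: the raw value plus a squared (parabolic) column."""
    return [x, x * x]

def predict(X, w, b):
    """Forward pass: y_hat = X w + b."""
    return [sum(xj * wj for xj, wj in zip(row, w)) + b for row in X]

def gradients(X, y, w, b, lam=0.0, kind="l2", l1_ratio=0.5):
    """Gradients of MSE + (lam / N) * penalty with respect to w and b."""
    n = len(y)
    residual = [p - t for p, t in zip(predict(X, w, b), y)]
    # Base MSE gradient: (2/N) X^T (y_hat - y); bias gradient: (2/N) * sum(residual)
    grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residual, X))
              for j in range(len(w))]
    grad_b = 2.0 / n * sum(residual)
    for j, wj in enumerate(w):
        sign = (wj > 0) - (wj < 0)  # subgradient of |w_j|, taken as 0 at w_j = 0
        if kind == "l2":            # Ridge: (lam/N) * sum(w^2)
            grad_w[j] += 2.0 * lam / n * wj
        elif kind == "l1":          # Lasso: (lam/N) * sum(|w|)
            grad_w[j] += lam / n * sign
        elif kind == "elasticnet":  # assumed mix: (lam/N) * (r*sum|w| + (1-r)*sum(w^2))
            grad_w[j] += lam / n * (l1_ratio * sign + (1 - l1_ratio) * 2.0 * wj)
    return grad_w, grad_b

def fit(X, y, alpha=0.1, epochs=2000, **penalty):
    """Batch gradient descent: every step uses the full training matrix."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = gradients(X, y, w, b, **penalty)
        w = [wj - alpha * g for wj, g in zip(w, grad_w)]
        b -= alpha * grad_b
    return w, b

def target(x):
    """Noise-free toy target: y = 2x^2 - x + 1."""
    return 2 * x * x - x + 1

# Split BEFORE any scaling so the test rows never influence the scaler.
train_x = [i / 10 for i in range(-30, 31) if i % 5 != 0]
test_x = [i / 10 for i in range(-30, 31) if i % 5 == 0]

means, stds = fit_zscore([engineer(x) for x in train_x])
X_train = zscore([engineer(x) for x in train_x], means, stds)
X_test = zscore([engineer(x) for x in test_x], means, stds)
y_train = [target(x) for x in train_x]
y_test = [target(x) for x in test_x]

w_ols, b_ols = fit(X_train, y_train)
w_ridge, b_ridge = fit(X_train, y_train, lam=10.0, kind="l2")
test_error = max(abs(p - t) for p, t in zip(predict(X_test, w_ols, b_ols), y_test))
print("unregularized weights:", w_ols, "max test error:", test_error)
print("ridge weights:", w_ridge)
```

Because `means` and `stds` come only from `train_x`, the test rows are scaled with statistics they never influenced. On this noise-free parabola the unregularized run should recover the target to within floating-point error, while the Ridge run (`lam=10.0`) shrinks each weight toward zero.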

šŸ“The Math

Math: The Master Equation

The foundational linear regression architecture culminates in a single, regularized batch gradient update equation.

The Weight Update (e.g., Ridge):
$$w_{\text{new}} = w_{\text{old}} - \alpha \left( \frac{2}{N} X^T (\vec{\hat{y}} - \vec{y}) + \frac{2\lambda}{N} \vec{w} \right)$$

  • $\alpha$: The step size (learning rate).
  • $\frac{2}{N} X^T (\vec{\hat{y}} - \vec{y})$: The base gradient derived from the Mean Squared Error.
  • $\frac{2\lambda}{N} \vec{w}$: The $L_2$ penalty gradient; the regularization strength $\lambda$ scales how hard it pulls the weights toward zero.
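One way to sanity-check the master equation is to compare it against a numerical derivative. The sketch below assumes the Ridge objective $J = \frac{1}{N}\lVert X\vec{w} + b - \vec{y}\rVert^2 + \frac{\lambda}{N}\lVert\vec{w}\rVert^2$ with an unpenalized bias; it computes the analytic gradient term by term, checks it with central finite differences, and applies one update step. The data, step size, and function names are made up for illustration.

```python
def forward(X, w, b):
    """y_hat = X w + b for a list-of-rows matrix X."""
    return [sum(xj * wj for xj, wj in zip(row, w)) + b for row in X]

def ridge_loss(X, y, w, b, lam):
    """J = (1/N) * ||X w + b - y||^2 + (lam / N) * ||w||^2 (bias left unpenalized)."""
    n = len(y)
    mse = sum((p - t) ** 2 for p, t in zip(forward(X, w, b), y)) / n
    return mse + lam / n * sum(wj * wj for wj in w)

def ridge_grad(X, y, w, b, lam):
    """Master-equation gradient: (2/N) X^T (y_hat - y) + (2 lam / N) w."""
    n = len(y)
    residual = [p - t for p, t in zip(forward(X, w, b), y)]
    return [2.0 / n * sum(r * row[j] for r, row in zip(residual, X)) + 2.0 * lam / n * w[j]
            for j in range(len(w))]

def numeric_grad(X, y, w, b, lam, h=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    grad = []
    for j in range(len(w)):
        up, down = w[:], w[:]
        up[j] += h
        down[j] -= h
        grad.append((ridge_loss(X, y, up, b, lam) - ridge_loss(X, y, down, b, lam)) / (2 * h))
    return grad

# Made-up data and hyperparameters, chosen only for illustration.
X = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5], [-2.0, 1.5]]
y = [4.0, 1.0, 2.0, -1.0]
w_old, b, lam, alpha = [0.3, -0.2], 0.1, 0.5, 0.05

analytic = ridge_grad(X, y, w_old, b, lam)
numeric = numeric_grad(X, y, w_old, b, lam)
w_new = [wj - alpha * g for wj, g in zip(w_old, analytic)]  # one master-equation step

print("analytic:", analytic)
print("numeric: ", numeric)
print("loss:", ridge_loss(X, y, w_old, b, lam), "->", ridge_loss(X, y, w_new, b, lam))
```

Since the Ridge objective is quadratic in $\vec{w}$, the central difference should agree with the analytic gradient up to rounding. Under the same unpenalized-bias assumption, the bias update simply drops the penalty term: $b_{\text{new}} = b_{\text{old}} - \alpha \frac{2}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)$.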