The Capstone Comparison: Custom Engine vs. Scikit-Learn

The Glass Box vs. The Black Box · Algorithmic Convergence and Coordinate Descent · Evaluating Production Equivalency

🧠 The Theory

AI/ML Concept: The Glass Box vs. The Black Box

Production libraries operate as "Black Boxes." They prioritize computational efficiency and abstraction, hiding the linear algebra and calculus that drive the model's predictions.

Building algorithms from scratch produces a "Glass Box." This approach provides the mechanical context needed to interpret production behavior. Watching Scikit-Learn's Lasso implementation drive feature coefficients to exactly zero is far less mysterious once the engineer has manually implemented the np.sign() subgradient required for $L_1$ penalization. The Glass Box builds the theoretical knowledge needed to tune, debug, and deploy Black Box architectures effectively.
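As a reminder of what that manual step looks like, here is a minimal sketch of the $L_1$ penalty subgradient. The function name, the sample weights, and the penalty strength are illustrative assumptions, not taken from the custom engine itself.

```python
import numpy as np

def l1_penalty_subgradient(weights, alpha):
    """Subgradient of alpha * sum(|w|) with respect to each weight.

    np.sign returns 0.0 where a weight is exactly 0, which is one valid
    choice of subgradient at the non-differentiable point.
    """
    return alpha * np.sign(weights)

weights = np.array([2.5, -0.8, 0.0])  # hypothetical weights
alpha = 0.1                           # hypothetical penalty strength

subgrad = l1_penalty_subgradient(weights, alpha)
print(subgrad)
```

The push on each non-zero weight has the same magnitude (alpha) no matter how large the weight is, which helps explain why small weights are driven toward zero under $L_1$ while large ones are only trimmed.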

📐 The Math

Math: Algorithmic Convergence

When evaluating a custom Batch Gradient Descent algorithm against production libraries like Scikit-Learn, exact decimal parity in the learned weights is rare because the underlying optimization strategies differ. Because the least-squares loss is convex, however, both approaches target the same global minimum.

$W_{\text{custom}} \approx W_{\text{sklearn}}$

Batch Gradient Descent (Custom): Iteratively minimizes the error function by updating all weights simultaneously using partial derivatives scaled by the learning rate ( $\alpha$ ).
Analytical Solvers (Sklearn Ridge/Linear): Use linear algebra techniques such as Cholesky Decomposition or Singular Value Decomposition (SVD) to calculate the exact global minimum algebraically in a single solve, bypassing epochs.
Coordinate Descent (Sklearn Lasso): Iteratively optimizes a single weight while holding all others constant. Each one-dimensional subproblem has a closed-form soft-thresholding solution, which sidesteps the non-differentiability of the $L_1$ penalty's absolute value at exactly $0$ and lets a weight land on exactly $0$.
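The coordinate descent idea can be sketched in a few lines of NumPy. This is a simplified illustration, not Scikit-Learn's actual implementation: it assumes the objective $\frac{1}{2n}\lVert y - Xw \rVert^2 + \alpha \lVert w \rVert_1$, omits the intercept, runs a fixed number of sweeps instead of checking a tolerance, and uses synthetic data with hypothetical names throughout.

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 one weight at a time."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            z = X[:, j] @ X[:, j] / n_samples
            # Closed-form solution of the one-dimensional subproblem.
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Synthetic data (assumption): features 2 and 4 have no true effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = lasso_coordinate_descent(X, y, alpha=0.5)
print(w)
```

Because the soft-threshold returns exactly zero whenever a feature's correlation with the partial residual falls below alpha, the irrelevant weights are not merely small; they are removed outright.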

Despite divergent mathematical paths, sound implementations converge on highly comparable $R^2$ scores and directional weight distributions.
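To make the convergence claim concrete, the sketch below fits the same synthetic regression problem two ways: with a hand-written batch gradient descent loop and with NumPy's least-squares solver. The data, learning rate, and iteration count are assumptions chosen for illustration and are unrelated to the energy dataset used later.

```python
import numpy as np

# Synthetic regression problem (assumption): 3 features plus an intercept.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X @ np.array([4.0, -2.0, 0.5]) + 1.5 + 0.1 * rng.normal(size=500)

# Prepend a column of ones so the intercept is learned as a weight.
Xb = np.column_stack([np.ones(len(X)), X])

# Analytical path: solve the least-squares problem directly.
w_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Iterative path: batch gradient descent on mean squared error.
w_gd = np.zeros(Xb.shape[1])
learning_rate = 0.1
for _ in range(2000):
    gradient = (2 / len(y)) * Xb.T @ (Xb @ w_gd - y)
    w_gd -= learning_rate * gradient

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

print("Analytical weights:", w_exact)
print("Gradient descent weights:", w_gd)
print("R² gap:", abs(r2(y, Xb @ w_exact) - r2(y, Xb @ w_gd)))
```

On a well-scaled problem like this one the two weight vectors agree to many decimal places. On real data with a limited epoch budget the iterative path typically stops slightly short of the exact minimum, which is one plausible source of small $R^2$ gaps.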

💡 Insights and Mistakes

Developer's Insight: The Sklearn Convergence

Running the engineered feature matrix through Scikit-Learn's production estimators provided empirical validation of the custom Batch Gradient Descent engine.

1. Convergence Parity
The custom iterative engine achieved an $R^2$ of 0.9287. Scikit-Learn's highly optimized LinearRegression achieved 0.9330. Although Sklearn uses analytical solvers (e.g., SVD) rather than iterative gradient steps, the custom engine navigated the multi-dimensional loss surface to land within half a percentage point of $R^2$ of the production standard.
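The size of that gap can be checked with a couple of lines of arithmetic, using only the two $R^2$ values reported above (the variable names are illustrative):

```python
r2_custom = 0.9287   # custom batch gradient descent engine
r2_sklearn = 0.9330  # Scikit-Learn LinearRegression

absolute_gap = r2_sklearn - r2_custom
relative_gap_pct = 100 * absolute_gap / r2_sklearn

print(f"Absolute gap: {absolute_gap:.4f}")      # Absolute gap: 0.0043
print(f"Relative gap: {relative_gap_pct:.2f}%")  # Relative gap: 0.46%
```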

2. Observing the Lasso Snowplow
The evaluation log for Lasso with alpha=100.0 provided a direct view of $L_1$ feature selection. The coefficient array evaluated to: [-474.25, 2.58, 0., -29.26, 250.49, 861.13, -0., 0.]. Coordinate descent drove the 3rd, 7th, and 8th coefficients to exactly 0.0, removing those features from the prediction equation. Having programmed the sign() subgradient of the $L_1$ penalty in previous iterations made this feature deletion predictable.
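The zeroed positions can be read off programmatically. The sketch below copies the coefficient array from the log above; the variable names are illustrative.

```python
import numpy as np

# Lasso coefficients at alpha=100.0, copied from the evaluation log.
lasso_coef = np.array([-474.25, 2.58, 0., -29.26, 250.49, 861.13, -0., 0.])

# -0. compares equal to 0, so it counts as an eliminated feature too.
zeroed = np.flatnonzero(lasso_coef == 0)
kept = np.flatnonzero(lasso_coef != 0)

print("Eliminated features (1-based):", zeroed + 1)
print("Surviving features (1-based):", kept + 1)
```

In a fitted pipeline the same check could be applied to lasso.coef_ alongside the feature column names to see which inputs survived the penalty.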

⚙️ The Code

from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

df_raw = pd.read_csv('raw-dataset.csv')
df_engineered = pd.read_csv('engineered-dataset.csv')
datasets = {
    'raw': df_raw,
    'engineered': df_engineered
}

alpha_values = [0.1, 1.0, 10.0, 100.0]

for dataset_name, dataset in datasets.items():
    print(f"Evaluating models on dataset: {dataset_name}")

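    # Separate the features from the target column.
    X = dataset.drop('EnergyConsumption', axis=1)
    y = dataset['EnergyConsumption']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit the scaler on the training split only, then reuse it on the test split
    # so no test-set statistics leak into training.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)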

    # LinearRegression has no alpha, so fit and report it once per dataset.
    linear = LinearRegression()
    linear.fit(X_train, y_train)
    linear_pred = linear.predict(X_test)
    print(f"Linear Coefficients: {linear.coef_}")
    print(f"Linear Intercept: {linear.intercept_}")
    print(f"Linear R²: {r2_score(y_test, linear_pred):.6f}")

    for alpha in alpha_values:
        print(f"Alpha: {alpha}")

        # Regularized estimators that share the same penalty strength.
        models = {
            'Lasso': Lasso(alpha=alpha, max_iter=10000),
            'Ridge': Ridge(alpha=alpha, max_iter=10000),
            'Elastic Net': ElasticNet(alpha=alpha, l1_ratio=0.5, max_iter=10000),
        }

        for model_name, model in models.items():
            model.fit(X_train, y_train)
            pred = model.predict(X_test)
            print(f"{model_name} Coefficients: {model.coef_}")
            print(f"{model_name} Intercept: {model.intercept_}")
            print(f"{model_name} R²: {r2_score(y_test, pred):.6f}")

Code Breakdown

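This script establishes a baseline comparison between the raw and engineered datasets using Scikit-Learn's production estimators. For each dataset it splits the data 80/20, standardizes the features with statistics from the training split, and fits an unregularized LinearRegression baseline alongside Lasso, Ridge, and ElasticNet across a range of penalty strengths (alpha). The printed coefficients, intercepts, and test-set $R^2$ scores make it possible to observe both algorithmic convergence and $L_1$ feature selection.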
