The Capstone Comparison: Custom Engine vs. Scikit-Learn
🧠The Theory
AI/ML Concept: The Glass Box vs. The Black Box
Production libraries operate as "Black Boxes." They prioritize computational efficiency and abstraction, obfuscating the underlying linear algebra and calculus driving the model's predictions.
Constructing algorithms from scratch produces a "Glass Box." This approach provides the mechanical context required to interpret production behavior. Watching Scikit-Learn's Lasso implementation zero out feature coefficients stops being mysterious once the engineer has manually implemented the np.sign() subgradient of the L1 penalty. The Glass Box supplies the theoretical grounding needed to tune, debug, and deploy Black Box architectures effectively.
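As a reminder of what that manual step looks like, here is a minimal sketch of a gradient for mean squared error plus an L1 penalty. The function name `lasso_gradient`, the penalty symbol `lam`, and the 2/n scaling are assumptions made for illustration; the custom engine from earlier sections may scale its loss differently.

```python
import numpy as np

def lasso_gradient(X, y, w, b, lam):
    """Gradient of MSE plus lam * ||w||_1, using np.sign as the L1 subgradient.

    Hypothetical helper: names and scaling are illustrative assumptions.
    """
    n = X.shape[0]
    residual = X @ w + b - y
    # MSE term plus the subgradient of the absolute value penalty.
    # np.sign returns 0 at w == 0, which is one valid subgradient choice there.
    grad_w = (2.0 / n) * (X.T @ residual) + lam * np.sign(w)
    grad_b = (2.0 / n) * residual.sum()
    return grad_w, grad_b

# With a zero residual, only the penalty term remains in the weight gradient.
X_demo = np.zeros((4, 3))
y_demo = np.zeros(4)
w_demo = np.array([1.5, -0.3, 0.0])
grad_w_demo, grad_b_demo = lasso_gradient(X_demo, y_demo, w_demo, b=0.0, lam=0.1)
print(grad_w_demo)
```

The penalty pushes every nonzero weight toward zero by a constant amount `lam` per step, regardless of the weight's size, which is the behavior that makes L1 regularization prune small coefficients.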
📐The Math
Math: Algorithmic Convergence
When evaluating a custom Batch Gradient Descent algorithm against a production library like Scikit-Learn, exact decimal parity in the weights is rare because the underlying optimization strategies differ. However, the regression losses involved are convex, so methods that optimize the same objective converge toward the same global minimum.
- Batch Gradient Descent (Custom): Iteratively minimizes the error function by updating all weights simultaneously using partial derivatives scaled by the learning rate.
- Analytical Solvers (Sklearn Ridge/Linear): Use linear algebra techniques such as Cholesky Decomposition or Singular Value Decomposition (SVD) to calculate the exact global minimum algebraically in a single solve, bypassing epochs.
- Coordinate Descent (Sklearn Lasso): Iteratively optimizes a single weight while holding all others constant. Each one-dimensional subproblem has a closed-form solution, which sidesteps the non-differentiability of the L1 penalty's absolute value at exactly 0.
Despite divergent mathematical paths, sound implementations converge on highly comparable scores and directional weight distributions.
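The first two paths can be compared on a small synthetic problem. The sketch below is an assumption-laden illustration, not the custom engine itself: the data, learning rate, and epoch count are invented, and `np.linalg.lstsq` stands in for an analytical solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative values only).
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is learned as a regular weight.
Xb = np.column_stack([X, np.ones(len(X))])

# Analytical path: solve the least-squares problem in one call.
w_exact, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Iterative path: batch gradient descent on mean squared error.
w_gd = np.zeros(Xb.shape[1])
learning_rate = 0.05
for _ in range(2000):
    grad = (2.0 / len(y)) * Xb.T @ (Xb @ w_gd - y)
    w_gd -= learning_rate * grad

max_gap = np.max(np.abs(w_gd - w_exact))
print(f"Largest weight difference: {max_gap:.2e}")
```

On well-scaled data with enough epochs, the gap shrinks toward floating-point noise. On a real dataset with a fixed epoch budget, a small residual gap is expected.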
💡Insights and Mistakes
Developer's Insight: The Sklearn Convergence
Fitting Scikit-Learn's production estimators on the engineered feature matrix provided an empirical check on the custom Batch Gradient Descent engine.
1. Convergence Parity
The custom iterative engine achieved an R² of 0.9287. Scikit-Learn's highly optimized LinearRegression achieved 0.9330. Although Sklearn uses analytical solvers (e.g., SVD) rather than iterative gradient steps, the custom engine navigated the multi-dimensional loss surface to land within 0.0043 of the production result, under half a percentage point.
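For reference, the score being compared can be computed by hand. This is a minimal sketch of the standard coefficient of determination, one minus the ratio of residual to total sum of squares; the function name `r2` is an assumption and is not taken from the custom engine.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Gap between the two scores reported above.
gap = 0.9330 - 0.9287
print(f"R² gap: {gap:.4f}")
```

A model that always predicts the mean of the targets scores 0 under this definition, and a perfect model scores 1.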
2. Observing the Lasso Snowplow
The evaluation log for Lasso with an alpha of 100.0 showed feature selection directly. The coefficient array evaluated to: [-474.25, 2.58, 0., -29.26, 250.49, 861.13, -0., 0.]. Coordinate descent drove the 3rd, 7th, and 8th coefficients to exactly zero (the -0. entry is still zero), removing those features from the prediction equation. Having programmed the sign() subgradient of the L1 penalty in previous iterations made this feature deletion predictable.
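The exact zeros come from the soft-thresholding step inside coordinate descent. The following is a simplified sketch under stated assumptions: it uses the objective (1/(2n))·||y − Xw||² + lam·||w||₁, omits the intercept and convergence checks, and runs on invented synthetic data. It is not Scikit-Learn's actual implementation.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam, clamping to exactly zero inside [-lam, lam]."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Hypothetical minimal Lasso solver: cycle through weights one at a time."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_sweeps):
        for j in range(d):
            # Residual with feature j's current contribution added back in.
            residual_without_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual_without_j / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, lam) / z
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
# Features 1 and 3 carry no signal (illustrative setup).
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=500)

w_lasso = lasso_coordinate_descent(X, y, lam=1.0)
print(w_lasso)
```

In this sketch, any weight whose correlation with the residual stays within `lam` is set to exactly zero rather than merely made small. A negative correlation that gets clamped yields a negative zero, which may explain how an entry like `-0.` can appear in a coefficient log.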
⚙️The Code
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

df_raw = pd.read_csv('raw-dataset.csv')
df_engineered = pd.read_csv('engineered-dataset.csv')

datasets = {
    'raw': df_raw,
    'engineered': df_engineered
}

# Penalty strengths to sweep for the regularized models
alpha_values = [0.1, 1.0, 10.0, 100.0]

for dataset_name, dataset in datasets.items():
    print(f"Evaluating models on dataset: {dataset_name}")

    # Separate the features from the target column
    X = dataset.drop('EnergyConsumption', axis=1)
    y = dataset['EnergyConsumption']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit the scaler on the training split only, then reuse it on the test split
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    for alpha in alpha_values:
        # LinearRegression has no penalty, so its results are the same for every alpha
        linear = LinearRegression()
        lasso = Lasso(alpha=alpha, max_iter=10000)
        ridge = Ridge(alpha=alpha, max_iter=10000)
        elastic_net = ElasticNet(alpha=alpha, l1_ratio=0.5, max_iter=10000)

        linear.fit(X_train, y_train)
        lasso.fit(X_train, y_train)
        ridge.fit(X_train, y_train)
        elastic_net.fit(X_train, y_train)

        linear_pred = linear.predict(X_test)
        lasso_pred = lasso.predict(X_test)
        ridge_pred = ridge.predict(X_test)
        elastic_net_pred = elastic_net.predict(X_test)

        # Report coefficients, intercept, and test-set R² for each model
        print(f"Alpha: {alpha}")
        print(f"Linear Coefficients: {linear.coef_}")
        print(f"Linear Intercept: {linear.intercept_}")
        print(f"Linear R²: {r2_score(y_test, linear_pred):.6f}")
        print(f"Lasso Coefficients: {lasso.coef_}")
        print(f"Lasso Intercept: {lasso.intercept_}")
        print(f"Lasso R²: {r2_score(y_test, lasso_pred):.6f}")
        print(f"Ridge Coefficients: {ridge.coef_}")
        print(f"Ridge Intercept: {ridge.intercept_}")
        print(f"Ridge R²: {r2_score(y_test, ridge_pred):.6f}")
        print(f"Elastic Net Coefficients: {elastic_net.coef_}")
        print(f"Elastic Net Intercept: {elastic_net.intercept_}")
        print(f"Elastic Net R²: {r2_score(y_test, elastic_net_pred):.6f}")
Code Breakdown
This script establishes a baseline comparison between raw and engineered datasets using Scikit-Learn's production estimators. It evaluates Linear, Ridge, Lasso, and ElasticNet regressions across varying penalty strengths (alpha) to observe algorithmic convergence and feature selection.