AI Logbook
Live Learning Feed

Understanding intelligent systems from first principles.

The Geometry of Probability: The Sigmoid Function

The Artificial Neuron and Binary Classification · Exponential Bounding and Squashing Functions · Floating-Point Overflow and Clipping

🧠 The Theory

AI/ML Concept: Numerical Stability in Sigmoid and Robust Logistic Computation

🧪 Experimentation: Triggering the Overflow

When math that works on paper is translated into software, the finite range of hardware floating-point numbers imposes constraints that the equations themselves do not have.

The Vulnerability:
Passing a large negative input (e.g., $z = -1000$) into the sigmoid function requires computing $e^{-z} = e^{1000}$. That value is far beyond the largest finite 64-bit float (about $1.8 \times 10^{308}$, which corresponds to $e^{709.78}$). NumPy's np.exp therefore returns inf and emits RuntimeWarning: overflow encountered in exp.
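The overflow can be reproduced with a minimal sketch. It assumes NumPy with its default floating-point error settings, where overflow issues a warning. naive_sigmoid is a hypothetical helper with no input guard. The sketch also computes the largest exponent np.exp can take before leaving the float64 range.

```python
import warnings

import numpy as np

def naive_sigmoid(z: np.ndarray) -> np.ndarray:
    """Textbook sigmoid with no input guard (hypothetical helper for this demo)."""
    return 1 / (1 + np.exp(-z))

# Largest finite float64, and the largest exponent np.exp accepts before returning inf.
float64_max = np.finfo(np.float64).max   # about 1.8e308
exp_limit = np.log(float64_max)          # about 709.78

z = np.array([-1000.0, 0.0, 1000.0])

# Record warnings instead of printing them, so the overflow can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = naive_sigmoid(z)

print("exp limit:", exp_limit)
print("result:", result)
print("warnings:", [str(w.message) for w in caught])
```

In this particular expression the inf from np.exp(1000) still collapses to 0.0 through 1/(1 + inf). The visible damage is the warning. However, an inf produced mid-computation can turn into nan in later arithmetic (for example inf - inf or 0 * inf).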

The Engineering Fix:
Before the array reaches the exponential function, it must pass through a filter. Using np.clip(z, -250, 250) restricts every value the exponent will ever process to $[-250, 250]$, well inside the roughly $\pm 709$ range that np.exp can handle in float64. $\sigma(-250) \approx 2.7 \times 10^{-109}$, and $\sigma(250)$ already rounds to exactly $1.0$ in float64. Capping the input therefore prevents floating-point overflow while changing the probability output by a negligible amount.
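The effect of the clip can be checked directly. In this sketch, clipped_sigmoid is a hypothetical name, and the bound of 250 is the one chosen in this entry. np.errstate(over="raise") turns any float overflow into an exception, so running without an error shows that no overflow happened.

```python
import numpy as np

def clipped_sigmoid(z: np.ndarray, bound: float = 250.0) -> np.ndarray:
    """Sigmoid with the input clipped to [-bound, bound] before exponentiation."""
    z_clipped = np.clip(z, -bound, bound)
    return 1 / (1 + np.exp(-z_clipped))

# sigma(-250) is tiny but still an ordinary float64, and e^250 is far below the overflow limit.
edge_low = clipped_sigmoid(np.array([-250.0]))[0]
edge_high = clipped_sigmoid(np.array([250.0]))[0]

print("sigma(-250):", edge_low)   # roughly 2.7e-109
print("sigma(250):", edge_high)   # 1 + 2.7e-109 rounds to 1.0, so this is exactly 1.0

# Inputs beyond the bound are pinned to the edge values; any overflow would raise here.
with np.errstate(over="raise"):
    pinned = clipped_sigmoid(np.array([-1000.0, 1000.0]))
print("pinned:", pinned)
```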

🔗 Connection: The First Neuron

Where is this used?
The sigmoid function is the core operating mechanism of Logistic Regression. It is used in production systems to predict binary outcomes: e.g., Fraud/Not Fraud, Malignant/Benign, or System Failure/System Healthy.
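This sketch shows how logistic regression turns the sigmoid into a binary prediction: compute $z = X\vec{w} + b$, squash it into a probability, then compare it against a threshold. The weights, bias, and feature rows are made-up illustrative numbers, not learned from any real fraud data. predict_proba and predict_label are hypothetical helper names, and the 0.5 threshold is an assumed default.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Clipped sigmoid, using the same +/-250 guard as this entry."""
    return 1 / (1 + np.exp(-np.clip(z, -250, 250)))

def predict_proba(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Probability of the positive class (e.g. 'Fraud') for each row of X."""
    return sigmoid(X @ w + b)

def predict_label(X: np.ndarray, w: np.ndarray, b: float, threshold: float = 0.5) -> np.ndarray:
    """Hard 0/1 labels: 1 when the probability reaches the threshold."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hypothetical, hand-picked parameters for two made-up features.
w = np.array([2.0, -1.0])
b = -0.5
X = np.array([
    [0.0, 0.0],   # z = -0.5
    [1.0, 0.0],   # z =  1.5
    [0.0, 3.0],   # z = -3.5
])

probs = predict_proba(X, w, b)
labels = predict_label(X, w, b)
print("probabilities:", np.round(probs, 3))
print("labels:", labels)
```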

Why does this matter?
A linear equation ($X\vec{w} + b$) wrapped inside a squashing function ($\sigma$) is a single artificial neuron with a sigmoid activation. Classic feedforward neural networks are built by stacking many of these computationally simple logistic units into interconnected layers. Modern networks often replace the sigmoid with other activations such as ReLU but keep the same weighted-sum-plus-nonlinearity structure. Mastering this function means mastering the atomic unit of neural network architecture.
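The stacking idea can be sketched in a few lines, assuming sigmoid activations throughout. A layer is many neurons evaluated side by side, which is one matrix product. The layer sizes, random weights, and the helper names neuron and dense_layer are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Clipped sigmoid, using the same +/-250 guard as this entry."""
    return 1 / (1 + np.exp(-np.clip(z, -250, 250)))

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """One artificial neuron: weighted sum plus bias, squashed by sigmoid."""
    return float(sigmoid(np.dot(x, w) + b))

def dense_layer(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """A layer of neurons side by side: column j of W holds neuron j's weights."""
    return sigmoid(X @ W + b)

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(4, 3))            # 4 samples, 3 features (random illustrative data)

# Hypothetical sizes: 3 inputs -> 5 hidden neurons -> 1 output neuron.
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

hidden = dense_layer(X, W1, b1)        # shape (4, 5)
output = dense_layer(hidden, W2, b2)   # shape (4, 1): one probability per sample

# Each hidden unit is a logistic-regression unit applied to the same inputs.
assert np.isclose(hidden[0, 2], neuron(X[0], W1[:, 2], b1[2]))
print("output probabilities:", output.ravel())
```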

The Math

Math: The Sigmoid Function

Linear regression relies on the equation $z = X\vec{w} + b$. On its own, this linear output fails as a probability because it ranges over all real numbers, from $-\infty$ to $+\infty$. That violates the foundational rule that probabilities must lie between $0$ and $1$.

The Sigmoid Function ($\sigma$) squashes any real number into the open interval $(0, 1)$:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
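A quick numerical check makes the contrast concrete. The raw linear outputs below are hypothetical values of $z = X\vec{w} + b$, and several of them are not valid probabilities. After the sigmoid, every value lands strictly inside $(0, 1)$, and their order is preserved because $\sigma$ is increasing. The unclipped sigmoid is fine here only because these inputs are small.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Textbook sigmoid, 1 / (1 + e^{-z}); safe here because |z| stays small."""
    return 1 / (1 + np.exp(-z))

# Hypothetical raw linear outputs z = Xw + b; nothing keeps them inside [0, 1].
z = np.array([-6.0, -1.0, 0.0, 0.5, 2.0, 6.0])
invalid_as_probs = z[(z < 0) | (z > 1)]

p = sigmoid(z)

print("linear outputs outside [0, 1]:", invalid_as_probs)
print("sigmoid outputs:", np.round(p, 4))
# Every output is strictly between 0 and 1, and the ordering of z is preserved.
print("bounded:", bool(np.all((p > 0) & (p < 1))), "| increasing:", bool(np.all(np.diff(p) > 0)))
```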

Mathematical Limits:

  • As $z \to \infty$, $e^{-z} \to 0$, so $\sigma(z) \to \frac{1}{1 + 0} = 1$.
  • As $z \to -\infty$, $e^{-z} \to \infty$, so the denominator grows without bound and $\sigma(z) \to 0$.
  • When $z = 0$, $e^{0} = 1$, so $\sigma(0) = \frac{1}{1 + 1} = 0.5$ (the decision boundary).
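The three limits can be confirmed numerically in float64. The sample inputs are arbitrary, and this is a sketch rather than a proof.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    """Textbook sigmoid, 1 / (1 + e^{-z}); |z| <= 50 keeps np.exp well inside float64 range."""
    return 1 / (1 + np.exp(-z))

z = np.array([-50.0, -10.0, 0.0, 10.0, 50.0])
values = sigmoid(z)

for zi, s in zip(z, values):
    print(f"z = {zi:6.1f}  ->  sigma(z) = {s:.3e}")
```

The $z = 0$ row is exactly 0.5. Float64 only approximates the open interval, though. On the positive side, $e^{-z}$ eventually drops below the spacing of floats near 1, so $\sigma(50)$ rounds to exactly 1.0. On the negative side, tiny values such as $\sigma(-50) \approx 1.9 \times 10^{-22}$ remain representable.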

⚙️ The Code

import numpy as np

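def sigmoid(z: np.ndarray) -> np.ndarray:
    # Clip first so np.exp(-z) never receives an exponent large enough to overflow float64.
    z_clipped = np.clip(z, -250, 250)
    # Return full precision; round only when displaying results.
    return 1 / (1 + np.exp(-z_clipped))

# Test 1: The Bounds
z_normal = np.array([-10, 0, 10])
print("Normal Bounds:", np.round(sigmoid(z_normal), 2))

# Test 2: Break Things (inputs that would overflow np.exp without clipping)
z_extreme = np.array([-1000, 1000])
print("Extreme Bounds:", np.round(sigmoid(z_extreme), 2))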

Code Breakdown

This script defines the sigmoid function that maps raw linear outputs into bounded probabilities. It includes an explicit numerical safety check (np.clip) that keeps np.exp from overflowing the float64 range on extreme inputs.
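Clipping is one fix. Another widely used approach, shown here for comparison and not the method this entry uses, rewrites the formula so that np.exp only ever receives non-positive arguments. For $z \ge 0$ it uses $\frac{1}{1 + e^{-z}}$, and for $z < 0$ it uses the algebraically equivalent $\frac{e^{z}}{1 + e^{z}}$. This is a minimal sketch assuming float64 inputs; stable_sigmoid is a hypothetical name.

```python
import numpy as np

def stable_sigmoid(z: np.ndarray) -> np.ndarray:
    """Sigmoid that never exponentiates a positive number, so np.exp cannot overflow.

    For z >= 0:  1 / (1 + e^{-z})        (e^{-z} lies in (0, 1])
    For z <  0:  e^{z} / (1 + e^{z})     (e^{z}  lies in (0, 1))
    """
    z = np.asarray(z, dtype=np.float64)
    # -|z| is always <= 0, so exp(-|z|) is in [0, 1] and cannot overflow.
    e = np.exp(-np.abs(z))
    # Both branches are evaluated, but 1 + e >= 1, so neither divides by zero.
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

# Any overflow inside this block would raise FloatingPointError.
with np.errstate(over="raise"):
    out = stable_sigmoid(np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0]))
print(out)
```

Unlike clipping, this form needs no arbitrary bound. The price is a slightly less obvious formula. If SciPy is available, scipy.special.expit also provides the logistic sigmoid as a library function.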