AI Logbook

Understanding intelligent systems from first principles.

Matrix-Matrix Multiplication: Deep Learning & Hidden Layers

🧠 The Theory

AI/ML Concept: Deep Learning & Hidden Layers

Why do we need to multiply two matrices together? This is the exact mathematical operation that unlocks Deep Learning.

Up until now, our model only had one layer of weights. But neural networks have "hidden layers". Imagine we don't just want to predict a house's Price. Maybe our first layer of weights calculates three intermediate concepts: "Luxury Score", "Space Score", and "Location Score".

  • $X$ is our dataset matrix (e.g., 1000 houses $\times$ 4 features).
  • $W_1$ is a matrix of weights mapping 4 features to 3 "Scores" (shape $4 \times 3$).

When we calculate $X \cdot W_1$, we get a new $1000 \times 3$ matrix. We have successfully transformed our entire dataset of 4 raw features into a new dataset of 3 high-level concepts! We can then pass that new matrix into a second layer of weights ($W_2$) to get our final price prediction. Matrix-matrix multiplication is how data flows forward through the multiple hidden layers of a deep neural network.
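
The two-layer flow described above can be sketched with plain Python lists. This is only an illustration: the helper name matmul, all weight values, and the $(3, 1)$ shape chosen for $W_2$ are assumptions made for the example, and a tiny 2-house dataset stands in for the 1000-house one.

```python
# Minimal sketch of a two-layer forward pass using plain Python lists.
# The helper name, the weight values, and the (3, 1) shape of W2 are
# illustrative assumptions, not values from a trained model.

def matmul(a, b):
    """Multiply an (m, n) list-of-lists by an (n, p) list-of-lists."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# X: 2 houses x 4 features (a tiny stand-in for the 1000 x 4 dataset)
X = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]

# W1: maps 4 features to 3 intermediate "scores" (shape 4 x 3)
W1 = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

# W2: maps 3 scores to 1 price prediction (hypothetical shape 3 x 1)
W2 = [
    [1.0],
    [2.0],
    [3.0],
]

hidden = matmul(X, W1)           # shape (2, 3): one row of scores per house
prediction = matmul(hidden, W2)  # shape (2, 1): one price per house

print(hidden)      # [[5.0, 6.0, 7.0], [13.0, 14.0, 15.0]]
print(prediction)  # [[38.0], [86.0]]
```

The number of rows (houses) never changes along the way; only the number of columns does, from 4 features to 3 scores to 1 prediction.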

๐Ÿ“The Math

Math: Multiplying Matrices

If multiplying a matrix by a vector is just a series of dot products, multiplying a matrix by another matrix is simply taking that process into two dimensions.

To find the value for the first row and first column of your new matrix, you take the dot product of the 1st Row of Matrix A and the 1st Column of Matrix B.

Because we are pairing rows with columns, the golden rule of matrix multiplication is: The inner dimensions must match.

  • Matrix $A$ has shape $(m, n)$.
  • Matrix $B$ has shape $(n, p)$.
  • You can only multiply them if the number of columns of $A$ equals the number of rows of $B$ (the shared $n$ above). The resulting matrix $C$ will have the shape of the outer dimensions: $(m, p)$.
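
The shape rule can be checked without touching any data. The sketch below uses a hypothetical helper, matmul_shape, that works only on shape tuples; the name and error message are assumptions for illustration.

```python
# Sketch of the "inner dimensions must match" rule, operating on shapes only.
# matmul_shape is a hypothetical helper introduced for this illustration.

def matmul_shape(a_shape, b_shape):
    """Return the shape of A @ B, or raise if the inner dimensions differ."""
    m, n_a = a_shape  # A is (m, n)
    n_b, p = b_shape  # B is (n, p)
    if n_a != n_b:
        raise ValueError(
            f"cannot multiply {a_shape} by {b_shape}: inner dimensions {n_a} and {n_b} differ"
        )
    return (m, p)     # outer dimensions

print(matmul_shape((1000, 4), (4, 3)))  # (1000, 3)

# Swapping the order breaks the rule: the inner dimensions are now 3 and 1000.
try:
    matmul_shape((4, 3), (1000, 4))
except ValueError as err:
    print(err)
```

The second call shows that order matters: a product that is valid one way round may not even be defined the other way round.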

Mathematically, the element in row $i$ and column $j$ of the new matrix is calculated as:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
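
Written as three explicit loops, the summation looks like the sketch below. The function name matmul_loops and the 2 x 2 example values are assumptions for illustration; note that the formula counts $k$ from 1 while Python counts from 0.

```python
# Sketch: the summation formula as explicit loops over i, j, and k.
# matmul_loops and the example matrices are illustrative assumptions.

def matmul_loops(A, B):
    m, n, p = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("inner dimensions must match")
    C = [[0.0] * p for _ in range(m)]   # result has shape (m, p)
    for i in range(m):                  # row of A
        for j in range(p):              # column of B
            for k in range(n):          # shared inner index
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0],
     [3.0, 4.0]]
B = [[5.0, 6.0],
     [7.0, 8.0]]

C = matmul_loops(A, B)
print(C)  # [[19.0, 22.0], [43.0, 50.0]]
```

For instance, the top-left entry is the 1st row of A dotted with the 1st column of B: 1·5 + 2·7 = 19.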

โš™๏ธThe Code

class Matrix:
    def __init__(self, data: list[list[float]]):
        if data:
            self.__validate(data)
            self.data = data
            self.number_of_rows = len(data)
            self.number_of_cols = len(data[0])            
        else:
            self.data = []
            self.number_of_rows = 0
            self.number_of_cols = 0

    def __validate(self, data: list[list[float]]) -> None:
        """Private method to ensure matrix is a perfect rectangle."""
        number_of_cols = len(data[0])
        for row in data:
            if len(row) != number_of_cols:
                raise ValueError("All rows must have the same number of columns to form a valid matrix.")

    @property
    def shape(self) -> tuple[int, int]:
        """Returns the shape of the matrix as (rows, columns)."""
        return (self.number_of_rows, self.number_of_cols)
    
    def __mul__(self, scalar: float) -> "Matrix":
        """Scalar multiplication: scales every element by the scalar."""
        return Matrix([[element * scalar for element in row] for row in self.data])

    def __add__(self, other: "Matrix") -> "Matrix":
        """Matrix addition: adds elements of identically shaped matrices."""
        if isinstance(other, Matrix):
            if self.shape != other.shape:
                raise ValueError("Matrices must have the same shape for addition")
            return Matrix([
                [a + b for a, b in zip(row1, row2)]
                for row1, row2 in zip(self.data, other.data)
            ])
        else:
            raise TypeError(f"Unsupported operand type for +: 'Matrix' and '{type(other).__name__}'")
        
    def dot_vector(self, vector: list[float]) -> list[float]:
        """Multiplies the matrix by a 1D vector (Batch Dot Product)."""
        if self.number_of_cols != len(vector):
            raise ValueError("The number of columns in the matrix must exactly equal the number of elements in the vector")
        return [sum(a * b for a, b in zip(row, vector)) for row in self.data]
    
    def dot_matrix(self, other: "Matrix") -> "Matrix":
        """Multiplies the matrix by another matrix (Batch Matrix Multiplication)."""
        if self.number_of_cols != other.number_of_rows:
            raise ValueError("The number of columns in the first matrix must equal the number of rows in the second matrix for multiplication")
        
        result = [
            [
                sum(self.data[i][k] * other.data[k][j] for k in range(other.number_of_rows))
                for j in range(other.number_of_cols)
            ]
            for i in range(self.number_of_rows)
        ]
        
        return Matrix(result)

    def __repr__(self) -> str:
        """Helper to print the matrix cleanly in the terminal."""
        rows_str = "\n  ".join(str(row) for row in self.data)
        return f"Matrix(\n  {rows_str}\n)"
    
    
# --- Example Usage: Pushing Data through a Hidden Layer ---

# Dataset X: 2 Houses, 4 Features (Beds, Baths, Age, SqFt)
X = Matrix([
    [3.0, 2.0, 15.0, 2000.0],
    [4.0, 3.0, 10.0, 2500.0]
])

# Weight Matrix W1: Maps 4 input features to 2 hidden concepts ("Size Score", "Modernity Score")
# Shape must be (4, 2) so inner dimensions match X's (2, 4)
W1 = Matrix([
    [10.0, 0.0],   # Weights for Beds
    [5.0,  0.0],   # Weights for Baths
    [0.0, -2.0],   # Weights for Age
    [1.0,  0.0]    # Weights for SqFt
])

# Forward pass through the hidden layer: X · W1 (a matrix product, not the scalar * defined by __mul__)
hidden_layer_output = X.dot_matrix(W1)

print(f"Dataset Shape: {X.shape}")
print(f"Weights Shape: {W1.shape}")
print("\nHidden Layer Output (New Transformed Dataset):")
print(hidden_layer_output)
print(f"Output Shape: {hidden_layer_output.shape}")
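
The numbers in the example can be checked by hand. House 1's "Size Score" is 3·10 + 2·5 + 15·0 + 2000·1 = 2040, and its "Modernity Score" is 15·(−2) = −30. The sketch below repeats that check with plain lists instead of the Matrix class; the variable names are assumptions for illustration.

```python
# Hand check of the example above using plain lists (no Matrix class needed).
# Variable names here are illustrative, not part of the Matrix class.

X_rows = [
    [3.0, 2.0, 15.0, 2000.0],
    [4.0, 3.0, 10.0, 2500.0],
]
W1_rows = [
    [10.0, 0.0],   # Beds
    [5.0,  0.0],   # Baths
    [0.0, -2.0],   # Age
    [1.0,  0.0],   # SqFt
]

# zip(*rows) regroups the rows of W1 into its columns
W1_cols = list(zip(*W1_rows))

# Each output entry is one row of X dotted with one column of W1
expected = [
    [sum(x * w for x, w in zip(row, col)) for col in W1_cols]
    for row in X_rows
]

print(expected)  # [[2040.0, -30.0], [2555.0, -20.0]]
```

These are the values hidden_layer_output should contain, with an output shape of (2, 2): two houses, two hidden concepts.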

Code Breakdown

  • def dot_matrix(self, other: "Matrix") -> "Matrix": We define the method to multiply our Matrix by another Matrix.
  • if self.number_of_cols != other.number_of_rows: The ultimate dimensionality check. A matrix of shape $(m, n)$ can only be multiplied by a matrix of shape $(n, p)$.
  • sum(self.data[i][k] * other.data[k][j] ...): This is the exact translation of the mathematical summation $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$. It computes the dot product of row i from the first matrix and column j from the second matrix.
  • for j in range(...) and for i in range(...): The outer loops build the new matrix $C$, ensuring its final shape is $(m, p)$.