AI Logbook
Live Learning Feed


Understanding intelligent systems from first principles.

What is a Matrix? Representing Datasets

The Design Matrix · Matrices and Dimensions · Building the Matrix Class

🧠 The Theory

AI/ML Concept: The Design Matrix

In Week 1, we built an engine that could look at a single house (a vector) and predict a price. But in the real world, we don't train AI on one house at a time; we train it on thousands or millions of houses simultaneously.

To do this, we represent the entire dataset as a single matrix. We call it the Design Matrix and denote it with a capital $X$.

The standard convention in AI is:

  • Rows ($m$ of them) represent individual samples (e.g., individual houses, individual users).
  • Columns ($n$ of them) represent the features (e.g., bedrooms, age, square footage).

If we have a dataset of 3 houses and each house has 4 features, our dataset $X$ is a $3 \times 4$ matrix.

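|                 | Bedrooms (Col 1) | Bathrooms (Col 2) | Age (Col 3) | SqFt (Col 4) |
|-----------------|------------------|-------------------|-------------|--------------|
| House 1 (Row 1) | 3                | 2                 | 15          | 2000         |
| House 2 (Row 2) | 4                | 3                 | 10          | 2500         |
| House 3 (Row 3) | 2                | 1                 | 50          | 1200         |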
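To make the row/column convention concrete, here is a minimal sketch that stores the table above as a plain Python list of lists (no Matrix class yet) and pulls out one row and one column. The helper name get_column is hypothetical, chosen only for illustration.

```python
# The table above as a list of lists: one inner list per house (row),
# with features in the order [bedrooms, bathrooms, age, sqft] (columns).
house_dataset = [
    [3.0, 2.0, 15.0, 2000.0],  # House 1
    [4.0, 3.0, 10.0, 2500.0],  # House 2
    [2.0, 1.0, 50.0, 1200.0],  # House 3
]


def get_column(rows: list[list[float]], j: int) -> list[float]:
    """Hypothetical helper: collect feature j (0-based) from every sample."""
    return [row[j] for row in rows]


# A row is one sample: every feature of House 2 (index 1, since Python counts from 0).
house_2 = house_dataset[1]

# A column is one feature across all samples: the Age column (index 2).
ages = get_column(house_dataset, 2)

print(house_2)  # [4.0, 3.0, 10.0, 2500.0]
print(ages)     # [15.0, 10.0, 50.0]
```

Note the shift from the table's 1-based labels (Row 1, Col 1) to Python's 0-based indices.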

By structuring data this way, we can eventually push this entire grid of numbers through our weights in a single mathematical operation (a matrix-vector product) that a GPU can accelerate, rather than visiting houses one at a time with slow Python for loops.
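As a preview, here is a hedged pure-Python sketch of what "pushing the grid through our weights" computes: one predicted price per house, each the dot product of that house's row with a weight vector. The weights are made-up numbers for illustration (not learned from data), there is no bias term, and predict_all is a hypothetical name. Written out, the result is the matrix-vector product $\hat{y} = Xw$; NumPy, for example, expresses it as X @ w and runs it as one vectorized call instead of a Python loop.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    return sum(x * y for x, y in zip(a, b))


def predict_all(X: list[list[float]], w: list[float]) -> list[float]:
    """Hypothetical helper: one prediction per row of X, i.e. the product Xw."""
    return [dot(row, w) for row in X]


# Design matrix from the table: [bedrooms, bathrooms, age, sqft] per house.
X = [
    [3.0, 2.0, 15.0, 2000.0],
    [4.0, 3.0, 10.0, 2500.0],
    [2.0, 1.0, 50.0, 1200.0],
]

# Made-up weights for illustration only (not trained): price per unit of each feature.
w = [10000.0, 5000.0, -1000.0, 100.0]

predictions = predict_all(X, w)
print(predictions)  # [225000.0, 295000.0, 95000.0]
```

This loop still visits houses one by one; the benefit of the design matrix is that the whole computation becomes a single object, $Xw$, which optimized libraries and GPUs can evaluate far faster.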

šŸ“The Math

Math: What is a Matrix?

If a vector is a 1D list of numbers, a matrix is a 2D grid of numbers. You can think of a matrix as a collection of vectors stacked together; in a design matrix, each row is one sample's feature vector.

In mathematics, we denote a matrix with a capital letter (like $A$ or $X$). The shape of a matrix is defined by its number of rows ($m$) and columns ($n$). We call this an $m \times n$ (read as "m by n") matrix.

For example, a $3 \times 2$ matrix looks like this:

$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$
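Using the common convention that $X_{ij}$ denotes the entry in row $i$ and column $j$ (1-based, as in most math texts), the same $3 \times 2$ example can be read as three 2-dimensional row vectors stacked on top of each other, which is exactly the "collection of vectors" picture:

$$
X = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \mathbf{x}_3^\top \end{bmatrix},
\quad
\mathbf{x}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\;
\mathbf{x}_2 = \begin{bmatrix} 3 \\ 4 \end{bmatrix},\;
\mathbf{x}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix},
\qquad
X_{11} = 1,\; X_{21} = 3,\; X_{32} = 6.
$$

In Python's 0-based indexing of a list of lists, $X_{32}$ corresponds to X[2][1].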

Just as vectors can have any length, matrices can have any number of rows and columns, and the mathematical rules we build will apply whether the matrix has 2 rows or 2 million rows.

āš™ļøThe Code

class Matrix:
    def __init__(self, data: list[list[float]]):
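        # Store a copy of each row so later edits to the caller's lists can't break the matrix
        self.data = [list(row) for row in data]
        if data:
            self.number_of_rows = len(data)
            self.number_of_cols = len(data[0])

            # Validation: Ensure all rows have the exact same number of columns
            for i, row in enumerate(data):
                if len(row) != self.number_of_cols:
                    raise ValueError(
                        f"Row {i} has {len(row)} columns, expected {self.number_of_cols}. "
                        "All rows must have the same number of columns to form a valid matrix."
                    )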
        else:
            self.number_of_rows = 0
            self.number_of_cols = 0

    @property
    def shape(self) -> tuple[int, int]:
        """Returns the shape of the matrix as (rows, columns)."""
        return (self.number_of_rows, self.number_of_cols)

    def __repr__(self) -> str:
        """Helper to print the matrix cleanly in the terminal."""
        rows_str = "\n  ".join(str(row) for row in self.data)
        return f"Matrix(\n  {rows_str}\n)"


# Example Usage: Creating a Design Matrix for 3 houses

house_dataset = [
    [3.0, 2.0, 15.0, 2000.0], # House 1
    [4.0, 3.0, 10.0, 2500.0], # House 2
    [2.0, 1.0, 50.0, 1200.0]  # House 3
]

X = Matrix(house_dataset)

print("Design Matrix X:")
print(X)
print(f"\nThe shape of X is: {X.shape}")
print(f"Number of samples (houses): {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")

Code Breakdown

  • class Matrix: We define our foundational 2D data structure.
  • def __init__(self, data: list[list[float]]): The matrix is initialized with a list of lists.
  • if len(row) != self.number_of_cols: This is a critical validation check. A mathematical matrix must be a perfect rectangle. If one row has 3 features and another has 4, the data is not a valid matrix, and later operations on it would either fail or silently produce wrong results.
  • @property def shape(self): In array libraries used for ML, such as NumPy, .shape returns the dimensions as a tuple; for a 2D array that is (rows, columns). We use the @property decorator so it can be accessed like an attribute (X.shape) rather than called like a method (X.shape()).
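To see why the rectangle check matters, here is a small sketch (plain Python, independent of the Matrix class) of what can go wrong without it. If one row is missing a feature, a dot product written with zip does not crash: zip stops at the shorter input, so a feature's weight is silently ignored and the prediction is simply wrong. The weights are made-up values, and dot_unchecked and validate_rows are hypothetical names.

```python
def dot_unchecked(a: list[float], b: list[float]) -> float:
    """Hypothetical naive dot product: zip() silently stops at the shorter list."""
    return sum(x * y for x, y in zip(a, b))


def validate_rows(rows: list[list[float]]) -> None:
    """Hypothetical standalone version of the Matrix rectangle check."""
    if not rows:
        return
    n = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(f"Row {i} has {len(row)} columns, expected {n}.")


weights = [10000.0, 5000.0, -1000.0, 100.0]  # made-up, one weight per feature

# House 3 is missing its SqFt value, so the data is no longer a rectangle.
ragged = [
    [3.0, 2.0, 15.0, 2000.0],
    [4.0, 3.0, 10.0, 2500.0],
    [2.0, 1.0, 50.0],
]

# No error is raised: the SqFt weight is simply dropped for House 3.
print(dot_unchecked(ragged[2], weights))  # -25000.0

# The explicit check turns the silent bug into a loud one.
try:
    validate_rows(ragged)
except ValueError as err:
    print(err)  # Row 2 has 3 columns, expected 4.
```

Failing fast at construction time, as the Matrix class does, is cheaper to debug than a wrong number that surfaces much later.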