What is a Matrix? Representing Datasets
The Theory
AI/ML Concept: The Design Matrix
In Week 1, we built an engine that could look at a single house (a vector) and predict a price. But in the real world, we don't train AI on one house at a time; we train it on thousands or millions of houses simultaneously.
To achieve this, your entire dataset is represented as a single matrix. We call this the Design Matrix and denote it with a capital $X$.
The standard convention in AI is:
- Rows ($m$) represent individual samples (e.g., individual houses, individual users).
- Columns ($n$) represent the features (e.g., bedrooms, age, square footage).
If we have a dataset of 3 houses, and each house has 4 features, our dataset is a $3 \times 4$ matrix.
| | Bedrooms (Col 1) | Bathrooms (Col 2) | Age (Col 3) | SqFt (Col 4) |
|---|---|---|---|---|
| House 1 (Row 1) | 3 | 2 | 15 | 2000 |
| House 2 (Row 2) | 4 | 3 | 10 | 2500 |
| House 3 (Row 3) | 2 | 1 | 50 | 1200 |
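To make the row and column convention concrete, here is a minimal sketch in plain Python that stores the table above as a list of lists and reads out one sample (a row) and one feature (a column). The names `houses`, `feature_names`, `house_2`, and `ages` are illustrative choices, not part of the lesson's code.

```python
# The 3 houses from the table, one inner list per house (row).
feature_names = ["bedrooms", "bathrooms", "age", "sqft"]
houses = [
    [3.0, 2.0, 15.0, 2000.0],  # House 1
    [4.0, 3.0, 10.0, 2500.0],  # House 2
    [2.0, 1.0, 50.0, 1200.0],  # House 3
]

# A row is one sample: every feature of a single house.
house_2 = houses[1]

# A column is one feature: the same measurement across all houses.
age_index = feature_names.index("age")
ages = [row[age_index] for row in houses]

print(house_2)  # [4.0, 3.0, 10.0, 2500.0]
print(ages)     # [15.0, 10.0, 50.0]
```

Reading a row is a single index, while reading a column has to visit every row. That asymmetry comes from storing the data row by row.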
By structuring data this way, we can eventually use GPU acceleration to push this entire grid of numbers through our weights in a single matrix operation, rather than processing one house at a time in slow `for` loops.
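As a rough illustration of what pushing the grid through our weights means, the sketch below computes one prediction per house as a dot product between each row and a weight vector. The weights and bias are made-up numbers chosen only for illustration, and the helper names `dot` and `predict_all` are hypothetical. Plain Python still loops under the hood; the point is only that the whole dataset goes in as one object and one prediction per row comes out.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    return sum(x * y for x, y in zip(a, b))


def predict_all(X: list[list[float]], weights: list[float], bias: float) -> list[float]:
    """One prediction per row of X: dot(row, weights) + bias."""
    return [dot(row, weights) + bias for row in X]


X = [
    [3.0, 2.0, 15.0, 2000.0],  # House 1
    [4.0, 3.0, 10.0, 2500.0],  # House 2
    [2.0, 1.0, 50.0, 1200.0],  # House 3
]

# Hypothetical weights (one per feature) and bias; these are not trained values.
weights = [10.0, 5.0, -1.0, 0.5]
bias = 50.0

predictions = predict_all(X, weights, bias)
print(predictions)  # [1075.0, 1345.0, 625.0]
```

The output has one entry per row of `X`, so a dataset of shape `(3, 4)` combined with 4 weights yields 3 predictions.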
The Math
Math: What is a Matrix?
If a vector is a 1D list of numbers, a Matrix is a 2D grid of numbers. You can think of a matrix as simply a collection of vectors stacked together.
In mathematics, we denote a matrix using a capital letter (like $A$ or $X$). The shape of a matrix is defined by its number of rows ($m$) and columns ($n$). We call this an $m \times n$ (read as "m by n") matrix.
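For example, the $3 \times 4$ design matrix for the three houses in the table above looks like this:

$$
X = \begin{bmatrix} 3 & 2 & 15 & 2000 \\ 4 & 3 & 10 & 2500 \\ 2 & 1 & 50 & 1200 \end{bmatrix}
$$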
Just like vectors, matrices can be any size, and the mathematical rules we build will apply whether the matrix has 2 rows or 2 million rows.
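For reference, a general $m \times n$ matrix can be written out entry by entry. The symbol $a_{ij}$ for the entry in row $i$ and column $j$ is standard notation, though it is introduced here rather than in the text above.

```latex
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
```

Each row of $A$ is a vector with $n$ entries, which matches the picture of a matrix as a collection of vectors stacked together.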
The Code
class Matrix:
    def __init__(self, data: list[list[float]]):
        self.data = data
        if data:
            self.number_of_rows = len(data)
            self.number_of_cols = len(data[0])
            # Validation: Ensure all rows have the exact same number of columns
            for row in data:
                if len(row) != self.number_of_cols:
                    raise ValueError("All rows must have the same number of columns to form a valid matrix.")
        else:
            self.number_of_rows = 0
            self.number_of_cols = 0

    @property
    def shape(self) -> tuple[int, int]:
        """Returns the shape of the matrix as (rows, columns)."""
        return (self.number_of_rows, self.number_of_cols)

    def __repr__(self) -> str:
        """Helper to print the matrix cleanly in the terminal."""
        rows_str = "\n ".join(str(row) for row in self.data)
        return f"Matrix(\n {rows_str}\n)"


# Example Usage: Creating a Design Matrix for 3 houses
house_dataset = [
    [3.0, 2.0, 15.0, 2000.0],  # House 1
    [4.0, 3.0, 10.0, 2500.0],  # House 2
    [2.0, 1.0, 50.0, 1200.0]   # House 3
]
X = Matrix(house_dataset)
print("Design Matrix X:")
print(X)
print(f"\nThe shape of X is: {X.shape}")
print(f"Number of samples (houses): {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")

Code Breakdown
- `class Matrix:` We define our foundational 2D data structure.
- `def __init__(self, data: list[list[float]]):` The matrix is initialized with a list of lists.
- `if len(row) != self.number_of_cols:` This is a critical validation check. A mathematical matrix must be a perfect rectangle. If one row has 3 features and another has 4, the math will crash.
- `@property def shape(self):` In ML libraries like NumPy, `.shape` returns the dimensions `(rows, columns)`. We use the `@property` decorator so it can be accessed like an attribute (`X.shape`) rather than a method (`X.shape()`).
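To see the validation in action, the sketch below repeats a condensed copy of the `Matrix` class so that it runs on its own, then tries a ragged dataset and an empty one. The variable names `ragged`, `empty`, and `error_message` are illustrative.

```python
class Matrix:
    """Condensed copy of the class above, repeated so this snippet runs standalone."""

    def __init__(self, data: list[list[float]]):
        self.data = data
        self.number_of_rows = len(data)
        self.number_of_cols = len(data[0]) if data else 0
        # Every row must match the width of the first row.
        for row in data:
            if len(row) != self.number_of_cols:
                raise ValueError(
                    "All rows must have the same number of columns to form a valid matrix."
                )

    @property
    def shape(self) -> tuple[int, int]:
        return (self.number_of_rows, self.number_of_cols)


# The second house is missing its SqFt value, so the rows differ in length.
ragged = [
    [3.0, 2.0, 15.0, 2000.0],
    [4.0, 3.0, 10.0],
]

try:
    Matrix(ragged)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# An empty dataset is accepted and reports a shape of (0, 0).
empty = Matrix([])
print(empty.shape)  # (0, 0)
```

Failing at construction time means a malformed dataset is caught where it is loaded, rather than surfacing later as a confusing index error in the middle of a calculation.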