AI Logbook
Live Learning Feed

Understanding intelligent systems from first principles.

Euclidean Distance: Measuring Similarity

  • Similarity and Nearest Neighbors
  • The Euclidean Distance Formula
  • Implementing Distance in the Vector Class

🧠 The Theory

AI/ML Concept: Distance is Similarity

In the world of artificial intelligence, geometric distance is a direct measure of similarity: the smaller the distance between two vectors, the more alike the objects they represent. Because we established yesterday that real-world objects are translated into feature vectors, we can now use math to compare them.

  • Recommendation Systems: If a user's viewing history is mapped as a vector, an AI can find other users who have the smallest Euclidean distance to them. It then recommends movies those similar users watched.
  • Search Engines: When you search for an image, the system converts your query into a vector and returns images whose mathematical distance is closest to your query.
  • Classification: If we have a new, unknown data point (like a medical test result) and its vector is geometrically closest to a cluster of vectors labeled "Healthy", the model predicts the new point is also "Healthy".

By calculating distance, our AI engine takes its first real step toward making decisions based on data.
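The classification idea above can be sketched in a few lines. This is a minimal nearest-neighbor example with made-up data points and labels (the `labeled_points` values and the `nearest_label` helper are illustrative, not from a real dataset):

```python
import math

# Hypothetical labeled data: (feature vector, label)
labeled_points = [
    ((1.0, 2.0), "Healthy"),
    ((1.5, 1.8), "Healthy"),
    ((8.0, 9.0), "Sick"),
]

def nearest_label(query: tuple[float, float]) -> str:
    # Predict by copying the label of the geometrically closest known point
    closest = min(labeled_points, key=lambda item: math.dist(query, item[0]))
    return closest[1]

print(nearest_label((1.2, 2.1)))  # lands near the "Healthy" cluster
```

The whole "model" is just a `min()` over distances, which is exactly why getting distance right matters.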

๐Ÿ“The Math

Math: Euclidean Distance

To understand how far apart two points are in space, we use Euclidean distance. This is the exact straight-line distance between two points, acting as an $n$-dimensional extension of the Pythagorean theorem ($a^2 + b^2 = c^2$).

For a 2D space, the distance $d$ between points $(x_1, y_1)$ and $(x_2, y_2)$ is:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

In machine learning, we rarely work in just 2 dimensions. The beauty of the math is that it scales perfectly to high-dimensional spaces. To find the distance between any two vectors $\vec{x}$ and $\vec{y}$, we subtract their corresponding elements, square the differences (to remove negative values), sum them up, and take the square root:

$$d = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
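As a quick sanity check, the formula can be computed by hand and compared against Python's built-in `math.dist` (available since Python 3.8), which performs the same computation. The sample vectors here are arbitrary, chosen so the 3-4-5 right triangle makes the answer obvious:

```python
import math

x = [3.0, 4.0, 0.0]
y = [0.0, 0.0, 0.0]

# The formula by hand: square the differences, sum, take the root
manual = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(manual)               # 5.0, the classic 3-4-5 triangle
print(math.dist(x, y))      # 5.0, the stdlib agrees
```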

โš™๏ธThe Code

import math

class Vector:
    def __init__(self, attributes: list[float]):
        self.attributes = attributes
    
    def __sub__(self, other: "Vector") -> "Vector":
        if isinstance(other, Vector):
            if len(self.attributes) != len(other.attributes):
                raise ValueError("Vectors must have the same dimension for subtraction.")
            return Vector([s - o for s, o in zip(self.attributes, other.attributes)])
        else:
            raise TypeError(f"Unsupported operand type for -: 'Vector' and '{type(other).__name__}'")
    def distance(self, other: "Vector") -> float:
        # 1. Calculate the difference vector
        diff = self - other

        # 2. Square each element in the difference vector
        squared_diff = [x ** 2 for x in diff.attributes]

        # 3. Sum the squared differences
        sum_squared_diff = sum(squared_diff)

        # 4. Take the square root of the sum
        distance = math.sqrt(sum_squared_diff)

        return distance

    def __repr__(self) -> str:
        attribute_str = ", ".join(f"{a:.2f}" for a in self.attributes)
        return f"Vector({attribute_str})"


# Example Usage: Finding similar users based on age, monthly spend, and site visits
user_A = Vector([25, 150.0, 12])
user_B = Vector([26, 145.0, 15]) # Very similar to A
user_C = Vector([55, 10.0, 2])   # Very different from A

dist_A_B = user_A.distance(user_B)
dist_A_C = user_A.distance(user_C)

print(f"Distance between User A and User B: {dist_A_B:.2f}")
print(f"Distance between User A and User C: {dist_A_C:.2f}")
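Once distances are available, picking the most similar user is a one-liner with `min()`. A minimal sketch using the same feature values as plain lists and `math.dist`, so it stands alone (the `candidates` dict is just an illustrative way to keep names alongside vectors):

```python
import math

# Same hypothetical features as above: age, monthly spend, site visits
user_A = [25, 150.0, 12]
candidates = {"B": [26, 145.0, 15], "C": [55, 10.0, 2]}

# The nearest neighbor is the candidate with the smallest distance to A
nearest = min(candidates, key=lambda name: math.dist(user_A, candidates[name]))
print(f"User most similar to A: User {nearest}")  # User B
```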

Code Breakdown

  • import math: We import the standard math library to access the square root function, math.sqrt().
  • def distance(self, other): A new method in our Vector class to calculate how far apart two vector instances are.
  • diff = self - other: We leverage the __sub__ magic method we built in "What is a Vector? Translating the Real World into Code". This elegantly handles the $(x_i - y_i)$ portion of the formula and includes the dimension validation check automatically.
  • [x ** 2 for x in diff.attributes]: We iterate through the newly created difference vector, squaring each element so every term is non-negative, then add them together with sum(). This represents $\sum (x_i - y_i)^2$.
  • return math.sqrt(...): Finally, we take the square root of the total sum, completing the Euclidean distance formula.