Euclidean Distance: Measuring Similarity
The Theory
AI/ML Concept: Distance is Similarity
In machine learning, geometric distance is a common proxy for similarity: the smaller the distance between two vectors, the more alike the objects they represent. Because we established yesterday that real-world objects are translated into feature vectors, we can now use math to compare them.
- Recommendation Systems: If a user's viewing history is mapped as a vector, an AI can find other users who have the smallest Euclidean distance to them. It then recommends movies those similar users watched.
- Search Engines: When you search for an image, the system converts your query into a vector and returns the images whose vectors lie closest to it.
- Classification: If we have a new, unknown data point (like a medical test result) and its vector is geometrically closest to a cluster of vectors labeled "Healthy", the model predicts the new point is also "Healthy".
By calculating distance, our AI engine takes its first real step toward making decisions based on data.
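The classification idea above can be sketched in a few lines. This is only an illustration under stated assumptions: it needs Python 3.8+ for the standard-library `math.dist` function, the two-feature data is invented, and "closest to a cluster" is simplified to "closest single labeled point". The names `labeled_points` and `nearest_label` are hypothetical.

```python
import math

# Invented feature vectors paired with labels (illustrative values only).
labeled_points = [
    ([1.0, 1.2], "Healthy"),
    ([0.8, 1.0], "Healthy"),
    ([4.5, 5.1], "At risk"),
]


def nearest_label(query, labeled):
    """Return the label of the labeled point with the smallest Euclidean distance to `query`."""
    _, label = min(labeled, key=lambda pair: math.dist(query, pair[0]))
    return label


print(nearest_label([1.1, 0.9], labeled_points))  # Healthy
```

The same pattern would apply to the recommendation and search cases: replace the labels with user IDs or image IDs and return the closest entries.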
The Math
Math: Euclidean Distance
To understand how far apart two points are in space, we use Euclidean distance. This is the exact straight-line distance between two points, acting as an $n$-dimensional extension of the Pythagorean theorem ($a^2 + b^2 = c^2$).
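For a 2D space, the distance between points $\mathbf{p} = (p_1, p_2)$ and $\mathbf{q} = (q_1, q_2)$ is:
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}$$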
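As a quick check of the 2D case, take the hypothetical points $(1, 2)$ and $(4, 6)$. The horizontal gap is 3 and the vertical gap is 4, so the distance is the hypotenuse of a 3-4-5 right triangle:

$$d = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = \sqrt{25} = 5$$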
In machine learning, we rarely work in just 2 dimensions. The same formula extends unchanged to any number of dimensions. To find the distance between any two vectors $\mathbf{p}$ and $\mathbf{q}$ with $n$ elements each, we subtract their corresponding elements, square the differences (to remove negative values), sum them up, and take the square root:
$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$
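The four steps just described (subtract, square, sum, square root) can be traced on plain Python lists before wrapping them in a class. This is a small sketch with made-up values. The variable names are illustrative, and the final check assumes Python 3.8+, where the standard library provides `math.dist`.

```python
import math

p = [1.0, 2.0, 3.0]
q = [4.0, 6.0, 3.0]

# 1. Subtract corresponding elements.
differences = [a - b for a, b in zip(p, q)]  # [-3.0, -4.0, 0.0]
# 2. Square each difference so negative values cannot cancel positive ones.
squares = [d ** 2 for d in differences]      # [9.0, 16.0, 0.0]
# 3. Sum the squares.
total = sum(squares)                         # 25.0
# 4. Take the square root.
distance = math.sqrt(total)                  # 5.0

# Cross-check against the standard-library implementation.
assert math.isclose(distance, math.dist(p, q))
print(distance)  # 5.0
```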
The Code
import math


class Vector:
    def __init__(self, attributes: list[float]):
        self.attributes = attributes

    def __sub__(self, other: "Vector") -> "Vector":
        if isinstance(other, Vector):
            if len(self.attributes) != len(other.attributes):
                raise ValueError("Vectors must have the same dimension for subtraction.")
            return Vector([s - o for s, o in zip(self.attributes, other.attributes)])
        else:
            raise TypeError(f"Unsupported operand type for -: 'Vector' and '{type(other).__name__}'")

    def distance(self, other: "Vector") -> float:
        # 1. Calculate the difference vector
        diff = self - other
        # 2. Square each element in the difference vector
        squared_diff = [x ** 2 for x in diff.attributes]
        # 3. Sum the squared differences
        sum_squared_diff = sum(squared_diff)
        # 4. Take the square root of the sum
        distance = math.sqrt(sum_squared_diff)
        return distance

    def __repr__(self) -> str:
        attribute_str = ", ".join(f"{a:.2f}" for a in self.attributes)
        return f"Vector({attribute_str})"


# Example Usage: Finding similar users based on age, monthly spend, and site visits
user_A = Vector([25, 150.0, 12])
user_B = Vector([26, 145.0, 15])  # Very similar to A
user_C = Vector([55, 10.0, 2])    # Very different from A

dist_A_B = user_A.distance(user_B)
dist_A_C = user_A.distance(user_C)

print(f"Distance between User A and User B: {dist_A_B:.2f}")
print(f"Distance between User A and User C: {dist_A_C:.2f}")

Code Breakdown
- import math: We import the standard math library to access the square root function, math.sqrt().
- def distance(self, other): A new method in our Vector class that calculates how far apart two vector instances are.
- diff = self - other: We leverage the __sub__ magic method we built in "What is a Vector? Translating the Real World into Code". This handles the element-wise subtraction, the $p_i - q_i$ portion of the formula, and includes the dimension validation checks automatically.
- squared_diff = [x ** 2 for x in diff.attributes], then sum(squared_diff): We iterate through the newly created difference vector, squaring each element (x ** 2) so that every term is non-negative, and then sum the results. This represents $\sum_{i=1}^{n} (p_i - q_i)^2$.
- math.sqrt(sum_squared_diff): Finally, we take the square root of the total sum, completing the Euclidean distance formula.
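To see the dimension check from the breakdown in action, the sketch below repeats a condensed copy of the Vector class so it runs on its own, then compares vectors of equal and unequal length. The condensed class and the `message` variable are simplifications for illustration, and the two-element vector is invented. Behaviour should match the full class above.

```python
import math


class Vector:
    """Condensed copy of the article's Vector class, kept only so this snippet is self-contained."""

    def __init__(self, attributes):
        self.attributes = attributes

    def __sub__(self, other):
        if not isinstance(other, Vector):
            raise TypeError(f"Unsupported operand type for -: 'Vector' and '{type(other).__name__}'")
        if len(self.attributes) != len(other.attributes):
            raise ValueError("Vectors must have the same dimension for subtraction.")
        return Vector([s - o for s, o in zip(self.attributes, other.attributes)])

    def distance(self, other):
        diff = self - other  # raises ValueError on a dimension mismatch
        return math.sqrt(sum(x ** 2 for x in diff.attributes))


user_A = Vector([25, 150.0, 12])
user_B = Vector([26, 145.0, 15])

# Same dimension: sqrt(1 + 25 + 9) = sqrt(35)
print(f"{user_A.distance(user_B):.2f}")  # 5.92

# Different dimension: the check inside __sub__ rejects the comparison.
message = ""
try:
    user_A.distance(Vector([25, 150.0]))
except ValueError as error:
    message = str(error)
print(f"Rejected: {message}")
```

Because distance() delegates to the subtraction operator, it needs no validation code of its own. Any rule added to __sub__ later applies to distance calculations as well.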