Functions and Lines: Adding the Bias
In machine learning, we use the same equation as a straight line, $y = mx + b$, but with slightly different terminology: $\hat{y} = wx + b$.
$\hat{y}$ (y-hat) represents our prediction.
$w$ represents our weights (the importance of each feature).
$x$ represents our features (the data).
$b$ is our bias (the intercept).
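To make the role of the bias concrete, here is a minimal sketch for the single-feature case. The function name `predict` and the values of `w` and `b` are assumptions chosen for illustration, not part of any particular library.

```python
def predict(x, w, b):
    """Return the prediction y_hat = w * x + b for a single feature x."""
    return w * x + b

# Hypothetical weight and bias, picked only to show the effect of b.
w = 2.0
b = 1.0

for x in [0.0, 1.0, 2.0]:
    without_bias = predict(x, w, 0.0)  # line forced through the origin
    with_bias = predict(x, w, b)       # same slope, shifted up by b
    print(x, without_bias, with_bias)
```

Without the bias, the prediction at `x = 0` is always 0. The bias lets the line cross the vertical axis wherever the data requires.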
The Dot Product: Applying Importance with Weights
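In machine learning, the dot product is how a model applies "importance" to different features: each feature value is multiplied by its own weight, and the products are added together into a single number. We call this level of importance a "weight". The larger a weight's magnitude, the more its feature moves the prediction.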
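The dot product can be sketched in a few lines of plain Python. The helper name `dot` and the weight values below are made up for illustration. The feature values reuse a three-feature house (bedrooms, bathrooms, age in years).

```python
def dot(weights, features):
    """Multiply each feature by its weight and sum the products."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))

# Hypothetical example: bedrooms, bathrooms, age in years.
features = [3.0, 2.0, 15.0]
# Made-up weights: bedrooms matter most, age counts against the score.
weights = [10.0, 5.0, -1.0]

score = dot(weights, features)
print(score)  # 10*3 + 5*2 + (-1)*15 = 25.0
```

A negative weight means the feature pushes the result down. A weight of zero means the model ignores that feature entirely.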
Euclidean Distance: Measuring Similarity
Recommendation Systems: If a user's viewing history is mapped as a vector, an AI can find other users who have the smallest Euclidean distance to them. It then recommends movies those similar users watched.
Search Engines: When you search for an image, the system converts your query into a vector and returns images whose mathematical distance is closest to your query.
Classification: If we have a new, unknown data point (like a medical test result) and its vector is geometrically closest to a cluster of vectors labeled "Healthy", the model predicts the new point is also "Healthy".
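The classification idea above can be sketched as follows. The Euclidean distance between two vectors is the square root of the sum of squared differences between their elements. As a simplifying assumption, this sketch assigns the label of the single nearest labeled point rather than comparing against a whole cluster. The function names, data points, and the "At risk" label are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_label(query, labeled_points):
    """Return the label of the labeled point closest to the query."""
    closest = min(labeled_points,
                  key=lambda item: euclidean_distance(query, item[0]))
    return closest[1]

# Made-up two-feature test results with known labels.
labeled_points = [
    ([1.0, 1.0], "Healthy"),
    ([1.5, 2.0], "Healthy"),
    ([8.0, 9.0], "At risk"),
]

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(nearest_label([2.0, 1.0], labeled_points))   # Healthy
```

The same distance function would serve the recommendation and search cases: rank candidates by their distance to the query vector and return the closest ones.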
What is a Vector? Translating the Real World into Code
Tabular Data: If we are building a model to predict house prices, we might define a house using three features: number of bedrooms, number of bathrooms, and age in years. A 3-bedroom, 2-bathroom house built 15 years ago becomes a data point in 3D space: $[3, 2, 15]$.
Image Data: A grayscale image is represented as a vector where each element corresponds to the brightness of a single pixel.
Text Data: Words are mapped to high-dimensional vectors (often 300+ dimensions) where the numbers represent semantic meaning.
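As a rough sketch of the first two cases, the snippet below turns a house record and a tiny grayscale image into vectors. The dictionary keys, the feature order, and the 2x2 pixel values are assumptions made up for illustration. Word vectors are not shown, because they come from a trained embedding model rather than a simple transformation.

```python
# Tabular data: fix a feature order, then read the values in that order.
house = {"bedrooms": 3, "bathrooms": 2, "age_years": 15}
feature_order = ["bedrooms", "bathrooms", "age_years"]
house_vector = [house[name] for name in feature_order]
print(house_vector)  # [3, 2, 15]

# Image data: a hypothetical 2x2 grayscale image, one brightness
# value (0 = black, 255 = white) per pixel, flattened row by row.
image = [
    [0, 128],
    [255, 64],
]
image_vector = [pixel for row in image for pixel in row]
print(image_vector)  # [0, 128, 255, 64]
```

Keeping the feature order fixed matters: every house must place bedrooms, bathrooms, and age in the same positions, or distances and dot products between vectors become meaningless.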