Supervised learning is like learning with a teacher: a machine learning technique where models learn from labeled training data. The model is shown input-output pairs (for example, images of fruits together with their names) so it can learn to map inputs to correct outputs. The goal is for the model to make accurate predictions or classifications on new, unseen data based on what it learned from the labeled examples. Two important techniques are regression and classification.
Regression: Linear regression, polynomial regression
Regression is a statistical method used in machine learning to predict continuous numerical outcomes based on input variables. It aims to find the relationship between independent variables and a dependent variable by fitting a mathematical equation to the data. Its goal is to understand how changes in the input variables impact the outcome, enabling predictions of future values within a range. Two fundamental types of regression are linear regression and polynomial regression.
- Linear Regression: It’s like drawing a straight line through scattered points on a graph. Imagine plotting points where one axis represents one thing (say, the size of a house) and the other axis represents another (its price). Linear regression finds the line that best fits these points, allowing you to predict one quantity from the other. For instance, if you know the size of a house, linear regression can predict its price based on the pattern in the plotted points.
- Polynomial Regression: Sometimes a straight line doesn’t quite capture the relationship between things accurately. Polynomial regression is like fitting a curve instead of a straight line to those scattered points. Imagine bending the line on the graph to better match the points; the curve can capture more complex relationships. For instance, if you’re looking at how temperature affects ice cream sales, a curve might better show how sales increase with temperature but then decrease on extremely hot days (due to melting, for instance). A short sketch of this appears after the linear regression example below.
In Python, linear regression can be performed with libraries like scikit-learn. The following example creates and trains a simple linear regression model:
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Generating some random data for demonstration
np.random.seed(0)
X = 2 * np.random.rand(100, 1) # Generate 100 random values between 0 and 2
y = 4 + 3 * X + np.random.randn(100, 1) # y = 4 + 3X + noise
# Create a linear regression model
model = LinearRegression()
# Train the model
model.fit(X, y)
# Make predictions
X_new = np.array([[0], [2]]) # New data points for prediction (0 and 2)
y_pred = model.predict(X_new)
# Plotting the data and the linear regression line
plt.scatter(X, y, alpha=0.6, label='Data')
plt.plot(X_new, y_pred, 'r-', label='Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Linear Regression Example')
plt.show()
So, while linear regression fits a straight line, polynomial regression fits curves to find patterns in the data. Both help us predict and understand relationships between different things.
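As a rough illustration of the polynomial case, here is a minimal sketch with made-up quadratic data. It uses scikit-learn’s PolynomialFeatures to expand the input with an extra squared column before fitting the same LinearRegression model; the data and coefficients are purely illustrative:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Synthetic data with a curved (quadratic) relationship
np.random.seed(0)
X = 6 * np.random.rand(100, 1) - 3 # 100 random values between -3 and 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1) # y = 0.5x^2 + x + 2 + noise
# Expand X with polynomial terms (degree 2 adds an x^2 column)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
# Fit an ordinary linear model on the expanded features
model = LinearRegression()
model.fit(X_poly, y)
# The recovered parameters should land near intercept 2 and coefficients [1, 0.5]
print(model.intercept_, model.coef_)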
Classification: Logistic regression, decision trees, k-nearest neighbors, support vector machines
Classification is like sorting items into different categories: a technique where models learn to assign input data points to predefined classes based on their characteristics. The goal is for the model to accurately classify new, unseen data into these predefined groups, such as identifying whether an email is spam or not, or classifying images of animals into different species based on their features. There are many classification algorithms, of which the most fundamental ones are:
- Logistic Regression: Imagine you have two groups of things, like apples and oranges, and you want to draw a line between them on a graph. Logistic regression helps draw that line in a way that best separates the apples from the oranges. It’s like finding the best-fit line that tells you whether something belongs to one group or the other based on its characteristics. For example, if you have data about students and whether they pass or fail an exam, logistic regression can predict the likelihood of a student passing based on factors like study hours or previous grades. A minimal sketch of this follows this list.
- Decision Trees: Picture a flowchart with different questions leading to outcomes. Decision trees work just like that. They ask a series of questions about the characteristics of something (let’s say fruits) to classify it. For instance, to classify a fruit, the tree might ask if it’s red. If yes, it might ask about its size next. It keeps asking questions until it decides if it’s an apple or an orange. This method is like a step-by-step guide to sorting things based on their features. A small tree sketch also follows this list.
- K-Nearest Neighbors (KNN): Imagine your friends live in houses scattered across a neighborhood, and you want to know who your closest friends are. KNN finds the closest “neighbors” (points) to what you’re trying to classify based on certain characteristics. It then looks at the labels of those nearby points to decide where the new thing (let’s say a new house) belongs. For instance, if most nearby houses are painted yellow, KNN might predict that the new house will also be painted yellow.
- Support Vector Machines (SVM): Think of SVM as drawing a line or a boundary between different things, like drawing a fence between two types of animals in a zoo. It looks for the best possible line that separates these groups, one with the maximum distance from the closest points of each group. For instance, if you have data about cats and dogs based on their size and weight, SVM will draw the line that best separates these animals based on those features. A sketch of this follows the KNN example below.
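To make the first of these concrete, here is a minimal logistic regression sketch using scikit-learn, with a small made-up dataset of study hours and pass/fail outcomes (the numbers are purely illustrative):
import numpy as np
from sklearn.linear_model import LogisticRegression
# Hypothetical data: hours studied vs. passed (1) or failed (0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf = LogisticRegression()
clf.fit(X, y)
# Estimated probability of passing after 4.5 hours of study
print(clf.predict_proba([[4.5]])[0, 1])
A decision tree can be trained on the same made-up data. This sketch fits a shallow tree and prints the if/else questions it learned to ask:
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
# Same hypothetical study-hours data as above
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)
# Show the learned questions as text
print(export_text(tree, feature_names=['hours_studied']))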
This is an example of how the K-Nearest Neighbors algorithm can be used in Python.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2] # Consider only the first two features for visualization
y = iris.target
# Create a KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
# Plotting decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.4)
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('K-Nearest Neighbors Decision Boundary (K=5)')
plt.show()
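A support vector machine can be applied to the same Iris data. Here is a minimal sketch, swapping the KNN classifier for scikit-learn’s SVC with a linear kernel:
from sklearn.datasets import load_iris
from sklearn.svm import SVC
# Same two Iris features as in the KNN example above
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
# A linear-kernel SVM looks for the separating boundary with the widest margin
svm = SVC(kernel='linear')
svm.fit(X, y)
# Predict the species of a new flower (sepal length 5.0, sepal width 3.5)
print(iris.target_names[svm.predict([[5.0, 3.5]])[0]])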
Each of these methods has its own way of making classification decisions, whether by drawing lines, asking questions, finding neighbors, or creating boundaries that categorize things based on their characteristics.