Image Classification with CNNs: A Foundational Project
Objective
The main goal of this project is to build a basic image classification model using Convolutional Neural Networks (CNNs). You’ll work with a standard dataset to understand the fundamentals of image processing, CNN architecture, and model evaluation. This project serves as a stepping stone to more advanced computer vision projects.
Learning Outcomes
By completing this project, you will:
- Understand the basics of image classification and its applications
- Learn how to preprocess and augment image data
- Build and train a simple CNN from scratch using Keras and TensorFlow
- Implement data visualization techniques for image data
- Evaluate model performance using accuracy and loss metrics
- Gain foundational skills to progress to intermediate-level computer vision projects
Prerequisites and Theoretical Foundations
1. Python Programming Basics
- Variables, data types, and control structures
- Functions and modules
- Basic object-oriented programming
Python prerequisites code examples:
# Variables and data types
number = 10
text = "Hello, World!"
# Control structures
for i in range(5):
    print(f"Iteration {i}")
# Functions
def add(a, b):
    return a + b
# Classes
class Greeting:
    def __init__(self, name):
        self.name = name
    def say_hello(self):
        return f"Hello, {self.name}!"
2. Basic Mathematics
- Algebra (equations, functions)
- Basic understanding of matrices and vectors
- Elementary statistics (mean, standard deviation)
Mathematical concepts with code:
import numpy as np
# Mean and standard deviation
data = [1, 2, 3, 4, 5]
mean = np.mean(data)
std_dev = np.std(data)
# Matrix operations
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])
matrix_sum = matrix_a + matrix_b
matrix_product = np.dot(matrix_a, matrix_b)
3. Introduction to Machine Learning
- Understanding supervised learning
- Concepts of training and testing datasets
- Basic idea of model fitting and prediction
ML concepts with code:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([5, 7, 9, 11, 13])
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model training
model = LinearRegression()
model.fit(X_train, y_train)
# Prediction
predictions = model.predict(X_test)
4. Basic Concepts in Deep Learning
- Understanding neural networks
- Activation functions (ReLU, sigmoid)
- Concept of epochs, batches, and iterations
- Loss functions and optimization
Deep learning concepts:
- Neural Networks Basics
  - Neuron: Basic unit computing a weighted sum followed by an activation.
  - Layers: Input, hidden, and output layers.
  - Forward Pass: Calculating outputs from inputs through the network.
- Activation Functions
  - ReLU (Rectified Linear Unit): \( f(x) = \max(0, x) \)
  - Sigmoid: \( f(x) = \frac{1}{1 + e^{-x}} \)
- Training Process
  - Loss Function: Measures the difference between predicted and actual values.
    - Example: Mean Squared Error (MSE)
  - Optimizer: Algorithm that adjusts the weights to minimize the loss.
    - Example: Stochastic Gradient Descent (SGD)
- Key Terms
  - Epoch: One complete pass through the training dataset.
  - Batch Size: Number of samples processed before updating the model.
  - Iteration: One update of the model’s parameters.
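The concepts above can be made concrete with a few lines of NumPy. This is a minimal sketch, not a training framework: the helper names (`relu`, `sigmoid`, `mse`, `neuron`) and all numeric values are made up for illustration.

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def neuron(inputs, weights, bias, activation):
    """Forward pass of one neuron: activation(weighted sum + bias)."""
    return activation(np.dot(inputs, weights) + bias)

# Hypothetical inputs and parameters, chosen only for illustration
inputs = np.array([1.0, 2.0])
weights = np.array([0.5, -1.0])
bias = 0.5

z = np.dot(inputs, weights) + bias            # weighted sum: 0.5 - 2.0 + 0.5 = -1.0
relu_out = neuron(inputs, weights, bias, relu)        # negative sum is clipped to 0
sigmoid_out = neuron(inputs, weights, bias, sigmoid)  # squashed into (0, 1)
loss = mse([1.0, 0.0], [0.5, 0.5])            # both errors are 0.5, so MSE is 0.25

# One gradient-descent step on the loss (w * x - y)^2 for a single sample
w, x, y, learning_rate = 0.0, 2.0, 4.0, 0.1
loss_before = (w * x - y) ** 2                # 16.0
gradient = 2 * (w * x - y) * x                # d(loss)/dw = -16.0
w_new = w - learning_rate * gradient          # moves w toward the solution w = 2
loss_after = (w_new * x - y) ** 2             # smaller than loss_before

print(relu_out, sigmoid_out, loss, w_new, loss_after)
```

Each such parameter update is one iteration; an optimizer like SGD repeats it batch after batch until every training sample has been seen once, which completes an epoch.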
Skills Gained
- Building and training basic convolutional neural networks
- Preprocessing and augmenting image data
- Using TensorFlow and Keras for deep learning tasks
- Evaluating model performance using standard metrics
- Visualizing images and training results using Matplotlib
- Preparing for more advanced computer vision projects
Tools Required
- Python 3.x
- Jupyter Notebook or an IDE like Visual Studio Code
- Libraries:
pip install "tensorflow==2.*"
pip install matplotlib
pip install numpy
pip install pandas
Steps and Tasks
1. Data Acquisition and Exploration
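We’ll use the MNIST dataset, a collection of 70,000 grayscale images of handwritten digits (0-9). Each image is 28×28 pixels, and the dataset is split into 60,000 training images and 10,000 test images.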
Load the Dataset:
from tensorflow.keras.datasets import mnist
# Load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Check the shape
print(f"Training data shape: {X_train.shape}, Training labels shape: {y_train.shape}")
print(f"Test data shape: {X_test.shape}, Test labels shape: {y_test.shape}")
Data Exploration:
Data exploration code:
import matplotlib.pyplot as plt
# Visualize some examples from the dataset
def plot_sample_images(X, y, num_samples=5):
    plt.figure(figsize=(10, 2))
    for i in range(num_samples):
        plt.subplot(1, num_samples, i+1)
        plt.imshow(X[i], cmap='gray')
        plt.title(f"Label: {y[i]}")
        plt.axis('off')
    plt.show()
plot_sample_images(X_train, y_train)
2. Data Preprocessing
Prepare the data for training.
Tasks:
- Normalize the pixel values
- Reshape the data to include the channel dimension
- Convert labels to one-hot encoding
Implementation:
import numpy as np
from tensorflow.keras.utils import to_categorical
# Normalize pixel values
X_train = X_train / 255.0
X_test = X_test / 255.0
# Reshape data
X_train = X_train.reshape(-1, 28, 28, 1) # -1 lets NumPy infer the number of samples
X_test = X_test.reshape(-1, 28, 28, 1)
# One-hot encode labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
print(f"Updated training data shape: {X_train.shape}")
print(f"Updated training labels shape: {y_train.shape}")
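To see what these three steps do to the arrays, here is a small NumPy-only sketch on toy data. The arrays are made-up stand-ins for MNIST, and `np.eye(10)[labels]` is used as an illustrative equivalent of one-hot encoding for ten classes, not as the way `to_categorical` is implemented.

```python
import numpy as np

# Toy stand-ins for MNIST pixels and labels (values made up for illustration)
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
labels = np.array([3, 0, 9])

# Normalization: dividing by 255 maps uint8 values into the range [0, 1]
scaled = pixels / 255.0

# Reshape: add a trailing channel dimension of size 1 for grayscale images
images = np.zeros((2, 28, 28))
reshaped = images.reshape(-1, 28, 28, 1)

# One-hot encoding: row i of the identity matrix has a single 1 in column i
one_hot = np.eye(10)[labels]

# argmax undoes the encoding and recovers the integer labels
recovered = np.argmax(one_hot, axis=1)

print(scaled)
print(reshaped.shape)
print(one_hot[0])
print(recovered)
```

The `argmax` round trip is the same operation used later to turn the model's softmax outputs back into digit labels.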
3. Building the CNN Model
Create a simple CNN model using Keras.
Tasks:
- Define the model architecture
- Compile the model with appropriate loss function and optimizer
Implementation:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
# Define the model
model = Sequential()
# Convolutional layer
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# Pooling layer
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten layer
model.add(Flatten())
# Fully connected layer
model.add(Dense(128, activation='relu'))
# Output layer
model.add(Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Model summary
model.summary()
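The output of `model.summary()` can be checked by hand. The sketch below reproduces the layer shapes and parameter counts with plain arithmetic, assuming the Keras defaults for this architecture: "valid" padding and stride 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The helper `conv2d_output` is a hypothetical name used only here.

```python
def conv2d_output(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

# Conv2D(32, 3x3) on a 28x28x1 input
conv_size = conv2d_output(28, 3)             # 28 - 3 + 1 = 26
conv_params = (3 * 3 * 1) * 32 + 32          # kernel weights + one bias per filter

# MaxPooling2D(2x2) halves each spatial dimension and has no parameters
pool_size = conv_size // 2                   # 13

# Flatten turns the 13x13x32 feature maps into one vector
flat_features = pool_size * pool_size * 32   # 5408

# Dense layers: (inputs * units) weights + one bias per unit
dense_params = flat_features * 128 + 128
output_params = 128 * 10 + 10

total_params = conv_params + dense_params + output_params

print(f"Conv output: {conv_size}x{conv_size}x32, params: {conv_params}")
print(f"Pool output: {pool_size}x{pool_size}x32")
print(f"Flattened features: {flat_features}")
print(f"Dense params: {dense_params}, output params: {output_params}")
print(f"Total params: {total_params}")
```

Almost all of the parameters sit in the first dense layer, which is one reason adding further convolution and pooling layers before `Flatten` can shrink the model.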
4. Training the Model
Train the model on the training data.
Tasks:
- Fit the model
- Monitor training and validation accuracy and loss
Implementation:
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.1
)
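It helps to work out what these settings mean in terms of parameter updates. The arithmetic below assumes the full 60,000-image MNIST training set is passed to `fit`; Keras takes the validation samples from the end of the arrays before shuffling.

```python
import math

num_samples = 60_000       # MNIST training images
validation_split = 0.1
batch_size = 128
epochs = 10

# Samples held out for validation, and samples actually used for training
val_samples = round(num_samples * validation_split)
train_samples = num_samples - val_samples

# Iterations (weight updates) per epoch; the last batch may be smaller
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch = train_samples - (steps_per_epoch - 1) * batch_size

# Total number of weight updates over the whole run
total_updates = steps_per_epoch * epochs

print(f"Train on {train_samples}, validate on {val_samples}")
print(f"{steps_per_epoch} iterations per epoch (last batch: {last_batch} samples)")
print(f"{total_updates} weight updates in total")
```

The iterations-per-epoch figure should match the step count shown in the Keras progress bar during training.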
5. Evaluating the Model
Assess the model’s performance on the test data.
Implementation:
# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
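The two numbers returned by `evaluate` can be reproduced by hand. The following sketch computes accuracy and categorical cross-entropy in NumPy for a made-up batch of three predictions. It is an illustration of the definitions rather than Keras's exact implementation, which additionally clips probabilities to avoid taking the log of zero.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples and 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
# One-hot true labels for classes 0, 1, and 2
targets = np.eye(3)[[0, 1, 2]]

# Accuracy: fraction of samples whose most probable class is the true class
accuracy = np.mean(np.argmax(probs, axis=1) == np.argmax(targets, axis=1))

# Categorical cross-entropy: mean of -log(probability given to the true class)
true_class_probs = np.sum(probs * targets, axis=1)
loss = -np.mean(np.log(true_class_probs))

print(f"Accuracy: {accuracy:.4f}")
print(f"Loss: {loss:.4f}")
```

The third sample is misclassified, so accuracy is two out of three, and its low true-class probability dominates the loss. This is why loss can keep improving even when accuracy has stopped changing.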
6. Visualizing Training Results
Plot the training and validation accuracy and loss over epochs.
Visualization code:
# Plot accuracy
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.title('Accuracy over epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
# Plot loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Loss over epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
7. Making Predictions
Use the trained model to make predictions on new data.
Implementation:
# Predict on test data
predictions = model.predict(X_test)
# Convert predictions to class labels
predicted_classes = np.argmax(predictions, axis=1)
true_classes = np.argmax(y_test, axis=1)
# Display some predictions
def display_predictions(X, y_true, y_pred, num_samples=5):
    plt.figure(figsize=(10, 2))
    for i in range(num_samples):
        plt.subplot(1, num_samples, i+1)
        plt.imshow(X[i].reshape(28, 28), cmap='gray')
        plt.title(f"True: {y_true[i]}\nPred: {y_pred[i]}")
        plt.axis('off')
    plt.show()
display_predictions(X_test, true_classes, predicted_classes)
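Beyond looking at the first few images, it is often more informative to find where the model goes wrong. The sketch below shows one way to do that with NumPy. The toy arrays `y_true` and `y_pred` are made-up stand-ins for `true_classes` and `predicted_classes`, using three classes instead of ten to keep the output readable.

```python
import numpy as np

# Toy labels standing in for true_classes and predicted_classes
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])
num_classes = 3

# Indices of the samples where the prediction disagrees with the true label
wrong_idx = np.flatnonzero(y_true != y_pred)

# Confusion matrix: rows are true classes, columns are predicted classes
confusion = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

# Correct predictions lie on the diagonal
accuracy = np.trace(confusion) / len(y_true)

print("Misclassified indices:", wrong_idx)
print(confusion)
print(f"Accuracy: {accuracy:.4f}")
```

Indexing the test images with `wrong_idx` would let you plot only the misclassified digits, and off-diagonal entries of the matrix show which pairs of classes are confused with each other.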
8. Saving and Loading the Model
Learn how to save your trained model for future use.
Implementation:
# Save the model
model.save('mnist_cnn_model.h5')
# Load the model
from tensorflow.keras.models import load_model
loaded_model = load_model('mnist_cnn_model.h5')
# Verify loaded model performance
loss, accuracy = loaded_model.evaluate(X_test, y_test)
print(f"Loaded model accuracy: {accuracy:.4f}")
9. Experimenting with Model Improvements
Explore ways to improve the model’s performance.
Suggestions:
- Add more convolutional and pooling layers
- Experiment with different activation functions
- Adjust the number of epochs and batch size
- Implement data augmentation using ImageDataGenerator
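Data augmentation example:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Hold out the last 10% of the training data for validation,
# so the test set stays untouched until the final evaluation
split = len(X_train) * 9 // 10
X_fit, y_fit = X_train[:split], y_train[:split]
X_val, y_val = X_train[split:], y_train[split:]
# Define data generator with augmentation
datagen = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1
)
# datagen.fit() is only required for feature-wise statistics
# (e.g. featurewise_center), which this generator does not use
# Continue training the already-trained model on augmented batches
history_aug = model.fit(
    datagen.flow(X_fit, y_fit, batch_size=128),
    epochs=10,
    validation_data=(X_val, y_val)
)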
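To build intuition for what a shift augmentation does to a single image, here is a hand-rolled NumPy sketch. It is not how `ImageDataGenerator` is implemented: the helper `shift_right` is hypothetical, it only shifts in one direction by a fixed amount, and it fills the gap with zeros, whereas the generator picks random shifts and fills with the nearest pixel values by default.

```python
import numpy as np

def shift_right(image, pixels):
    """Shift a 2D image right by `pixels` columns, filling the gap with zeros."""
    if pixels == 0:
        return image.copy()
    shifted = np.zeros_like(image)
    shifted[:, pixels:] = image[:, :-pixels]
    return shifted

# A 28x28 toy image with a single bright column at index 10
image = np.zeros((28, 28))
image[:, 10] = 1.0

# A shift range of 0.1 allows shifts of up to 10% of the image width
max_shift = int(0.1 * image.shape[1])
augmented = shift_right(image, max_shift)

print(f"Max shift: {max_shift} pixels")
print(f"Bright column moved from 10 to {int(np.argmax(augmented[0]))}")
```

The label of the image does not change under such a small shift, which is what makes the transformed copy a valid extra training example.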
10. Conclusion
In this project, you built a simple image classification model using a convolutional neural network. You learned how to preprocess image data, define a CNN architecture, train and evaluate the model, and visualize the results. This foundational project provides the essential skills needed to progress to intermediate-level computer vision projects, where you’ll work with more complex datasets and advanced techniques.
Next Steps and Improvements
To further enhance your skills and prepare for intermediate projects, consider the following:
- Work with Different Datasets: Try Fashion-MNIST for a harder grayscale task with the same image format, CIFAR-10 for color images, or CIFAR-100 for more classes.
- Explore Advanced Architectures: Learn about deeper networks like VGG, ResNet, or MobileNet.
- Implement Transfer Learning: Use pre-trained models and fine-tune them for your specific tasks.
- Hyperparameter Tuning: Experiment with different optimizers, learning rates, and regularization techniques.
- Model Deployment: Learn how to deploy your model using Flask or TensorFlow Serving.