Building a Facial Emotion Recognition System
Objective
Build an end-to-end facial emotion recognition system that can detect faces in images/video streams and classify their emotional expressions. This project combines computer vision techniques with deep learning to create a robust emotion detection pipeline.
Learning Outcomes
By completing this project, you will:
- Master face detection and facial landmark extraction
- Implement CNN architectures for emotion classification
- Build real-time video processing pipelines
- Handle facial detection edge cases
- Deploy computer vision models for real-time inference
Skills Gained
- Building computer vision pipelines
- Implementing facial detection and tracking
- Creating emotion classification models
- Processing real-time video streams
- Deploying CV models in production
- Handling real-world imaging challenges
Tools Required
# Core libraries
pip install opencv-python
pip install dlib  # building dlib may require CMake and a C++ compiler
pip install tensorflow
pip install face-recognition
# Additional utilities
pip install numpy
pip install moviepy
pip install streamlit # For GUI
Project Structure
emotion_recognition/
│
├── data/
│   ├── raw/
│   │   ├── fer2013/
│   │   └── facial_landmarks/
│   └── processed/
│
├── src/
│   ├── face_detection.py
│   ├── landmark_extraction.py
│   ├── emotion_classifier.py
│   └── video_processor.py
│
├── models/
│   └── saved_models/
│
└── app/
    ├── streamlit_app.py
    └── utils/
Prerequisites and Theoretical Foundations for "Building a Facial Emotion Recognition System"
1. Intermediate Python Programming
- Data Manipulation: Proficiency with pandas for data manipulation and NumPy for numerical computations.
- Object-Oriented Programming (OOP): Understanding classes and objects to structure code effectively.
- Image Processing Libraries: Familiarity with OpenCV or PIL for image loading and preprocessing.
Click to view Python code examples
import pandas as pd
import numpy as np
from PIL import Image
# Reading image data
image = Image.open('face.jpg')
image_array = np.array(image)
# Basic data manipulation
df = pd.DataFrame({'emotion': ['happy', 'sad'], 'intensity': [0.9, 0.4]})
print(df.head())
2. Mathematics and Machine Learning Foundations
- Linear Algebra: Understanding of matrices, vectors, and tensor operations.
- Calculus: Basics of differentiation, gradients, and backpropagation.
- Probability and Statistics: Knowledge of probability distributions, mean, variance, and statistical measures.
- Machine Learning Concepts: Understanding supervised learning, classification tasks, overfitting, and regularization.
Click to view mathematical concepts with code
import numpy as np
# Mean and variance
data = np.array([1, 2, 3, 4, 5])
mean = np.mean(data)
variance = np.var(data)
# Softmax function
def softmax(x):
e_x = np.exp(x - np.max(x))
return e_x / e_x.sum(axis=0)
3. Deep Learning Concepts
- Convolutional Neural Networks (CNNs): Understanding of convolutions, kernels, feature maps, pooling layers, and activation functions.
- Transfer Learning: Knowledge of using pre-trained models and fine-tuning them for specific tasks.
- Regularization Techniques: Familiarity with dropout, batch normalization, and early stopping.
- Optimization Algorithms: Understanding of gradient descent, Adam optimizer, and learning rate scheduling.
Click to view deep learning concepts
- Convolution Operation:
  - Equation: ( (I * K)(x, y) = \sum_m \sum_n I(x - m, y - n) K(m, n) )
  - Concept: Applying a kernel (filter) over an image to extract features.
- Activation Functions:
  - ReLU: ( f(x) = \max(0, x) )
  - Leaky ReLU: ( f(x) = \max(0.01x, x) )
  - Softmax: Used for multi-class classification to output probabilities.
- Transfer Learning:
  - Concept: Using a model trained on a large dataset and adapting it to a new, related task.
  - Approaches:
    - Feature Extraction: Use the pre-trained model as a fixed feature extractor.
    - Fine-Tuning: Unfreeze some top layers of the pre-trained model for re-training.
- Regularization Techniques:
  - Dropout: Randomly setting a fraction of input units to 0 during training to prevent overfitting.
  - Batch Normalization: Normalizing the inputs of each layer to stabilize learning.
  - Early Stopping: Stopping training when validation performance stops improving.
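The sketch below ties these concepts together in Keras: a frozen pre-trained backbone (feature extraction), a small classification head regularized with batch normalization and dropout, and early stopping as the training callback. MobileNetV2, the 96x96 input size, and the layer widths are illustrative choices, not part of the project code.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 7  # placeholder: one output unit per emotion class

# Feature extraction: freeze a pre-trained backbone (example: MobileNetV2)
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights='imagenet', input_shape=(96, 96, 3)
)
base.trainable = False  # fine-tuning would instead unfreeze some top layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),   # stabilizes learning
    layers.Dropout(0.5),           # regularization
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping halts training once validation performance stops improving
early_stop = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])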
4. Computer Vision Basics
- Image Preprocessing: Techniques like resizing, normalization, and data augmentation (rotation, flipping, cropping).
- Facial Feature Detection: Understanding of Haar cascades, Histogram of Oriented Gradients (HOG), or deep learning-based face detectors.
- Emotion Recognition Concepts: Awareness of facial expression features associated with emotions.
Click to view computer vision code examples
import cv2
# Load an image using OpenCV
image = cv2.imread('face.jpg')
# Convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Resize image
resized_image = cv2.resize(gray_image, (48, 48))
# Data augmentation example: horizontal flip
flipped_image = cv2.flip(resized_image, 1)
# Display image using OpenCV
cv2.imshow('Image', flipped_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
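For the Haar cascade detector mentioned above, OpenCV ships the cascade XML files with the package, so a minimal face-detection sketch (assuming a local face.jpg) needs no extra downloads. The dlib/HOG-based detector used in the main pipeline appears in Step 1.
import cv2

# Load OpenCV's bundled frontal-face Haar cascade
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread('face.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) box per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Haar Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()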
5. Required Libraries
- TensorFlow or PyTorch: Deep learning frameworks for building and training neural networks.
- Keras: High-level API for TensorFlow.
- OpenCV: For image processing tasks.
- Matplotlib or Seaborn: For data visualization.
- Scikit-learn: For preprocessing, model evaluation, and classical ML algorithms.
pip install tensorflow opencv-python matplotlib seaborn scikit-learn
6. Dataset Familiarity
- FER2013 Dataset: A common dataset for facial emotion recognition.
- Contents: 28,709 training images and 7,178 test images, each a 48x48-pixel grayscale face crop.
- Classes: Anger, Disgust, Fear, Happy, Sad, Surprise, Neutral.
- Understanding Data Structure: How to load and preprocess datasets for training.
Click to view dataset loading example
import pandas as pd
import numpy as np
from PIL import Image
# Load the FER2013 dataset CSV file
data = pd.read_csv('fer2013.csv')
# Example of accessing image data and label
pixels = data['pixels'][0]
label = data['emotion'][0]
# Convert pixel string to image
pixel_array = np.array([int(p) for p in pixels.split()]).reshape(48, 48)
image = Image.fromarray(pixel_array.astype('uint8'), 'L')
image.show()
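To prepare the whole CSV for training rather than a single row, one approach is to turn every pixel string into a normalized 48x48x1 array and one-hot encode the labels. A sketch, assuming the standard fer2013.csv layout with emotion, pixels, and Usage columns:
import pandas as pd
import numpy as np
import tensorflow as tf

data = pd.read_csv('fer2013.csv')

# Convert each space-separated pixel string into a 48x48x1 float array in [0, 1]
pixels = np.array([
    np.array(p.split(), dtype='float32').reshape(48, 48, 1)
    for p in data['pixels']
]) / 255.0

# One-hot encode the 7 emotion labels
labels = tf.keras.utils.to_categorical(data['emotion'], num_classes=7)

# Use the Usage column to recover the official train/test split
mask = (data['Usage'] == 'Training').to_numpy()
X_train, y_train = pixels[mask], labels[mask]
X_test, y_test = pixels[~mask], labels[~mask]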
Steps and Tasks
1. Face Detection Pipeline
Implement robust face detection:
import cv2
import dlib
class FaceDetector:
def __init__(self):
self.face_detector = dlib.get_frontal_face_detector()
self.landmark_predictor = dlib.shape_predictor(
'models/shape_predictor_68_face_landmarks.dat'
)
def detect_faces(self, image):
"""Detect faces in image"""
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = self.face_detector(gray)
# Extract face coordinates
face_coords = []
for face in faces:
x1, y1 = face.left(), face.top()
x2, y2 = face.right(), face.bottom()
face_coords.append((x1, y1, x2, y2))
return face_coords
def extract_landmarks(self, image, face):
"""Extract facial landmarks"""
landmarks = self.landmark_predictor(image, face)
return [(point.x, point.y) for point in landmarks.parts()]
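Note that the 68-point landmark model is not bundled with dlib; it is distributed on dlib.net as a .bz2 archive and must be downloaded and extracted into models/ before FaceDetector will run. A quick usage sketch on a static image (face.jpg is a placeholder path):
# Minimal usage sketch (assumes face.jpg and the landmark .dat file exist locally)
detector = FaceDetector()
image = cv2.imread('face.jpg')

for (x1, y1, x2, y2) in detector.detect_faces(image):
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite('face_detected.jpg', image)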
Click to view advanced face detection
class AdvancedFaceDetector(FaceDetector):
    def __init__(self):
        # Reuse the detector and landmark predictor from FaceDetector
        super().__init__()
        self.face_recognition_model = dlib.face_recognition_model_v1(
            'models/dlib_face_recognition_resnet_model_v1.dat'
        )
    def detect_and_align(self, image):
        """Detect and align faces"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Keep the raw dlib rectangles so landmarks can be computed per face
        faces = self.face_detector(gray)
        aligned_faces = []
        for face in faces:
            # dlib.get_face_chip needs the full landmark object, not (x, y) tuples
            landmarks = self.landmark_predictor(image, face)
            aligned_face = dlib.get_face_chip(image, landmarks, size=150)
            aligned_faces.append(aligned_face)
        return aligned_faces
def get_face_encoding(self, face_image):
"""Get face embedding"""
return self.face_recognition_model.compute_face_descriptor(face_image)
def track_faces(self, video_source=0):
"""Track faces in video stream"""
cap = cv2.VideoCapture(video_source)
while True:
ret, frame = cap.read()
if not ret:
break
faces = self.detect_faces(frame)
# Draw face boxes
for face in faces:
x1, y1, x2, y2 = face
cv2.rectangle(
frame,
(x1, y1),
(x2, y2),
(0, 255, 0),
2
)
cv2.imshow('Face Tracking', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
2. Emotion Classification Model
Create emotion classification CNN:
import tensorflow as tf
from tensorflow.keras import layers, models
def create_emotion_cnn(input_shape=(48, 48, 1), num_classes=7):
"""Create CNN for emotion classification"""
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu'),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(256, activation='relu'),
layers.Dropout(0.5),
layers.Dense(num_classes, activation='softmax')
])
return model
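A training sketch wiring this CNN to the FER2013 arrays prepared in the dataset-loading example (X_train, y_train, X_test, y_test are assumed from there); the augmentation settings, batch size, and epoch count are illustrative:
model = create_emotion_cnn()
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Light augmentation helps generalization on the small 48x48 grayscale crops
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

history = model.fit(
    datagen.flow(X_train, y_train, batch_size=64),
    validation_data=(X_test, y_test),
    epochs=30
)

# Save for later use in the video pipeline (assumes the directory exists)
model.save('models/saved_models/emotion_cnn.h5')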
Click to view advanced model architectures
import cv2
import numpy as np

class EmotionRecognitionModel:
def __init__(self):
self.model = None
self.emotions = [
'angry', 'disgust', 'fear', 'happy',
'neutral', 'sad', 'surprise'
]
def build_resnet_model(self, input_shape=(48, 48, 1)):
"""Build ResNet-based model"""
base_model = tf.keras.applications.ResNet50(
include_top=False,
weights=None,
input_shape=input_shape
)
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(len(self.emotions), activation='softmax')(x)
self.model = models.Model(base_model.input, outputs)
def build_efficient_model(self, input_shape=(48, 48, 1)):
"""Build EfficientNet-based model"""
base_model = tf.keras.applications.EfficientNetB0(
include_top=False,
weights=None,
input_shape=input_shape
)
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(len(self.emotions), activation='softmax')(x)
self.model = models.Model(base_model.input, outputs)
def train_model(self, train_data, val_data, epochs=50):
"""Train the emotion recognition model"""
if self.model is None:
raise ValueError("Model not initialized")
# Compile model
self.model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy']
)
# Callbacks
callbacks = [
tf.keras.callbacks.EarlyStopping(
patience=5,
restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
factor=0.2,
patience=3
)
]
# Train
history = self.model.fit(
train_data,
validation_data=val_data,
epochs=epochs,
callbacks=callbacks
)
return history
    def predict_emotion(self, face_image):
        """Predict emotion from face image"""
        # Preprocess: grayscale, resize, scale to [0, 1], add batch/channel dims
        if face_image.ndim == 3:
            face_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
        face_image = cv2.resize(face_image, (48, 48))
        face_image = face_image.astype('float32') / 255.0
        face_image = face_image.reshape(1, 48, 48, 1)
        # Make prediction
        predictions = self.model.predict(face_image)
        emotion_idx = np.argmax(predictions[0])
        confidence = float(predictions[0][emotion_idx])
        return self.emotions[emotion_idx], confidence
3. Real-time Video Processing
Implement real-time emotion detection:
import cv2
import numpy as np
from threading import Thread
import queue
class VideoProcessor:
def __init__(self, face_detector, emotion_classifier):
self.face_detector = face_detector
self.emotion_classifier = emotion_classifier
self.processing_queue = queue.Queue(maxsize=30)
def process_frame(self, frame):
"""Process single frame"""
# Detect faces
faces = self.face_detector.detect_faces(frame)
results = []
        for (x1, y1, x2, y2) in faces:
            # Clamp coordinates to the frame; detectors can return out-of-bounds boxes
            x1, y1 = max(0, x1), max(0, y1)
            x2, y2 = min(frame.shape[1], x2), min(frame.shape[0], y2)
            if x2 <= x1 or y2 <= y1:
                continue
            # Extract face ROI
            face_roi = frame[y1:y2, x1:x2]
# Predict emotion
emotion, confidence = self.emotion_classifier.predict_emotion(face_roi)
results.append({
'bbox': (x1, y1, x2, y2),
'emotion': emotion,
'confidence': confidence
})
return results
def start_video_stream(self, source=0):
"""Start real-time video processing"""
cap = cv2.VideoCapture(source)
while True:
ret, frame = cap.read()
if not ret:
break
# Process frame
results = self.process_frame(frame)
# Draw results
for result in results:
x1, y1, x2, y2 = result['bbox']
emotion = result['emotion']
conf = result['confidence']
# Draw bounding box
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Draw emotion label
label = f"{emotion}: {conf:.2f}"
cv2.putText(
frame, label, (x1, y1-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2
)
cv2.imshow('Emotion Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
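The draw_detection helper referenced in the advanced snippets below (and again in the GUI) is not defined in the code above; a minimal standalone sketch, consistent with the result dictionaries produced by process_frame, which could also be attached to the processor classes as a method:
def draw_detection(frame, result):
    """Draw one detection result (bounding box plus emotion label) onto the frame."""
    x1, y1, x2, y2 = result['bbox']
    label = f"{result['emotion']}: {result['confidence']:.2f}"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(
        frame, label, (x1, y1 - 10),
        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2
    )
    return frame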
Click to view advanced video processing
class AdvancedVideoProcessor(VideoProcessor):
    def __init__(self, face_detector, emotion_classifier):
        # Inherit the basic frame-processing pipeline; add queues for async work
        super().__init__(face_detector, emotion_classifier)
        self.result_queue = queue.Queue(maxsize=30)
        self.is_running = False
def process_frames_worker(self):
"""Background worker for frame processing"""
while self.is_running:
try:
frame = self.processing_queue.get(timeout=1)
except queue.Empty:
continue
# Process frame
results = self.process_frame(frame)
self.result_queue.put((frame, results))
self.processing_queue.task_done()
def start_processing_thread(self):
"""Start background processing thread"""
self.is_running = True
self.process_thread = Thread(
target=self.process_frames_worker,
daemon=True
)
self.process_thread.start()
def stop_processing_thread(self):
"""Stop background processing"""
self.is_running = False
if hasattr(self, 'process_thread'):
self.process_thread.join()
def start_video_stream(self, source=0, display=True):
"""Start video stream with async processing"""
cap = cv2.VideoCapture(source)
self.start_processing_thread()
try:
while True:
ret, frame = cap.read()
if not ret:
break
# Add frame to processing queue
if not self.processing_queue.full():
self.processing_queue.put(frame)
# Get and display results
if not self.result_queue.empty():
processed_frame, results = self.result_queue.get()
if display:
self.display_results(processed_frame, results)
if display:
if cv2.waitKey(1) & 0xFF == ord('q'):
break
finally:
self.stop_processing_thread()
cap.release()
if display:
cv2.destroyAllWindows()
def display_results(self, frame, results):
"""Display detection results"""
# Draw emotion stats
stats = self.get_emotion_stats(results)
# Create stats display
stats_frame = np.zeros((150, 200, 3), dtype=np.uint8)
y_offset = 30
for emotion, count in stats.items():
text = f"{emotion}: {count}"
cv2.putText(
stats_frame, text, (10, y_offset),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1
)
y_offset += 20
# Draw results on frame
for result in results:
self.draw_detection(frame, result)
# Show frames
cv2.imshow('Emotion Detection', frame)
cv2.imshow('Statistics', stats_frame)
def get_emotion_stats(self, results):
"""Calculate emotion statistics"""
stats = {}
for result in results:
emotion = result['emotion']
stats[emotion] = stats.get(emotion, 0) + 1
return stats
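Example wiring, assuming the detector and classifier classes from the earlier steps (in practice the classifier would load trained weights rather than run untrained):
detector = FaceDetector()
classifier = EmotionRecognitionModel()
classifier.build_resnet_model()          # or build/load a trained model instead
# classifier.model.load_weights('models/saved_models/emotion_resnet.h5')

processor = AdvancedVideoProcessor(detector, classifier)
processor.start_video_stream(source=0)   # press 'q' in the window to quit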
4. GUI Implementation
Create interactive Streamlit interface:
import streamlit as st
import cv2
import numpy as np
import tempfile
def create_emotion_detection_app():
st.title("Emotion Detection App")
# File uploader
uploaded_file = st.file_uploader(
"Choose an image/video file",
type=['jpg', 'jpeg', 'png', 'mp4']
)
if uploaded_file is not None:
# Determine file type
file_type = uploaded_file.type.split('/')[0]
if file_type == 'image':
process_image(uploaded_file)
else:
process_video(uploaded_file)
def process_image(image_file):
"""Process uploaded image"""
# Read image
file_bytes = np.asarray(bytearray(image_file.read()), dtype=np.uint8)
image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
# Create columns
col1, col2 = st.columns(2)
# Display original
with col1:
st.subheader("Original Image")
st.image(image, channels="BGR")
# Process and display results
with col2:
st.subheader("Detected Emotions")
results = processor.process_frame(image)
# Draw results
for result in results:
draw_detection(image, result)
st.image(image, channels="BGR")
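process_image above relies on a module-level processor and the draw_detection helper from Step 3. A sketch of the wiring at the bottom of app/streamlit_app.py (the module paths follow the project structure but are assumptions); launch with streamlit run app/streamlit_app.py:
# Assumed wiring at the bottom of app/streamlit_app.py
from src.face_detection import FaceDetector              # hypothetical module paths
from src.emotion_classifier import EmotionRecognitionModel
from src.video_processor import VideoProcessor, draw_detection

processor = VideoProcessor(FaceDetector(), EmotionRecognitionModel())

if __name__ == '__main__':
    create_emotion_detection_app()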
Click to view advanced GUI implementation
class EmotionDetectionApp:
def __init__(self):
self.processor = VideoProcessor(
FaceDetector(),
EmotionClassifier()
)
def run(self):
"""Run Streamlit app"""
st.set_page_config(
page_title="Emotion Detection",
layout="wide"
)
# Sidebar
self.create_sidebar()
# Main content
st.title("Real-time Emotion Detection")
# Mode selection
mode = st.radio(
"Select Mode",
["Image Upload", "Video Upload", "Webcam"]
)
if mode == "Image Upload":
self.image_mode()
elif mode == "Video Upload":
self.video_mode()
else:
self.webcam_mode()
def create_sidebar(self):
"""Create sidebar with options"""
st.sidebar.title("Settings")
# Detection settings
st.sidebar.subheader("Detection Settings")
confidence_threshold = st.sidebar.slider(
"Confidence Threshold",
0.0, 1.0, 0.5
)
# Display settings
st.sidebar.subheader("Display Settings")
show_confidence = st.sidebar.checkbox("Show Confidence", True)
show_landmarks = st.sidebar.checkbox("Show Landmarks", False)
return {
'confidence_threshold': confidence_threshold,
'show_confidence': show_confidence,
'show_landmarks': show_landmarks
}
    def webcam_mode(self):
        """Handle webcam input"""
        ctx = st.empty()
        # A button inside the loop would raise duplicate-widget errors on rerun,
        # so use a checkbox to start and stop the capture loop instead
        run = st.checkbox("Start Webcam")
        if run:
            cap = cv2.VideoCapture(0)
            while run:
                ret, frame = cap.read()
                if not ret:
                    break
                # Process frame
                results = self.processor.process_frame(frame)
                # Draw results
                for result in results:
                    self.draw_detection(frame, result)
                # Display frame
                ctx.image(frame, channels="BGR")
            cap.release()
def process_video(self, video_file):
"""Process uploaded video"""
        # Save uploaded file temporarily (close it so OpenCV can open it by name)
        tfile = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
        tfile.write(video_file.read())
        tfile.close()
        # Process video
        cap = cv2.VideoCapture(tfile.name)
# Get video info
fps = int(cap.get(cv2.CAP_PROP_FPS))
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# Create progress bar
progress_bar = st.progress(0)
frame_placeholder = st.empty()
# Process frames
for i in range(frame_count):
ret, frame = cap.read()
if not ret:
break
# Process frame
results = self.processor.process_frame(frame)
# Draw results
for result in results:
self.draw_detection(frame, result)
# Update display
frame_placeholder.image(frame, channels="BGR")
# Update progress
progress = (i + 1) / frame_count
progress_bar.progress(progress)
cap.release()
5. Model Deployment
Create FastAPI service for emotion detection:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
import io
import cv2
import numpy as np
app = FastAPI(title="Emotion Detection API")
@app.post("/detect-emotion")
async def detect_emotion(file: UploadFile = File(...)):
"""Detect emotions in uploaded image"""
try:
# Read image
contents = await file.read()
nparr = np.frombuffer(contents, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
# Process image
results = processor.process_frame(image)
# Format results
detections = [
{
"bbox": result['bbox'],
"emotion": result['emotion'],
"confidence": float(result['confidence'])
}
for result in results
]
return JSONResponse(content={"detections": detections})
except Exception as e:
return JSONResponse(
content={"error": str(e)},
status_code=500
)
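The route above assumes a module-level processor. One way to wire it is to build the pipeline at startup and serve with uvicorn; the class names follow the earlier steps and are assumptions, and a trained classifier would be loaded here in practice:
# Assumed wiring for the basic API
processor = None

@app.on_event("startup")
def load_pipeline():
    """Build the detection/classification pipeline once at startup."""
    global processor
    processor = VideoProcessor(FaceDetector(), EmotionRecognitionModel())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)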
Click to view advanced deployment setup
from fastapi import FastAPI, File, UploadFile, BackgroundTasks
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import aioredis
import json
import uuid
import cv2
import numpy as np
class EmotionDetectionService:
def __init__(self):
self.app = FastAPI(title="Emotion Detection API")
self.setup_middleware()
self.setup_routes()
self.processor = None
self.redis = None
async def startup(self):
"""Initialize services on startup"""
# Initialize model
self.processor = VideoProcessor(
FaceDetector(),
EmotionClassifier()
)
# Initialize Redis
self.redis = await aioredis.create_redis_pool('redis://localhost')
async def shutdown(self):
"""Cleanup on shutdown"""
if self.redis is not None:
self.redis.close()
await self.redis.wait_closed()
def setup_middleware(self):
"""Setup CORS and other middleware"""
self.app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
def setup_routes(self):
"""Setup API routes"""
        @self.app.post("/detect-emotion")
        async def detect_emotion(
            background_tasks: BackgroundTasks,
            file: UploadFile = File(...)
        ):
try:
# Generate job ID
job_id = str(uuid.uuid4())
# Add to processing queue
background_tasks.add_task(
self.process_image,
job_id,
await file.read()
)
return {"job_id": job_id}
except Exception as e:
return JSONResponse(
content={"error": str(e)},
status_code=500
)
@self.app.get("/result/{job_id}")
async def get_result(job_id: str):
"""Get processing result"""
result = await self.redis.get(job_id)
if result is None:
return {"status": "processing"}
return json.loads(result)
async def process_image(self, job_id: str, image_data: bytes):
"""Process image in background"""
try:
# Decode image
nparr = np.frombuffer(image_data, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
# Process image
results = self.processor.process_frame(image)
# Format results
detections = [
{
"bbox": result['bbox'],
"emotion": result['emotion'],
"confidence": float(result['confidence'])
}
for result in results
]
# Store results
await self.redis.set(
job_id,
json.dumps({"detections": detections}),
expire=3600 # Expire after 1 hour
)
except Exception as e:
await self.redis.set(
job_id,
json.dumps({"error": str(e)}),
expire=3600
)
# Create service instance
service = EmotionDetectionService()
# Add startup and shutdown events
@service.app.on_event("startup")
async def startup_event():
await service.startup()
@service.app.on_event("shutdown")
async def shutdown_event():
await service.shutdown()
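To serve the advanced service, point uvicorn at service.app (a local Redis instance must be running for result storage, per the aioredis setup above):
# Run the service (sketch)
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(service.app, host="0.0.0.0", port=8000)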