Building a Facial Emotion Recognition System
Objective
Build an end-to-end facial emotion recognition system that can detect faces in images/video streams and classify their emotional expressions. This project combines computer vision techniques with deep learning to create a robust emotion detection pipeline.
Learning Outcomes
By completing this project, you will:
- Master face detection and facial landmark extraction
- Implement CNN architectures for emotion classification
- Build real-time video processing pipelines
- Handle facial detection edge cases
- Deploy computer vision models for real-time inference
Skills Gained
- Building computer vision pipelines
- Implementing facial detection and tracking
- Creating emotion classification models
- Processing real-time video streams
- Deploying CV models in production
- Handling real-world imaging challenges
Tools Required
# Core libraries
pip install opencv-python
pip install dlib  # building dlib may require CMake and a C++ compiler
pip install tensorflow
pip install face-recognition
# Additional utilities
pip install numpy
pip install moviepy
pip install streamlit # For GUI
Project Structure
emotion_recognition/
│
├── data/
│   ├── raw/
│   │   ├── fer2013/
│   │   └── facial_landmarks/
│   └── processed/
│
├── src/
│   ├── face_detection.py
│   ├── landmark_extraction.py
│   ├── emotion_classifier.py
│   └── video_processor.py
│
├── models/
│   └── saved_models/
│
└── app/
    ├── streamlit_app.py
    └── utils/
Prerequisites and Theoretical Foundations for "Building a Facial Emotion Recognition System"
1. Intermediate Python Programming
- Data Manipulation: Proficiency with pandas for data manipulation and NumPy for numerical computations.
- Object-Oriented Programming (OOP): Understanding classes and objects to structure code effectively.
- Image Processing Libraries: Familiarity with OpenCV or PIL for image loading and preprocessing.
Click to view Python code examples
import pandas as pd
import numpy as np
from PIL import Image
# Reading image data
image = Image.open('face.jpg')
image_array = np.array(image)
# Basic data manipulation
df = pd.DataFrame({'emotion': ['happy', 'sad'], 'intensity': [0.9, 0.4]})
print(df.head())
2. Mathematics and Machine Learning Foundations
- Linear Algebra: Understanding of matrices, vectors, and tensor operations.
- Calculus: Basics of differentiation, gradients, and backpropagation.
- Probability and Statistics: Knowledge of probability distributions, mean, variance, and statistical measures.
- Machine Learning Concepts: Understanding supervised learning, classification tasks, overfitting, and regularization.
Click to view mathematical concepts with code
import numpy as np
# Mean and variance
data = np.array([1, 2, 3, 4, 5])
mean = np.mean(data)
variance = np.var(data)
# Softmax function
def softmax(x):
e_x = np.exp(x - np.max(x))
return e_x / e_x.sum(axis=0)
3. Deep Learning Concepts
- Convolutional Neural Networks (CNNs): Understanding of convolutions, kernels, feature maps, pooling layers, and activation functions.
- Transfer Learning: Knowledge of using pre-trained models and fine-tuning them for specific tasks.
- Regularization Techniques: Familiarity with dropout, batch normalization, and early stopping.
- Optimization Algorithms: Understanding of gradient descent, Adam optimizer, and learning rate scheduling.
Click to view deep learning concepts
- Convolution Operation:
  - Equation: ( (I * K)(x, y) = \sum_m \sum_n I(x - m, y - n) K(m, n) )
  - Concept: Applying a kernel (filter) over an image to extract features.
- Activation Functions:
  - ReLU: ( f(x) = \max(0, x) )
  - Leaky ReLU: ( f(x) = \max(0.01x, x) )
  - Softmax: Used for multi-class classification to output probabilities.
- Transfer Learning:
  - Concept: Using a model trained on a large dataset and adapting it to a new, related task.
  - Approaches:
    - Feature Extraction: Use the pre-trained model as a fixed feature extractor.
    - Fine-Tuning: Unfreeze some top layers of the pre-trained model for re-training.
- Regularization Techniques:
  - Dropout: Randomly setting a fraction of input units to 0 during training to prevent overfitting.
  - Batch Normalization: Normalizing the inputs of each layer to stabilize learning.
  - Early Stopping: Stopping training when validation performance stops improving.
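The sketch below ties these concepts together in Keras: a frozen pre-trained backbone (feature extraction), a small classification head regularized with batch normalization and dropout, and early stopping as the training callback. MobileNetV2, the 96x96 input size, and the layer widths are illustrative choices, not part of the project code.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 7  # placeholder: one output unit per emotion class

# Feature extraction: freeze a pre-trained backbone (example: MobileNetV2)
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights='imagenet', input_shape=(96, 96, 3)
)
base.trainable = False  # fine-tuning would instead unfreeze some top layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),   # stabilizes learning
    layers.Dropout(0.5),           # regularization
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Early stopping halts training once validation performance stops improving
early_stop = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])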
4. Computer Vision Basics
- Image Preprocessing: Techniques like resizing, normalization, and data augmentation (rotation, flipping, cropping).
- Facial Feature Detection: Understanding of Haar cascades, Histogram of Oriented Gradients (HOG), or deep learning-based face detectors.
- Emotion Recognition Concepts: Awareness of facial expression features associated with emotions.
Click to view computer vision code examples
import cv2
# Load an image using OpenCV
image = cv2.imread('face.jpg')
# Convert to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Resize image
resized_image = cv2.resize(gray_image, (48, 48))
# Data augmentation example: horizontal flip
flipped_image = cv2.flip(resized_image, 1)
# Display image using OpenCV
cv2.imshow('Image', flipped_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
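For the Haar cascade detector mentioned above, OpenCV ships the cascade XML files with the package, so a minimal face-detection sketch (assuming a local face.jpg) needs no extra downloads. The dlib/HOG-based detector used in the main pipeline appears in Step 1.
import cv2

# Load OpenCV's bundled frontal-face Haar cascade
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread('face.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns one (x, y, w, h) box per detected face
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Haar Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()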
5. Required Libraries
- TensorFlow or PyTorch: Deep learning frameworks for building and training neural networks.
- Keras: High-level API for TensorFlow.
- OpenCV: For image processing tasks.
- Matplotlib or Seaborn: For data visualization.
- Scikit-learn: For preprocessing, model evaluation, and classical ML algorithms.
pip install tensorflow opencv-python matplotlib seaborn scikit-learn
6. Dataset Familiarity
- FER2013 Dataset: A common dataset for facial emotion recognition.
- Contents: 28,709 training images and 7,178 test images, each a 48x48-pixel grayscale face crop.
- Classes: Anger, Disgust, Fear, Happy, Sad, Surprise, Neutral.
- Understanding Data Structure: How to load and preprocess datasets for training.
Click to view dataset loading example
import pandas as pd
import numpy as np
from PIL import Image
# Load the FER2013 dataset CSV file
data = pd.read_csv('fer2013.csv')
# Example of accessing image data and label
pixels = data['pixels'][0]
label = data['emotion'][0]
# Convert pixel string to image
pixel_array = np.array([int(p) for p in pixels.split()]).reshape(48, 48)
image = Image.fromarray(pixel_array.astype('uint8'), 'L')
image.show()
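To prepare the whole CSV for training rather than a single row, one approach is to turn every pixel string into a normalized 48x48x1 array and one-hot encode the labels. A sketch, assuming the standard fer2013.csv layout with emotion, pixels, and Usage columns:
import pandas as pd
import numpy as np
import tensorflow as tf

data = pd.read_csv('fer2013.csv')

# Convert each space-separated pixel string into a 48x48x1 float array in [0, 1]
pixels = np.array([
    np.array(p.split(), dtype='float32').reshape(48, 48, 1)
    for p in data['pixels']
]) / 255.0

# One-hot encode the 7 emotion labels
labels = tf.keras.utils.to_categorical(data['emotion'], num_classes=7)

# Use the Usage column to recover the official train/test split
mask = (data['Usage'] == 'Training').to_numpy()
X_train, y_train = pixels[mask], labels[mask]
X_test, y_test = pixels[~mask], labels[~mask]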
Steps and Tasks
1. Face Detection Pipeline
Implement robust face detection:
import cv2
import dlib
class FaceDetector:
def __init__(self):
self.face_detector = dlib.get_frontal_face_detector()
self.landmark_predictor = dlib.shape_predictor(
'models/shape_predictor_68_face_landmarks.dat'
)
def detect_faces(self, image):
"""Detect faces in image"""
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
faces = self.face_detector(gray)
# Extract face coordinates
face_coords = []
for face in faces:
x1, y1 = face.left(), face.top()
x2, y2 = face.right(), face.bottom()
face_coords.append((x1, y1, x2, y2))
return face_coords
def extract_landmarks(self, image, face):
"""Extract facial landmarks"""
landmarks = self.landmark_predictor(image, face)
return [(point.x, point.y) for point in landmarks.parts()]
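Note that the 68-point landmark model is not bundled with dlib; it is distributed on dlib.net as a .bz2 archive and must be downloaded and extracted into models/ before FaceDetector will run. A quick usage sketch on a static image (face.jpg is a placeholder path):
# Minimal usage sketch (assumes face.jpg and the landmark .dat file exist locally)
detector = FaceDetector()
image = cv2.imread('face.jpg')

for (x1, y1, x2, y2) in detector.detect_faces(image):
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.imwrite('face_detected.jpg', image)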
Click to view advanced face detection
class AdvancedFaceDetector(FaceDetector):
    def __init__(self):
        # Reuse the detector and landmark predictor from FaceDetector
        super().__init__()
        self.face_recognition_model = dlib.face_recognition_model_v1(
            'models/dlib_face_recognition_resnet_model_v1.dat'
        )
    def detect_and_align(self, image):
        """Detect and align faces"""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Keep the raw dlib rectangles so landmarks can be computed per face
        faces = self.face_detector(gray)
        aligned_faces = []
        for face in faces:
            # dlib.get_face_chip needs the full landmark object, not (x, y) tuples
            landmarks = self.landmark_predictor(image, face)
            aligned_face = dlib.get_face_chip(image, landmarks, size=150)
            aligned_faces.append(aligned_face)
        return aligned_faces
def get_face_encoding(self, face_image):
"""Get face embedding"""
return self.face_recognition_model.compute_face_descriptor(face_image)
def track_faces(self, video_source=0):
"""Track faces in video stream"""
cap = cv2.VideoCapture(video_source)
while True:
ret, frame = cap.read()
if not ret:
break
faces = self.detect_faces(frame)
# Draw face boxes
for face in faces:
x1, y1, x2, y2 = face
cv2.rectangle(
frame,
(x1, y1),
(x2, y2),
(0, 255, 0),
2
)
cv2.imshow('Face Tracking', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
2. Emotion Classification Model
Create emotion classification CNN:
import tensorflow as tf
from tensorflow.keras import layers, models
def create_emotion_cnn(input_shape=(48, 48, 1), num_classes=7):
"""Create CNN for emotion classification"""
model = models.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Conv2D(128, (3, 3), activation='relu'),
layers.BatchNormalization(),
layers.MaxPooling2D((2, 2)),
layers.Flatten(),
layers.Dense(256, activation='relu'),
layers.Dropout(0.5),
layers.Dense(num_classes, activation='softmax')
])
return model
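A training sketch wiring this CNN to the FER2013 arrays prepared in the dataset-loading example (X_train, y_train, X_test, y_test are assumed from there); the augmentation settings, batch size, and epoch count are illustrative:
model = create_emotion_cnn()
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Light augmentation helps generalization on the small 48x48 grayscale crops
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

history = model.fit(
    datagen.flow(X_train, y_train, batch_size=64),
    validation_data=(X_test, y_test),
    epochs=30
)

# Save for later use in the video pipeline (assumes the directory exists)
model.save('models/saved_models/emotion_cnn.h5')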
Click to view advanced model architectures
import cv2
import numpy as np

class EmotionRecognitionModel:
def __init__(self):
self.model = None
self.emotions = [
'angry', 'disgust', 'fear', 'happy',
'neutral', 'sad', 'surprise'
]
def build_resnet_model(self, input_shape=(48, 48, 1)):
"""Build ResNet-based model"""
base_model = tf.keras.applications.ResNet50(
include_top=False,
weights=None,
input_shape=input_shape
)
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(len(self.emotions), activation='softmax')(x)
self.model = models.Model(base_model.input, outputs)
def build_efficient_model(self, input_shape=(48, 48, 1)):
"""Build EfficientNet-based model"""
base_model = tf.keras.applications.EfficientNetB0(
include_top=False,
weights=None,
input_shape=input_shape
)
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(len(self.emotions), activation='softmax')(x)
self.model = models.Model(base_model.input, outputs)
def train_model(self, train_data, val_data, epochs=50):
"""Train the emotion recognition model"""
if self.model is None:
raise ValueError("Model not initialized")
# Compile model
self.model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy']
)
# Callbacks
callbacks = [
tf.keras.callbacks.EarlyStopping(
patience=5,
restore_best_weights=True
),
tf.keras.callbacks.ReduceLROnPlateau(
factor=0.2,
patience=3
)
]
# Train
history = self.model.fit(
train_data,
validation_data=val_data,
epochs=epochs,
callbacks=callbacks
)
return history
    def predict_emotion(self, face_image):
        """Predict emotion from face image"""
        # Preprocess: grayscale, resize, scale to [0, 1], add batch/channel dims
        if face_image.ndim == 3:
            face_image = cv2.cvtColor(face_image, cv2.COLOR_BGR2GRAY)
        face_image = cv2.resize(face_image, (48, 48))
        face_image = face_image.astype('float32') / 255.0
        face_image = face_image.reshape(1, 48, 48, 1)
        # Make prediction
        predictions = self.model.predict(face_image)
        emotion_idx = np.argmax(predictions[0])
        confidence = float(predictions[0][emotion_idx])
        return self.emotions[emotion_idx], confidence
3. Real-time Video Processing
Implement real-time emotion detection:
import cv2
import numpy as np
from threading import Thread
import queue
class VideoProcessor:
def __init__(self, face_detector, emotion_classifier):
self.face_detector = face_detector
self.emotion_classifier = emotion_classifier
self.processing_queue = queue.Queue(maxsize=30)
def process_frame(self, frame):
"""Process single frame"""
# Detect faces
faces = self.face_detector.detect_faces(frame)
results = []
        for (x1, y1, x2, y2) in faces:
            # Clamp coordinates to the frame; detectors can return out-of-bounds boxes
            x1, y1 = max(0, x1), max(0, y1)
            x2, y2 = min(frame.shape[1], x2), min(frame.shape[0], y2)
            if x2 <= x1 or y2 <= y1:
                continue
            # Extract face ROI
            face_roi = frame[y1:y2, x1:x2]
# Predict emotion
emotion, confidence = self.emotion_classifier.predict_emotion(face_roi)
results.append({
'bbox': (x1, y1, x2, y2),
'emotion': emotion,
'confidence': confidence
})
return results
def start_video_stream(self, source=0):
"""Start real-time video processing"""
cap = cv2.VideoCapture(source)
while True:
ret, frame = cap.read()
if not ret:
break
# Process frame
results = self.process_frame(frame)
# Draw results
for result in results:
x1, y1, x2, y2 = result['bbox']
emotion = result['emotion']
conf = result['confidence']
# Draw bounding box
cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Draw emotion label
label = f"{emotion}: {conf:.2f}"
cv2.putText(
frame, label, (x1, y1-10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2
)
cv2.imshow('Emotion Detection', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
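The draw_detection helper referenced in the advanced snippets below (and again in the GUI) is not defined in the code above; a minimal standalone sketch, consistent with the result dictionaries produced by process_frame, which could also be attached to the processor classes as a method:
def draw_detection(frame, result):
    """Draw one detection result (bounding box plus emotion label) onto the frame."""
    x1, y1, x2, y2 = result['bbox']
    label = f"{result['emotion']}: {result['confidence']:.2f}"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(
        frame, label, (x1, y1 - 10),
        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2
    )
    return frame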
Click to view advanced video processing
class AdvancedVideoProcessor(VideoProcessor):
    def __init__(self, face_detector, emotion_classifier):
        # Inherit the basic frame-processing pipeline; add queues for async work
        super().__init__(face_detector, emotion_classifier)
        self.result_queue = queue.Queue(maxsize=30)
        self.is_running = False
def process_frames_worker(self):
"""Background worker for frame processing"""
while self.is_running:
try:
frame = self.processing_queue.get(timeout=1)
except queue.Empty:
continue
# Process frame
results = self.process_frame(frame)
self.result_queue.put((frame, results))
self.processing_queue.task_done()
def start_processing_thread(self):
"""Start background processing thread"""
self.is_running = True
self.process_thread = Thread(
target=self.process_frames_worker,
daemon=True
)
self.process_thread.start()
def stop_processing_thread(self):
"""Stop background processing"""
self.is_running = False
if hasattr(self, 'process_thread'):
self.process_thread.join()
def start_video_stream(self, source=0, display=True):
"""Start video stream with async processing"""
cap = cv2.VideoCapture(source)
self.start_processing_thread()
try:
while True:
ret, frame = cap.read()
if not ret:
break
# Add frame to processing queue
if not self.processing_queue.full():
self.processing_queue.put(frame)
# Get and display results
if not self.result_queue.empty():
processed_frame, results = self.result_queue.get()
if display:
self.display_results(processed_frame, results)
if display:
if cv2.waitKey(1) & 0xFF == ord('q'):
break
finally:
self.stop_processing_thread()
cap.release()
if display:
cv2.destroyAllWindows()
def display_results(self, frame, results):
"""Display detection results"""
# Draw emotion stats
stats = self.get_emotion_stats(results)
# Create stats display
stats_frame = np.zeros((150, 200, 3), dtype=np.uint8)
y_offset = 30
for emotion, count in stats.items():
text = f"{emotion}: {count}"
cv2.putText(
stats_frame, text, (10, y_offset),
cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1
)
y_offset += 20
# Draw results on frame
for result in results:
self.draw_detection(frame, result)
# Show frames
cv2.imshow('Emotion Detection', frame)
cv2.imshow('Statistics', stats_frame)
def get_emotion_stats(self, results):
"""Calculate emotion statistics"""
stats = {}
for result in results:
emotion = result['emotion']
stats[emotion] = stats.get(emotion, 0) + 1
return stats
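Example wiring, assuming the detector and classifier classes from the earlier steps (in practice the classifier would load trained weights rather than run untrained):
detector = FaceDetector()
classifier = EmotionRecognitionModel()
classifier.build_resnet_model()          # or build/load a trained model instead
# classifier.model.load_weights('models/saved_models/emotion_resnet.h5')

processor = AdvancedVideoProcessor(detector, classifier)
processor.start_video_stream(source=0)   # press 'q' in the window to quit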
4. GUI Implementation
Create interactive Streamlit interface:
import streamlit as st
import cv2
import numpy as np
import tempfile
def create_emotion_detection_app():
st.title("Emotion Detection App")
# File uploader
uploaded_file = st.file_uploader(
"Choose an image/video file",
type=['jpg', 'jpeg', 'png', 'mp4']
)
if uploaded_file is not None:
# Determine file type
file_type = uploaded_file.type.split('/')[0]
if file_type == 'image':
process_image(uploaded_file)
else:
process_video(uploaded_file)
def process_image(image_file):
"""Process uploaded image"""
# Read image
file_bytes = np.asarray(bytearray(image_file.read()), dtype=np.uint8)
image = cv2.imdecode(file_bytes, cv2.IMREAD_COLOR)
# Create columns
col1, col2 = st.columns(2)
# Display original
with col1:
st.subheader("Original Image")
st.image(image, channels="BGR")
# Process and display results
with col2:
st.subheader("Detected Emotions")
results = processor.process_frame(image)
# Draw results
for result in results:
draw_detection(image, result)
st.image(image, channels="BGR")
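process_image above relies on a module-level processor and the draw_detection helper from Step 3. A sketch of the wiring at the bottom of app/streamlit_app.py (the module paths follow the project structure but are assumptions); launch with streamlit run app/streamlit_app.py:
# Assumed wiring at the bottom of app/streamlit_app.py
from src.face_detection import FaceDetector              # hypothetical module paths
from src.emotion_classifier import EmotionRecognitionModel
from src.video_processor import VideoProcessor, draw_detection

processor = VideoProcessor(FaceDetector(), EmotionRecognitionModel())

if __name__ == '__main__':
    create_emotion_detection_app()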
Click to view advanced GUI implementation
class EmotionDetectionApp:
def __init__(self):
self.processor = VideoProcessor(
FaceDetector(),
EmotionClassifier()
)
def run(self):
"""Run Streamlit app"""
st.set_page_config(
page_title="Emotion Detection",
layout="wide"
)
# Sidebar
self.create_sidebar()
# Main content
st.title("Real-time Emotion Detection")
# Mode selection
mode = st.radio(
"Select Mode",
["Image Upload", "Video Upload", "Webcam"]
)
if mode == "Image Upload":
self.image_mode()
elif mode == "Video Upload":
self.video_mode()
else:
self.webcam_mode()
def create_sidebar(self):
"""Create sidebar with options"""
st.sidebar.title("Settings")
# Detection settings
st.sidebar.subheader("Detection Settings")
confidence_threshold = st.sidebar.slider(
"Confidence Threshold",
0.0, 1.0, 0.5
)
# Display settings
st.sidebar.subheader("Display Settings")
show_confidence = st.sidebar.checkbox("Show Confidence", True)
show_landmarks = st.sidebar.checkbox("Show Landmarks", False)
return {
'confidence_threshold': confidence_threshold,
'show_confidence': show_confidence,
'show_landmarks': show_landmarks
}
    def webcam_mode(self):
        """Handle webcam input"""
        ctx = st.empty()
        # A button inside the loop would raise duplicate-widget errors on rerun,
        # so use a checkbox to start and stop the capture loop instead
        run = st.checkbox("Start Webcam")
        if run:
            cap = cv2.VideoCapture(0)
            while run:
                ret, frame = cap.read()
                if not ret:
                    break
                # Process frame
                results = self.processor.process_frame(frame)
                # Draw results
                for result in results:
                    self.draw_detection(frame, result)
                # Display frame
                ctx.image(frame, channels="BGR")
            cap.release()
def process_video(self, video_file):
"""Process uploaded video"""
        # Save uploaded file temporarily (close it so OpenCV can open it by name)
        tfile = tempfile.NamedTemporaryFile(delete=False, suffix='.mp4')
        tfile.write(video_file.read())
        tfile.close()
        # Process video
        cap = cv2.VideoCapture(tfile.name)
# Get video info
fps = int(cap.get(cv2.CAP_PROP_FPS))
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# Create progress bar
progress_bar = st.progress(0)
frame_placeholder = st.empty()
# Process frames
for i in range(frame_count):
ret, frame = cap.read()
if not ret:
break
# Process frame
results = self.processor.process_frame(frame)
# Draw results
for result in results:
self.draw_detection(frame, result)
# Update display
frame_placeholder.image(frame, channels="BGR")
# Update progress
progress = (i + 1) / frame_count
progress_bar.progress(progress)
cap.release()
5. Model Deployment
Create FastAPI service for emotion detection:
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse
import io
import cv2
import numpy as np
app = FastAPI(title="Emotion Detection API")
@app.post("/detect-emotion")
async def detect_emotion(file: UploadFile = File(...)):
"""Detect emotions in uploaded image"""
try:
# Read image
contents = await file.read()
nparr = np.frombuffer(contents, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
# Process image
results = processor.process_frame(image)
# Format results
detections = [
{
"bbox": result['bbox'],
"emotion": result['emotion'],
"confidence": float(result['confidence'])
}
for result in results
]
return JSONResponse(content={"detections": detections})
except Exception as e:
return JSONResponse(
content={"error": str(e)},
status_code=500
)
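The route above assumes a module-level processor. One way to wire it is to build the pipeline at startup and serve with uvicorn; the class names follow the earlier steps and are assumptions, and a trained classifier would be loaded here in practice:
# Assumed wiring for the basic API
processor = None

@app.on_event("startup")
def load_pipeline():
    """Build the detection/classification pipeline once at startup."""
    global processor
    processor = VideoProcessor(FaceDetector(), EmotionRecognitionModel())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)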
Click to view advanced deployment setup
from fastapi import FastAPI, File, UploadFile, BackgroundTasks
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
import asyncio
import aioredis
import json
import uuid
import cv2
import numpy as np
class EmotionDetectionService:
def __init__(self):
self.app = FastAPI(title="Emotion Detection API")
self.setup_middleware()
self.setup_routes()
self.processor = None
self.redis = None
async def startup(self):
"""Initialize services on startup"""
# Initialize model
self.processor = VideoProcessor(
FaceDetector(),
EmotionClassifier()
)
# Initialize Redis
self.redis = await aioredis.create_redis_pool('redis://localhost')
async def shutdown(self):
"""Cleanup on shutdown"""
if self.redis is not None:
self.redis.close()
await self.redis.wait_closed()
def setup_middleware(self):
"""Setup CORS and other middleware"""
self.app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
def setup_routes(self):
"""Setup API routes"""
        @self.app.post("/detect-emotion")
        async def detect_emotion(
            background_tasks: BackgroundTasks,
            file: UploadFile = File(...)
        ):
try:
# Generate job ID
job_id = str(uuid.uuid4())
# Add to processing queue
background_tasks.add_task(
self.process_image,
job_id,
await file.read()
)
return {"job_id": job_id}
except Exception as e:
return JSONResponse(
content={"error": str(e)},
status_code=500
)
@self.app.get("/result/{job_id}")
async def get_result(job_id: str):
"""Get processing result"""
result = await self.redis.get(job_id)
if result is None:
return {"status": "processing"}
return json.loads(result)
async def process_image(self, job_id: str, image_data: bytes):
"""Process image in background"""
try:
# Decode image
nparr = np.frombuffer(image_data, np.uint8)
image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
# Process image
results = self.processor.process_frame(image)
# Format results
detections = [
{
"bbox": result['bbox'],
"emotion": result['emotion'],
"confidence": float(result['confidence'])
}
for result in results
]
# Store results
await self.redis.set(
job_id,
json.dumps({"detections": detections}),
expire=3600 # Expire after 1 hour
)
except Exception as e:
await self.redis.set(
job_id,
json.dumps({"error": str(e)}),
expire=3600
)
# Create service instance
service = EmotionDetectionService()
# Add startup and shutdown events
@service.app.on_event("startup")
async def startup_event():
await service.startup()
@service.app.on_event("shutdown")
async def shutdown_event():
await service.shutdown()
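To serve the advanced service, point uvicorn at service.app (a local Redis instance must be running for result storage, per the aioredis setup above):
# Run the service (sketch)
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(service.app, host="0.0.0.0", port=8000)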